-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Capture FQDN in APM agents #805
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly I think using hostname -f
(for non-Windows) is fine, with a fallback
to hostname
(or whatever technique that uses the gethostname
syscall).
On Windows I don't have a sense of whether the envvars are sufficient replacements
for the win32 API call (see what libbeat does below), but I'm less concerned about Windows.
I have some questions and notes below.
Q1: Does the "avoid using DNS" warning/condition apply now that we prefer getting the FQDN?
Back when this "Hostname" spec section was added (#517) it was mentioned, part way through the discussion, that DNS should be avoided.
Many of the various techniques I've found for getting the FQDN will do a
DNS query. That includes what beats are doing:
- https://github.com/elastic/go-sysinfo/blob/9ea2eba5301c77b511c0475d7137ea3437a55a9a/providers/shared/fqdn.go#L29-L43
- Windows: https://github.com/elastic/go-sysinfo/blob/9ea2eba5301c77b511c0475d7137ea3437a55a9a/providers/windows/host_windows.go#L87-L94 (uses the
ComputerNamePhysicalDnsFullyQualified
field from https://learn.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-getcomputernameexa)
IIUC, hostname -f
will possibly do a DNS lookup. (ref: https://serverfault.com/a/367496)
A surprising result I found when playing with the libbeat implementation is that in a Docker container on my Mac, it (my small fqdn.go) will get the hostname of my mac, whereas hostname -f
will not.
root@f224d07b5aee:/repo# hostname -f
f224d07b5aee
root@f224d07b5aee:/repo# ./fqdn
2023/06/16 00:04:26 XXX hostname: "f224d07b5aee"
2023/06/16 00:04:26 XXX cname: "pink.local."
2023/06/16 00:04:26 XXX fqdn: "pink.local" <-- it gets "pink.local" from a DNS query of some sort
Q2: Should we also be specifying that the FQDN should be lowercased, as the ECS host.name docs recommend? The libbeat impl is lowercasing.
Q3: (Unrelated to FQDN) Do other implementations set both of metadata.hostname
and metadata.detected_hostname
or do you wait for APM Server version detection to know if the APM server is >=7.4?
Some links to various implementations I've seen or heard, in case it helps others:
- Python has
socket.getfqdn()
which usesgethostname
andgethostbyaddr
. IIUC, when called with no argument to use the local host, that'll always just return whatevergethostname()
returns. - opentelemetry-collector-config has a system metadata provider that uses Showmax/go-fqdn that: first tries using /etc/hosts, then falls back to a DNS query. The DNS query code looks the same as the libbeat info above.
Another argument for lowercasing, at least for Windows is this from MS docs (https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/hostname):
Aside: Yeah Microsoft for (a) the uppercasing and (b) especially for the "usually". |
A1: I think capturing the FQDN will very probably involve a DNS query, while that might add some latency and introduce some flakyness I don't think we have better options. In the cases where it's not reliable the fallback using environment variables (the "standard ones" or our own) could still be used.
A2: we should lower-case, spec has been updated to cover this.
A3: for Java at least we do check the server version >= before sending |
if (hostname == null || hostname.length == 0) | ||
hostname = env.get("HOSTNAME") | ||
if (hostname == null || hostname.length == 0) | ||
hostname = env.get("HOST") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a final nit, I am not implementing this fallback of using the HOSTNAME
and HOST
envvars in the Node.js APM agent.
I couldn't quickly find any references for the HOST
envvar being set by anything authoritative. The HOSTNAME
var is set by Bash, which doesn't strike me as an authority we should use. FWIW, this SO post points out that HOSTNAME
is not an envvar defined by POSIX (https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html).
This earlier spec comment was the only discussion around having the HOSTNAME
and HOST
fallbacks that I could see. I don't know if it originated from what Beats were doing. My understanding is that now beats are using an algorithm that does not use these vars (https://github.com/elastic/beats/blob/c7c18b37eb34f0665f6d9a3b2c7ea4543018ba6a/libbeat/cmd/instance/beat.go#L789-L803 https://github.com/elastic/go-sysinfo/blob/9ea2eba5301c77b511c0475d7137ea3437a55a9a/providers/shared/fqdn.go#L29-L43)
Regarding current usage, it looks to me like only the Java and .NET agents are currently looking at the HOSTNAME and HOST envvars. (FWIW. the .NET agent looks at all of HOSTNAME, HOST, and COMPUTERNAME on all platforms). None of the other agents that I saw are using those envvars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review and feedback @trentm !
I agree with you that those variables might definitely not standard nor available to the runtime as they are usually set by shell and are unlikely to be set on all OSes and distributions. I am assuming (slight speculation on my end) the intent to have them was to provide a "fallback just in case", and that they might not be available most of the time.
However, I don't see any major downside to having them in the spec and using them as fallback, even if it's rarely used in practice. So for now I would suggest we keep them as-is and revisit this decision later if that proves to be an issue or a source of extra complexity.
hostname = exec "uname -n" // or any equivalent * | ||
if (hostname == null || hostname.length == 0) | ||
hostname = exec "hostname" // or any equivalent * | ||
hostname = exec "hostname -f" // or any equivalent * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SylvainJuge In my testing while implementing for .NET, it seems that this command may return with a period suffix (e.g. DESKTOP-NCGCCT0.
) or even the suffix .localdomain
. Do we send as-is, or should we consider trimming in either case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stevejgordon Where did you see an example like DESKTOP-NCGCCT0.
? Is that an OS other than Windows?
or even the suffix .localdomain
I'm not sure if it is equivalent, but on macOS the domain is commonly ".local", e.g. on my laptop:
% hostname -f
pink.local
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'm testing using WSL2 on Windows with Ubuntu. On my Windows laptop, where no domain is set, it reports the period suffix when I run hostname -f
from Ubuntu (using WSL2). I also set up a Windows 11 VM to test adding the DNS suffix and again added WSL Ubuntu to test the code and picked up the FQDN, which it did. Once I removed the DNS suffix, .localdomain
started appearing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For .NET, as the primary mechanism, I'm planning to use the Dns.GetHostEntry(string.Empty).HostName
API, which is cross-platform aware and seems to potentially include the suffix, at least when testing on Ubuntu via WSL2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here that's clearly a case where using the API seems to be more consistent than the equivalent with hostname
.
Does it also returns a consistent value when invoked outside of WSL ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stevejgordon Assuming I followed the code correctly, Beats are currently using https://github.com/elastic/go-sysinfo/blob/9ea2eba5301c77b511c0475d7137ea3437a55a9a/providers/windows/host_windows.go#L87-L94
(https://learn.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-getcomputernameexa)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@trentm The .NET API eventually seems to call into the native Windows Sockets 2, Winsock API gethostname
function and on Linux the gethostname
system call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, if I understand this correctly, you are suggesting to just remove the extra .
that is appended when calling this API through WSL ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SylvainJuge Originally, yes, I thought perhaps we should. However, as it's consistent with hostname -f
and therefore, presumably what other agents on the same host will send, perhaps best to leave it unchanged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stevejgordon do you think we need to add a clarification about this in the spec or could we leave it as-is ?
Summary
Specification change for #793 .
APM agents should capture the FQDN instead of just the host name, APM Server will handle compatibility with current behavior through configuration (as it's a breaking change).
Fixes #794
Checklist
System Properties > Computer Name > Change > More > Primary DNS suffix of this computer
, then reboot. When set theSystem Properties > Computer Name
should show the FQDN name inFull computer name
field.detected_hostname
for APM server < 8.9 for the agents that did not already used the FQDN ?host.name
is not used very much in APM, thus it's not considered to be a breaking change: commentCODEOWNERS
)To auto-merge the PR, add
/
schedule YYYY-MM-DD
to the PR description.Create implementation issues through the meta issue template (this will automate issue creation for individual agents)already created in Report FQDN fordetected_hostname
#793If this spec adds a new dynamic config option, add it to central config.n/a