Clarify use of hostname, subdomain, domain in source/destination #84
Comments
Thanks @andrewkroh Sorry to take so long to get back to this. This seems easy until you try to spell it out :-). I’d propose the following: ECS
CASE 1
CASE 2
CASE 3
CASE 4
|
@MikePaquette So the idea is to break down each of these hostnames, wherever they are defined, correct? I'm asking this because your examples (e.g. Case 2) don't do that explicitly. Here's how it should be populated, as I understand it: Case 2 If we have details about a device If we have src/dst details about a connection (I changed this part slightly vs your case 2, to illustrate a host talking to an API, for example): |
@webmat yes, that is correct, the same breakdown would apply to each namespace/object/prefix where And yes, your example of a host talking to an API is consistent with this definition. I'll update the entire set of cases with the missing fields for completeness. |
Here's an updated set of reference Cases and clarifications, based on @webmat's feedback. Added his example as CASE 5: ECS
CASE 1
CASE 2
CASE 3
CASE 4
CASE 5
|
Engineer perspective question: Assuming someone has an FQDN, is it possible to split it up in a fully automated way? |
It's possible, but will be a mess. Consider third and fourth level domains (see the .ca mess). I assume there's some sort of database that lists out all TLDs. I wonder how it deals with things like the .ca situation. It's been impossible to register a .qc.ca since 2010, but these domains still resolve. There's also the government one -- .gc.ca -- that's actually a domain, not a TLD, but is used by all branches of govt (so behaves like a third level domain). Now I'm curious how Packetbeat computes |
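As a rough illustration of the automated split discussed above, here is a minimal Python sketch. It assumes a lookup table of public suffixes is available; `PUBLIC_SUFFIXES` below is a tiny hypothetical stand-in, not the real public suffix list, and `split_fqdn` is an illustrative name rather than anything from ECS or Packetbeat.

```python
# Hypothetical, deliberately tiny stand-in for the public suffix list.
# A real implementation would load the full, regularly updated list.
PUBLIC_SUFFIXES = {"com", "net", "uk", "co.uk", "ca"}


def split_fqdn(fqdn):
    """Split an FQDN into (subdomain, registered_domain, suffix).

    The suffix is the longest entry of PUBLIC_SUFFIXES matching the
    trailing labels; the registered domain is that suffix plus one label.
    """
    labels = fqdn.rstrip(".").lower().split(".")
    # Walk from the longest candidate to the shortest so that "co.uk"
    # is preferred over "uk".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in PUBLIC_SUFFIXES:
            if i == 0:
                # The name is itself a public suffix: nothing registered.
                return ("", "", candidate)
            registered = ".".join(labels[i - 1:])
            subdomain = ".".join(labels[:i - 1])
            return (subdomain, registered, candidate)
    # Unknown suffix (e.g. an unqualified internal hostname): no split.
    return ("", "", "")
```

With this toy table, `split_fqdn("www.example.co.uk")` yields `www`, `example.co.uk` and `co.uk`, while a name under `gc.ca` is split with `gc.ca` as the registered domain, since only `ca` is listed as a suffix here. The quality of the split depends entirely on the suffix table, which is the "mess" mentioned above.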
Thank you for clarifying the field definitions. Implementing the logic to generate The Lastly what should an implementor do when they do not have a device's FQDN? Often I see syslog messages that contain a hostname that is not fully qualified (so it doesn't meet the requirements of |
Thanks @andrewkroh Regarding the last question (if no FQDN available), would it be better to create a separate field, or to just populate the *.hostname field with the best info available? |
@andrewkroh Why do you say the The hostname vs fqdn is a thing that bugs me. I've often run infrastructures where the app servers didn't actually have an FQDN on purpose. The instances didn't have a public IP and they didn't have a DNS entry. There was just a variable number of app servers in an autoscaling group, only available via the load balancer (or ssh via a VPN hop). So expecting hostname to be an FQDN doesn't quite fit reality, IMO. Perhaps we need one more field in this bunch?
So the "central" field would become the |
Specifically in the case of network monitoring, having the full domain of a remote resource in
Do people (esp. my ECS colleagues, @MikePaquette & @ruflin) feel strongly that |
|
Except for the specialized use by the ML job to compute a score exclusively on the subdomain value, I cannot think of any uses for it that are not covered by using a combination of FQDN and |
Yes, my point is that source.hostname: webscale42
destination.fqdn: api.example.com
destination.domain: example.com
destination.subdomain: api If instead my webserver's source.hostname: webscale42
source.fqdn: webscale42.scalableexample.com
source.domain: scalableexample.com
source.subdomain: webscale42
destination.fqdn: api.example.com
destination.domain: example.com
destination.subdomain: api This is what I understand from the initial discussion, at least. Note that I don't actually see the usefulness of mixing I'll reformulate a bit what I think would be the most straightforward way to approach this, I'd like to understand people's POV if this is missing anything. The below definitions are exactly the same for
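The two mappings in this comment can be sketched in Python as follows. The field names (`hostname`, `fqdn`, `domain`, `subdomain`) are the ones proposed in this comment, not a settled schema, and `side_fields` is a hypothetical helper. The sketch naively assumes the registered domain is the last two labels of the FQDN, which does not hold for suffixes such as `co.uk`.

```python
def side_fields(prefix, hostname=None, fqdn=None):
    """Build the flat field dict for one side ("source" or "destination").

    Naive assumption: the registered domain is the last two labels of the
    FQDN. A real implementation would consult the public suffix list.
    """
    fields = {}
    if hostname:
        fields[f"{prefix}.hostname"] = hostname
    if fqdn:
        labels = fqdn.split(".")
        fields[f"{prefix}.fqdn"] = fqdn
        fields[f"{prefix}.domain"] = ".".join(labels[-2:])
        if len(labels) > 2:
            fields[f"{prefix}.subdomain"] = ".".join(labels[:-2])
    return fields


# First case from the comment: the source has no FQDN, only a hostname.
event = {}
event.update(side_fields("source", hostname="webscale42"))
event.update(side_fields("destination", fqdn="api.example.com"))
```

Passing `fqdn="webscale42.scalableexample.com"` for the source as well reproduces the second case, where both sides carry the full breakdown.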
@andrewkroh Ok perhaps we can take it out. It's true it can be computed any time we need it. One of the security use cases is to look at the length of a subdomain, but perhaps there's no need to have it saved on every single event. I guess the question is: are there times where we need to aggregate specifically on subdomain (regardless of the actual domain, so all |
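To illustrate the point that the subdomain can be computed whenever it is needed, here is a small Python sketch that derives it (and its length) from the full name and the registered domain. The function and parameter names are hypothetical, and it assumes both values are already stored on the event.

```python
def subdomain_of(fqdn, domain):
    """Return the part of fqdn to the left of domain, or "" if equal."""
    fqdn = fqdn.rstrip(".").lower()
    domain = domain.rstrip(".").lower()
    if fqdn == domain:
        return ""
    suffix = "." + domain
    if not fqdn.endswith(suffix):
        raise ValueError(f"{fqdn!r} is not under {domain!r}")
    return fqdn[: -len(suffix)]


def subdomain_length(fqdn, domain):
    """Length of the derived subdomain, e.g. for the security use case."""
    return len(subdomain_of(fqdn, domain))
```

This covers per-event computations such as the subdomain length; it does not answer the separate question of aggregating on the subdomain across many events.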
I suggest we remove |
Yeah ok, I don't mind removing subdomain for now. I agree this is getting needlessly complex. This leaves us to determine the names for the fields we actually use. Here's how I would do it:
Please let me know what you think @MikePaquette and @ruflin so we can close the loop on this :-) |
I'm good with |
|
To keep this moving:
? |
To recap what was discussed elsewhere, I'll insist again on a distinction I would like us to make. The link between a
So I think nowadays people expect The use of
The question around the breaking down of a full domain essentially revolves around which one of the values we consider the "default" or most interesting piece of information. That one should be named Full domain first:
Registerable domain first:
We can decide to just not use
I do think we need to define two fields for domains, not just What I gather so far on who prefers what:
I'm unsure where @MikePaquette stands on this, as initially his comments were about |
So given all this, if you guys feel strongly about having |
Note that |
Opened PR #141 to close the loop on this. I went with your preference of I've actually come around to liking how much more compact |
@webmat after this discussion, I'll change my original proposal and vote for
Here are some anticipated common mappings:
I'm not sure why we need the additional hostname field for URL. Is there a good example where this would not map to ecs |
If someone has For the url: let's not mix this in. The url is split up based on common patterns from different programming languages. If at some stage we also need domain, we can add it, but not now. |
I will break #141 down into smaller PRs, as we just discussed. The We've also discussed that saving both values for the domain (the FQDN and the registered domain) as an array in one field is not the way to go, because all domains with a subdomain would then be counted twice in aggregations. Once as In working on PR #141, I realized that So with all of this said, it looks like we didn't actually have agreement on how to name the domain breakdown fields. Here are the options once more: Full domain name including subdomain:
Highest Registerable Domain:
So we need to find a suitable pair of field names that makes sense to host the full domain name and an optional field, without a subdomain. It would be helpful if we could have a simple vote in the comments here on everyone's favourite pair of field names. Note about |
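The aggregation concern raised in this comment (storing the full name and the registered domain as an array in one field) can be illustrated with a small Python sketch. The events and the single `domain` field are hypothetical, and `Counter` only approximates a terms-style aggregation that counts each event once per distinct value.

```python
from collections import Counter

# Hypothetical events; each stores the full name and the registered domain
# together as an array in a single "domain" field.
events = [
    {"domain": ["api.example.com", "example.com"]},
    {"domain": ["www.example.com", "example.com"]},
    {"domain": ["example.com"]},  # no subdomain: both values coincide
]

# Count each event once per distinct value it carries.
counts = Counter(value for event in events for value in set(event["domain"]))

# Events with a subdomain land in two buckets, so the bucket totals
# exceed the number of events.
total_bucket_hits = sum(counts.values())
```

Three events produce five bucket hits here, which is why two separate fields were preferred over one array field.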
Here's my vote (I keep going back & forth):
|
@ruflin @MikePaquette @andrewkroh @robgil I'd like another round of opinions on the two field names around |
I think in the end we should use what is most intuitive for the users. Very few people will read this thread to figure out what to put into these fields. My current take:
For subdomain I don't think it should be part of ECS for now. |
So for a DNS request to get the IP of a domain, you'd put this in |
I'm more thinking DNS related stuff should go into its own prefix: #10 |
@webmat With ECS 1.1, we added the |
I propose having a parent_domain field to store the parent domain. foo.example.com is a sub/child-domain and its parent domain is example.com. "The original “base” zone is referred to as the parent zone, e.g. domain.com; the separated subdomain is referred to as child zone or cut node, e.g. sub.domain.com. For more technical detail, please see RFC 1035." https://help.dyn.com/child-and-parent-zones-in-dynect/ "In general, subdomains are domains subordinate to their parent domain" Child: "The entity on record that has the delegation of the domain" Parent: "The domain in which the Child is registered." (Quoted from https://tools.ietf.org/html/rfc8499) Having a domain and hostname field could get confusing for users who don't know DNS. Having a sub-domain field seems overkill as it's not really useful. Users can search for all logs to a parent domain where the domain and parent domain don't match to get the sub-domains. |
@mbudge Thanks for the input! By the way this is a very old issue. We didn't end up going with
With |
It's not clear to me how to populate the hostname, subdomain, and domain fields of source/destination. More detailed descriptions of each field are needed with examples.
It would probably be helpful to establish some terminology that could be used in clarifying the descriptions.
Terms:
(.com, .net, .bmw, .us) [list of TLDs]
(.com, .co.uk and pvt.k12.wy.us) [these get determined with the help of the public suffix list]
(example.com, example.co.uk)
(co is the SLD of www.example.co.uk)
Examples showing the mappings of these FQDNs to ECS would probably be sufficient to clarify the topic for me.
example.com
www.example.com
www.example.co.uk
Logstash has a TLD filter that uses similar field names, possibly(?) with different meanings.