
Added parent domain field to source, destination and url. #531

Closed
wants to merge 1 commit into from


mbudge
Contributor

mbudge commented Aug 24, 2019

The parent domain field is the domain without any sub-domain. The domain field is an exact-match keyword field, which means it is not possible to search for all connections to a domain when a sub-domain is involved. The parent domain field will allow users to store normalized domains using the public suffix list.

For example, the registered domain for "foo.malware.com" is "malware.com".

This value can be determined precisely with a list like the public suffix list (http://publicsuffix.org). Trying to approximate it by simply taking the last two labels will not work well for suffixes such as "co.uk". If parent domain normalization fails, users should store the original domain, including the sub-domain, in the parent domain field. Punycode domains are more likely to fail with common TLD-extraction libraries that use the public suffix list to derive the parent domain, so a best-effort approach means users can still search a single field to find network connections more accurately.
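As a rough illustration of why a suffix list beats the last-two-labels heuristic, here is a minimal sketch. It assumes a hypothetical four-entry suffix set in place of the real public suffix list and ignores that list's wildcard and exception rules; `registered_domain` and `naive_registered_domain` are illustrative names, not functions from any library. It also shows the best-effort fallback: when no suffix matches, the original domain is returned unchanged.

```python
# Hypothetical, tiny stand-in for the public suffix list. The real list
# (publicsuffix.org) has thousands of entries plus wildcard/exception rules.
PUBLIC_SUFFIXES = {"com", "org", "uk", "co.uk"}


def naive_registered_domain(domain: str) -> str:
    """The approximation warned against above: keep only the last two labels."""
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])


def registered_domain(domain: str, suffixes=PUBLIC_SUFFIXES) -> str:
    """Return the longest matching suffix plus the label to its left.

    Best-effort: if no suffix matches (unknown TLD, unusual punycode, ...),
    the original domain, sub-domain included, is returned unchanged.
    """
    labels = domain.lower().rstrip(".").split(".")
    # Check the longest candidate first, so "co.uk" wins over "uk".
    for i in range(1, len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            return ".".join(labels[i - 1:])
    return domain
```

With this toy list, `registered_domain("www.example.co.uk")` gives `example.co.uk`, whereas the naive version gives `co.uk`.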

This is important in SIEM and log management functions, as users need to be able to find all logs when they search for a known-bad IOC domain. Users could index domains into an extra text field in their schema, but this is slow and expensive when searching many TBs of data in Elasticsearch.
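To make the search problem concrete, the sketch below mimics an exact-match (keyword) term query over a few hypothetical events in plain Python. The events and the `term_match` helper are illustrative assumptions. The field name uses `registered_domain`, the name eventually agreed in this thread, rather than this PR's `domain.parent`. The `query` dict shows what an Elasticsearch `term` query body on the new field could look like.

```python
# Hypothetical sample events (illustrative data only).
events = [
    {"source": {"domain": "foo.malware.com", "registered_domain": "malware.com"}},
    {"source": {"domain": "malware.com", "registered_domain": "malware.com"}},
    {"source": {"domain": "www.example.co.uk", "registered_domain": "example.co.uk"}},
]


def term_match(docs, field, value):
    """Mimic a keyword (exact-match) term query on a dotted field path."""
    def lookup(doc, path):
        for part in path.split("."):
            if not isinstance(doc, dict):
                return None
            doc = doc.get(part)
        return doc
    return [d for d in docs if lookup(d, field) == value]


ioc = "malware.com"
# Elasticsearch Query DSL body for the same search on the new field.
query = {"query": {"term": {"source.registered_domain": ioc}}}

by_domain = term_match(events, "source.domain", ioc)  # misses foo.malware.com
by_registered = term_match(events, "source.registered_domain", ioc)  # finds both
```

An exact match on `source.domain` finds only the event whose domain is literally `malware.com`; the same match on the normalized field also catches the sub-domain.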

@mbudge
Contributor Author

mbudge commented Aug 24, 2019

I made a feature request to add a TLD extract filter to Logstash.

elastic/logstash#11079

@MikePaquette
Contributor

@mbudge Thanks for creating this PR. I think there is wide agreement in the community for adding a field to capture the parent/higher level/registered domain. However, there is not yet agreement on what to call it :-). Please see Issue #84 for further discussion. There is some support (#84 (comment)) for creating a field *.registered_domain to capture what you are looking for.

I have three suggestions:

  • Let's finish the conversation in #84 ("Clarify use of hostname, subdomain, domain in source/destination") about what to name this field.
  • Let's not use the name you suggested, domain.parent, as it would turn domain into an Elasticsearch object, something ECS tries to avoid for fields that are already defined, such as *.domain (i.e. let's not use the "." character after "domain", regardless of what name we agree upon).
  • If/when we do add this field, let's add it in all the places that *.domain is used today in ECS 1.1, which includes:
      1. client.domain
      2. client.user.domain
      3. destination.domain
      4. destination.user.domain
      5. host.user.domain
      6. server.domain
      7. server.user.domain
      8. source.domain
      9. source.user.domain
      10. url.domain
      11. user.domain
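To see why a name like domain.parent clashes with the existing *.domain fields, here is a simplified model of how dotted field names expand into nested objects. It is a sketch, not Elasticsearch's mapping code, and it ignores multi-fields; `build_mapping` is a hypothetical helper. Leaves are stored as their type string and objects as dicts, so a name that must be both raises an error.

```python
def build_mapping(fields):
    """Expand dotted field names into a nested object hierarchy.

    Leaves are stored as their type string, objects as dicts. Raises
    ValueError when one name would have to be both a leaf and an object.
    """
    root = {}
    for name, field_type in fields.items():
        *parents, leaf = name.split(".")
        node = root
        for part in parents:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(
                    f"{name!r} needs {part!r} to be an object, but it is a {child} field"
                )
            node = child
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"{name!r} is already an object with sub-fields")
        node[leaf] = field_type
    return root


# A sibling field (no "." after "domain") coexists with url.domain ...
flat_ok = build_mapping({"url.domain": "keyword", "url.registered_domain": "keyword"})

# ... whereas url.domain.parent would force url.domain to become an object.
conflict = None
try:
    build_mapping({"url.domain": "keyword", "url.domain.parent": "keyword"})
except ValueError as err:
    conflict = str(err)
```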

@MikePaquette
Contributor

@mbudge A related question please: Would you also propose to add related ECS fields for the subdomain portion of the domain? (i.e. the part left over after extracting the parent/higher level/registered part)

Previous sentiment on adding such fields has been mixed, with some votes against it (#84 (comment)), but I would love your thoughts.

@webmat
Contributor

webmat commented Aug 26, 2019

Thanks for submitting this PR! I responded to you over on #84 before seeing this pull request :-)

I would suggest only adding the new field for the parent / registered domain in this PR, for now. I'm not sure extracting "only the subdomain" to another field is as valuable.

In the list of possible places where we need the new field, I would actually exclude those under user.*, as these are for recording an AD / LDAP domain. While this value can be a fully qualified domain name, I'm not sure how useful it is to extract the registered domain out of it. I'm ready to be convinced otherwise, though :-)

Since we already went with the name registered_domain for this concept in the DNS field set, I would propose you modify your PR to add the field in these places (and remove domain.parent), to get started:

  • client.registered_domain
  • destination.registered_domain
  • server.registered_domain
  • source.registered_domain
  • url.registered_domain
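A pipeline populating these five fields might look roughly like the sketch below. It assumes events are plain nested dicts, and `add_registered_domains` is a hypothetical helper, not an ECS or Logstash API. The extractor is passed in, so any public-suffix-list-backed function can be plugged in; the usage here substitutes a hard-coded lookup table with a fall-back to the original domain.

```python
from typing import Callable

# Field sets listed above; user.* is left out because those domains hold
# AD / LDAP names rather than DNS names.
FIELD_SETS = ("client", "destination", "server", "source", "url")


def add_registered_domains(event: dict, extract: Callable[[str], str]) -> dict:
    """Set <field_set>.registered_domain wherever <field_set>.domain is present.

    Mutates `event` in place and returns it. `extract` is any
    domain -> registered-domain function (placeholder, not a real API).
    """
    for field_set in FIELD_SETS:
        section = event.get(field_set)
        if isinstance(section, dict) and section.get("domain"):
            section["registered_domain"] = extract(section["domain"])
    return event


# Stand-in extractor: a hard-coded lookup that falls back to the input domain.
KNOWN = {"foo.malware.com": "malware.com", "www.example.co.uk": "example.co.uk"}
event = add_registered_domains(
    {
        "source": {"domain": "foo.malware.com"},
        "url": {"domain": "www.example.co.uk"},
        "user": {"domain": "CORP"},
    },
    lambda d: KNOWN.get(d, d),
)
```

The `user` section is left untouched, matching the decision to keep this field out of user.*.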

@mbudge
Contributor Author

mbudge commented Aug 28, 2019

It's true: "parent/child" implies traversing the DNS hierarchy.

Submitted a new pull request to add registered_domain.

#533

@MikePaquette
Contributor

Great, thanks for creating the new PR!
Agreed that we can leave this field out of the user.* fields.
We can close this PR out.

@webmat
Contributor

webmat commented Aug 30, 2019

Indeed, closing in favor of #533. Thanks @mbudge for making the change!

webmat closed this Aug 30, 2019