Added question.subdomain field #561

Closed
wants to merge 1 commit into from

mbudge
Contributor

@mbudge mbudge commented Sep 15, 2019

Added the question.subdomain field for security use cases such as looking for DNS exfiltration.

Currently, domains are only indexed as domain, top_level_domain, and registered_domain.

The subdomain field will allow users to find parent domains with an abnormally high number of subdomains.

@webmat
Contributor

webmat commented Sep 23, 2019

I wonder if we specifically need to store the text representation of the subdomain.

So given a DNS query to "exfilpayload.shady.example.co.uk", considering all other PRs in flight, the DNS event should already contain:

{ "dns": { "question":
  { "name": "exfilpayload.shady.example.co.uk",
    "registered_domain": "example.co.uk",
    "top_level_domain": "co.uk",
    "subdomain": "exfilpayload.shady" # proposed addition
  }
}}
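A minimal sketch of how the proposed field could be derived, assuming registered_domain has already been computed for the same name (e.g. via a Public Suffix List lookup), so subdomain is simply whatever is left of it. The helper name `derive_subdomain` is hypothetical.

```python
from typing import Optional


def derive_subdomain(name: str, registered_domain: str) -> Optional[str]:
    """Return everything left of the registered domain, or None if there is nothing.

    Hypothetical helper: assumes registered_domain was already computed
    (e.g. via a Public Suffix List lookup) for the same query name.
    """
    # DNS names are case-insensitive and may carry a trailing root dot.
    name = name.lower().rstrip(".")
    registered_domain = registered_domain.lower().rstrip(".")
    suffix = "." + registered_domain
    if not name.endswith(suffix):
        # Either name == registered_domain (no subdomain) or the inputs disagree.
        return None
    # Keep all remaining levels as one string, without the joining period.
    return name[: -len(suffix)]


print(derive_subdomain("exfilpayload.shady.example.co.uk", "example.co.uk"))
# exfilpayload.shady
```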

On one hand, I think the registered_domain and top_level_domain's appeal is really high, as it's natural to aggregate events per both of these fields.

On the other hand, what we want to get out of the subdomain is the likelihood of it being used for exfiltration. So length or entropy is what we're looking for there.

Before adding this field, the situation is:

  • Searching for the use of a precise value/subdomain is efficient with a prefix search on the keyword field, e.g. dns.question.name:exfilpayload*.
  • Visualizing the subdomain of each event in a table view can be accomplished at view time (scripted field or custom app code).
  • I don't think there's a need to aggregate on the subdomains? One clue that a subdomain is being used for exfil is its entropy, given the arbitrary values potentially being sent. As a consequence, this sounds like precisely the kind of column we would not want to aggregate on. Unless I'm missing something?
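For reference, the prefix search from the first bullet can be written as an Elasticsearch `prefix` query, roughly equivalent to the `dns.question.name:exfilpayload*` search. This sketch only builds the request body as a Python dict; the helper name is hypothetical and sending the request (client, index) is left out.

```python
import json


def prefix_query(field: str, prefix: str) -> dict:
    """Build a prefix query body (hypothetical helper)."""
    # Prefix queries are efficient on keyword fields such as dns.question.name.
    return {"query": {"prefix": {field: {"value": prefix}}}}


print(json.dumps(prefix_query("dns.question.name", "exfilpayload")))
```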

I wonder if the subdomain length would be enough for the need? Or perhaps another numeric value around entropy?
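The length/entropy idea fits in a few lines. This sketch uses Shannon entropy over characters, which is one common choice and not necessarily the measure any existing detection uses; the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the character distribution in s, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    # p * log2(1/p) for each distinct character, summed.
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())


def subdomain_features(subdomain: str) -> dict:
    """Numeric features that could be stored instead of (or next to) the raw subdomain."""
    return {"length": len(subdomain), "entropy": shannon_entropy(subdomain)}


print(subdomain_features("www"))
# {'length': 3, 'entropy': 0.0}
```

Random-looking payloads push entropy toward its maximum for the alphabet used, while repetitive labels such as "www" score zero.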

WDYT @randomuserid @MikePaquette @dainperkins?

@dainperkins
Contributor

Honestly, I'd like to see subdomain and hostname, as they are logically portions of the FQDN, and unlike trying to separate out the whole domain structure, parsing out [hostname, subdomain, registered domain] is fairly straightforward. There's also no need to populate all of them...

I can see a unique subdomain report being useful. Even with exfil ML in the mix, there could be other applications (large companies that have lost control of their internal DNS... yes, I have seen it...)

@neu5ron

neu5ron commented Sep 27, 2019

I would love to see a subdomain field of any sort. This has proven to be of great value in various environments over many years.

On top of all the ML features everyone is mentioning, the whitelisting features are great (granted, this isn't a silver bullet). Say, for example, I am looking for some sort of phishing/typosquatting of Google: when google is the second level, it is almost never NOT registered to them, so I can filter out domain_level_2_name:google (similar to what I show in a blog post here).
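To make the domain_level_2_name idea concrete, here is a sketch that numbers labels from the right, so level 1 is the rightmost label. The `domain_level_N_name` keys mirror the comment above and are not ECS fields. Note that with this naive numbering, level 2 of a name under a multi-label suffix such as co.uk is "co", not the registrable name.

```python
def domain_levels(name: str) -> dict:
    """Map hypothetical domain_level_N_name keys to labels, counting from the right."""
    labels = name.lower().rstrip(".").split(".")
    return {
        f"domain_level_{i}_name": label
        for i, label in enumerate(reversed(labels), start=1)
    }


levels = domain_levels("mail.google.com")
# Whitelist-style filter: drop events whose second-level label is "google".
keep = levels.get("domain_level_2_name") != "google"
print(levels, keep)
```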

@webmat
Contributor

webmat commented Sep 27, 2019

@neu5ron But the field we're discussing here wouldn't contain "google", it would contain "www"

@neu5ron

neu5ron commented Sep 27, 2019

Ah yes, you are correct. I just meant to say that subdomains of any sort are good, and this one (related to this PR) is included :)

@webmat
Contributor

webmat commented Sep 27, 2019

Thanks for the input, Nate and Dain!

I'm still not convinced this field needs to be stored, nor defined in ECS.

I could be convinced of adding it, if someone were to give a use case where an aggregation on unique subdomains (e.g. "www", "account", "superlong-exfilpayload") would be useful. Perhaps a rare terms aggregation? Would rare terms work well on subdomains?

So far I'm still under the impression that all other things we need about "subdomain" fields can be resolved with either a scripted field or a prefix search on the "domain" field.
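If dns.question.subdomain were stored as a keyword, the rare terms question could be tried with a request body like this. A sketch only: the aggregation name is arbitrary, the field assumes this PR's proposal, and the `rare_terms` aggregation requires a recent Elasticsearch 7.x release.

```python
import json


def rare_subdomains_body(field: str = "dns.question.subdomain", max_doc_count: int = 1) -> dict:
    """Request body for an Elasticsearch rare_terms aggregation (search hits suppressed)."""
    return {
        "size": 0,  # only the aggregation results are of interest
        "aggs": {
            "rare_subdomains": {  # arbitrary aggregation name
                "rare_terms": {"field": field, "max_doc_count": max_doc_count}
            }
        },
    }


print(json.dumps(rare_subdomains_body(), indent=2))
```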

@randomuserid

Thinking about this. At the moment we detect things like dnscat with this ML job:

high_info_content("dns.question.name") over "dns.question.etld_plus_one"
high_info_content("dns.question.name") over tld

The other DNS job looks for rare domains by looking for rare DNS questions:

rare by "dns.question.name"
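Expressed as anomaly detection job configurations, the first detector and the rare detector would look roughly like this. A sketch: the bucket_span and time field values are assumptions, and the field names are copied from the comment rather than from a published job definition.

```python
import json

# Roughly: high_info_content("dns.question.name") over "dns.question.etld_plus_one"
high_info_job = {
    "analysis_config": {
        "bucket_span": "15m",  # assumption, not taken from the actual job
        "detectors": [
            {
                "function": "high_info_content",
                "field_name": "dns.question.name",
                "over_field_name": "dns.question.etld_plus_one",
            }
        ],
    },
    "data_description": {"time_field": "@timestamp"},  # assumption
}

# Roughly: rare by "dns.question.name"
rare_job = {
    "analysis_config": {
        "bucket_span": "15m",  # assumption
        "detectors": [{"function": "rare", "by_field_name": "dns.question.name"}],
    },
    "data_description": {"time_field": "@timestamp"},  # assumption
}

print(json.dumps(high_info_job, indent=2))
```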

It sounds like the use case for subdomains might be things like this work by Nathan: https://blog.perched.io/dns-tunneling-other-hunts-w-rocknsm-bro-elk-52a4486e44d0

If so, I suppose there might be non-ML-enabled people out there who want to do things like this, but this was done in the past when no subdomain field existed. @neu5ron, can you elaborate on what kinds of things you would do with an ECS subdomain field that cannot be done today?

@neu5ron

neu5ron commented Sep 27, 2019

Yeah, I believe a multitude of aggregations would be helpful. However, I can't speak specifically to using rare terms; I would mainly just use (stack) count aggregations.
I am afraid scripted fields will be a huge burden on a system: large organizations produce 500 GB to 1 TB of DNS logs a day. We would then effectively be turning the stack into one of those other brute-force databases. Also, prefix searching is tough when the levels of interest sit at various positions within the subdomain, e.g.:
randomlevelexample.randomlevelexample.google.some.bad.domain.local
Filtering on *google would exclude all sorts of results.
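The wildcard problem can be illustrated with the example name. This sketch assumes the registered domain is domain.local (an internal .local name has no public suffix, so that is an assumption) and uses a hypothetical helper that splits the subdomain into whole labels.

```python
def subdomain_labels(name: str, registered_domain: str) -> list:
    """Labels left of the registered domain (hypothetical helper; registered_domain is given)."""
    name = name.lower().rstrip(".")
    suffix = "." + registered_domain.lower().rstrip(".")
    if not name.endswith(suffix):
        return []
    return name[: -len(suffix)].split(".")


name = "randomlevelexample.randomlevelexample.google.some.bad.domain.local"
# A suffix wildcard such as *google does not match: "google" is not the last label.
print(name.endswith("google"))  # False
# Matching on whole labels finds "google" wherever it sits in the subdomain.
print("google" in subdomain_labels(name, "domain.local"))  # True
```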

I hope I can convey how much my team and I would use the subdomains for filtering and aggregations.
Granted, is it as useful as the TLD or TLD + 2nd level? No. However, apart from the usefulness of the subdomains by themselves as discussed above, IMHO another useful feature is for organizations that have pretty long domains, such as financial.corp.local, where a hostname would live in the 4th-level subdomain. Then there are organizations with much longer domains...

I hate to keep tooting my own horn, but I haven't really seen anybody else discussing this in practice. I did notice a screenshot using a similar structure in one of FireEye's blogs (though I'm unable to find that blog at the moment). I would also imagine OpenDNS uses a structure like the one we are discussing, which brings up another use case: passive (domain) databases.
So, firing away with more of my blog posts showing various uses of subdomains:

DNS tunneling/exfil, shown here:
length, whitelisting/filtering, and stack counting of any of the 3rd, 4th, 5th (and so on) level subdomains:
https://blog.perched.io/dns-tunneling-other-hunts-w-rocknsm-bro-elk-52a4486e44d0
Granted, for this we now have a nice ML job. However, whitelisting will still come in handy in LARGE environments, as well as for the (rare) registrants that allow subdomains beginning at the 4th level (such as

Typosquatting:
https://blog.neu5ron.com/2018/04/typosquatting-detection-with-elk-bro-nsm.html
Again, mostly filtering comes into play here. However, having the ability to do this on various levels/subdomains comes in handy.

Finally, I should say: I think defining these fields for those who want to use subdomains, without having to implement them everywhere (such as in ingest processors or Beats), could be a fair compromise.

@webmat
Contributor

webmat commented Sep 27, 2019

Yeah the only use I would recommend a scripted field for is to cleanly display the subdomain in a table view. Not to do aggregations on :-)

Note, however, that my understanding here is that we're not splitting out each level or label of the subdomain. We're simply cutting before the registered domain, and all of the subdomain levels below that get shoved into subdomain.

More visually, sub3.sub2.sub1.example.co.uk gets broken down like this:

dns.question.name:              sub3.sub2.sub1.example.co.uk
dns.question.registered_domain:                example.co.uk
dns.question.top_level_domain:                         co.uk
dns.question.subdomain:         sub3.sub2.sub1

Is "subdomain" like this useful for what you have in mind?

@MikePaquette
Contributor

Note however that my understanding here is that we're not splitting out each level or labels of the subdomain. We're simply cutting before the registered domain, and all of the subdomain levels below that get shoved into subdomain.

@webmat Yes, my understanding matches exactly that.

For history, one of our longest-running ECS issues ever, #84, addressed this topic.

I am +1 to adding *.subdomain in dns.question.*

Question: Is there any value in adding *.subdomain anywhere else? I am thinking no.

@jamesspi

@webmat a few thoughts.

Personally, I would love to see the subdomain in there for a number of reasons.

  • Distinct count of subdomains per registered domain (why does user/process x perform lookups over 50 subdomains for a given registered domain, vs 2000 for user y)
  • high info content in the ML context (why should we expect users to have to use a scripted field for this?)
  • certificate dns names (show me all the subdomain certificates generated for this registered domain)
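The first bullet boils down to a distinct count of subdomains per registered domain. A minimal in-memory sketch of that logic over hypothetical (registered_domain, subdomain) pairs; in Elasticsearch it would more likely be a terms aggregation on registered_domain with a cardinality sub-aggregation on the subdomain field.

```python
from collections import defaultdict


def distinct_subdomains(events):
    """Count distinct subdomains per registered domain from (registered_domain, subdomain) pairs."""
    seen = defaultdict(set)
    for registered_domain, subdomain in events:
        if subdomain:  # events with no subdomain don't contribute
            seen[registered_domain].add(subdomain)
    return {domain: len(subs) for domain, subs in seen.items()}


events = [
    ("example.com", "www"),
    ("example.com", "www"),
    ("example.com", "mail"),
    ("example.co.uk", "a1b2c3.payload"),
    ("example.co.uk", "d4e5f6.payload"),
]
print(distinct_subdomains(events))
```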

Those are just a few off the top of my head.

@dainperkins
Contributor

I feel like we should be categorizing as correctly as possible, and potentially providing field concatenation & ingest for highly useful features (in this case ** dns.question.intra_domain_id)

Should we be breaking TLD out into tld & country code, or at least providing the field for those who want to use it?

dns.question.name:              host.sub3.sub2.sub1.example.co.uk
dns.question.registered_domain: example.co.uk
dns.question.top_level_domain:  co.uk
dns.question.subdomain:         sub3.sub2.sub1
dns.question.hostname:          host
dns.question.intra_domain_id:   host.sub3.sub2.sub1 **

Entropy for hostname, subdomain, and intra_domain_id could all be useful, though the highest value would likely be intra_domain_id (the field name suggestion not being particularly important).
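A sketch of the split proposed in this comment, assuming the leftmost label is always the host and that registered_domain is already known. hostname and intra_domain_id are not ECS fields, and here subdomain excludes the host, which differs from the "everything before the registered domain" definition discussed earlier.

```python
def split_name(name: str, registered_domain: str) -> dict:
    """Split a query name per this comment's proposal (hypothetical helper and fields)."""
    name = name.lower().rstrip(".")
    suffix = "." + registered_domain.lower().rstrip(".")
    if not name.endswith(suffix):
        return {}
    intra_domain_id = name[: -len(suffix)]  # everything left of the registered domain
    hostname, _, subdomain = intra_domain_id.partition(".")
    return {
        "hostname": hostname,
        "subdomain": subdomain or None,  # None when there is only a single label
        "intra_domain_id": intra_domain_id,
    }


print(split_name("host.sub3.sub2.sub1.example.co.uk", "example.co.uk"))
```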

Contributor

@webmat webmat left a comment


Ok let's add this field.

There's only one clarification I'd like to make on the field definition before merging. This should make usage of this field unambiguous.

For now let's add it only to DNS. If there's a need for subdomain elsewhere, we can add it as a separate PR.

Thanks everyone for chiming in!

type: keyword
short: The subdomain of the domain.
description: >
  A subdomain is a hostname under its parent domain.
Contributor

@webmat webmat Sep 27, 2019


First sentence is great as is, let's keep it.

However I would like the description to clarify two details (array v string, and trailing period). I know it will be interpreted differently by different people, if we don't specify. Could you add something like this as a second paragraph, please?

If the subdomain has multiple levels, such as "sub2.sub1.example.com", the subdomain field should contain "sub2.sub1", with no trailing period.
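The two details in the requested wording (a single string rather than an array, and no trailing period) are easy to check mechanically. A hypothetical validator sketch, assuming values are compared as-is without case folding.

```python
def check_subdomain(name: str, registered_domain: str, subdomain) -> list:
    """Return a list of problems with a subdomain value (hypothetical checker)."""
    problems = []
    if not isinstance(subdomain, str):
        problems.append("subdomain should be a single string, not an array")
        return problems
    if subdomain.endswith("."):
        problems.append("subdomain should not have a trailing period")
    elif f"{subdomain}.{registered_domain}" != name.rstrip("."):
        problems.append("subdomain + '.' + registered_domain should equal the query name")
    return problems


print(check_subdomain("sub2.sub1.example.com", "example.com", "sub2.sub1"))  # []
```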

Contributor


I am +1 to adding *.subdomain in dns.question.*

Question: Is there any value in adding *.subdomain anywhere else? I am thinking no.

Sure: using it to cut up domains in URLs would be useful for running the same sorts of analytics we run on DNS info. IIRC, TLS connections typically start with a reference to the original FQDN of the session in the clear.
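For URLs, the same "cut before the registered domain" rule would apply to the host portion. A sketch using Python's standard urllib.parse; registered_domain is assumed to come from a separate Public Suffix List lookup, and the helper is hypothetical.

```python
from urllib.parse import urlsplit


def url_subdomain(url: str, registered_domain: str):
    """Apply the subdomain rule to a URL's host (hypothetical helper)."""
    host = urlsplit(url).hostname  # lowercased, without port or credentials
    if host is None:
        return None
    suffix = "." + registered_domain.lower()
    return host[: -len(suffix)] if host.endswith(suffix) else None


print(url_subdomain("https://login.accounts.example.co.uk:8443/path?q=1", "example.co.uk"))
# login.accounts
```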

webmat pushed a commit to webmat/ecs that referenced this pull request Oct 1, 2019
@webmat
Contributor

webmat commented Oct 3, 2019

Won't be able to merge this one directly, as there are conflicts in generated files. Will resolve via #574.

@webmat webmat closed this in #574 Oct 3, 2019
webmat pushed a commit that referenced this pull request Oct 3, 2019

7 participants