Added question.subdomain field #561

mbudge · 2019-09-15T08:51:02Z

Added the question.subdomain field for security use cases such as looking for dns-exfil.

Currently domains are only indexed as domain, top_level_domain and registered_domain.

The subdomain field will allow users to find parent domains with an abnormally high number of sub-domains.

webmat · 2019-09-23T17:58:03Z

I wonder if we specifically need to store the text representation of the subdomain.

So given a DNS query to "exfilpayload.shady.example.co.uk", considering all other PRs in flight, the DNS event should already contain:

{ "dns": { "question":
  { "name": "exfilpayload.shady.example.co.uk",
    "registered_domain": "example.co.uk",
    "top_level_domain": "co.uk",
    "subdomain": "exfilpayload.shady" # proposed addition
  }
}}

On one hand, I think the registered_domain and top_level_domain's appeal is really high, as it's natural to aggregate events per both of these fields.

On the other hand, what we want to get out of the subdomain is the likelihood of it being used for exfiltration. So length or entropy is what we're looking for there.

Before adding this field, the situation is:

Searching for the use of a precise value/subdomain is efficient with a prefix search on the keyword field, e.g. dns.question.name:exfilpayload*.
Visualizing the subdomain of each event in a table view can be accomplished at view time (scripted field or custom app code).
I don't think there's a need to aggregate on the subdomains? A clue of it being used for exfil is its entropy given the arbitrary values potentially being sent. As a consequence, this would specifically sound like a column we would not want to aggregate on. Unless I'm missing something?

I wonder if the subdomain length would be enough for the need? Or perhaps another numeric value around entropy?
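As a rough illustration of the two numeric signals mentioned above (length and entropy), here is a minimal Python sketch. The function names `shannon_entropy` and `subdomain_stats` are hypothetical and not part of ECS or any Beat; character-level Shannon entropy is only one possible measure, chosen here as an assumption.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the character-level Shannon entropy of `text`, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def subdomain_stats(subdomain: str) -> dict:
    """Hypothetical helper: numeric values that could be stored instead of the text."""
    return {"length": len(subdomain), "entropy": shannon_entropy(subdomain)}


if __name__ == "__main__":
    for sub in ("www", "exfilpayload.shady"):
        print(sub, subdomain_stats(sub))
```

A repetitive value such as "www" scores zero entropy, while longer and more varied values score higher; any threshold would have to be tuned per environment.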

WDYT @randomuserid @MikePaquette @dainperkins?

dainperkins · 2019-09-23T21:31:28Z

Honestly, I'd like to see subdomain & hostname, as they are logically portions of the FQDN, and unlike trying to separate out the whole domain structure, parsing out [hostname, subdomain, registered domain] is fairly straightforward. There's also no need to populate all of them...

I can see a unique subdomain report being useful, even if Exfil ML is in the mix, there could potentially be other applications (large companies who have lost control of internal dns... yes I have seen it...)

neu5ron · 2019-09-27T03:52:07Z

I would love to see a subdomain of any sort. This is something that has proven of great value in various environments over many years.

On top of all the ML features everyone is mentioning -
the whitelisting features are great (granted, this isn't a silver bullet). Say, for example, I am looking for some sort of phishing/typosquatting of google: it's almost never the case that google as the second level is NOT registered to them, so I can filter out domain_level_2_name:google -- blog similar to what I show here

webmat · 2019-09-27T14:47:47Z

@neu5ron But the field we're discussing here wouldn't contain "google", it would contain "www"

neu5ron · 2019-09-27T16:36:43Z

ah yes, you are correct. I meant to just say subdomains of any sort are good - this one (related to this PR) is included :)

webmat · 2019-09-27T17:31:04Z

Thanks for the input, Nate and Dain!

I'm still not convinced this field needs to be stored, nor defined in ECS.

I could be convinced of adding it, if someone were to give a use case where an aggregation on unique subdomains (e.g. "www", "account", "superlong-exfilpayload") would be useful. Perhaps a rare terms aggregation? Would rare terms work well on subdomains?

So far I'm still under the impression that all other things we need about "subdomain" fields can be resolved with either a scripted field or a prefix search on the "domain" field.

randomuserid · 2019-09-27T17:42:32Z

Thinking about this. ATM we detect things like dnscat with this ML job:

high_info_content("dns.question.name") over "dns.question.etld_plus_one"
high_info_content("dns.question.name") over tld

The other DNS job looks for rare domains by looking for rare DNS questions:

rare by "dns.question.name"

It sounds like the use case for subdomains might be things like this work by Nathan: https://blog.perched.io/dns-tunneling-other-hunts-w-rocknsm-bro-elk-52a4486e44d0

If so, I suppose there might be non-ML enabled people out there who want to do things like this - but this was done in the past when no subdomain field existed. @neu5ron can you elaborate on what kinds of things you would do with an ECS subdomain field that cannot be done today?

neu5ron · 2019-09-27T18:01:33Z

Yeah, I believe a multitude of aggregations would be helpful. However, I cannot specifically speak to using (rare) - I would mainly just use (stack) count aggregations.
I am afraid scripted fields will be a huge burden on a system: 500 GB to 1 TB a day of DNS logs at large organizations. Then we would have to turn the stack into the other brute force databases. Also, prefix searching is tough when certain levels of the subdomain are in various places...
randomlevelexample.randomlevelexample.google.some.bad.domain.local
filtering *google would exclude all sorts of results..
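
To illustrate the point about a label sitting at varying depths, here is a small Python sketch that matches an exact label anywhere in the name instead of relying on a prefix. The helper name `has_label` is hypothetical; it only shows the kind of per-label matching that a stored subdomain field would make cheaper.

```python
def has_label(name: str, label: str) -> bool:
    """Return True if `label` appears as a whole label anywhere in `name`."""
    labels = name.rstrip(".").lower().split(".")
    return label.lower() in labels


if __name__ == "__main__":
    name = "randomlevelexample.randomlevelexample.google.some.bad.domain.local"
    # A prefix test misses the label because it is not the leftmost one.
    print(name.startswith("google"))
    # A per-label test finds it regardless of depth.
    print(has_label(name, "google"))
```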

I hope I can convey to you how much I/team would use the subdomains for filtering/aggregations.
Granted, is it as useful as TLD or TLD+2nd level? No. However, apart from the usefulness of the subdomains by themselves as we discussed above, IMHO another useful feature is for organizations that have pretty long domains, such as finicial.corp.local, where a hostname would therefore live in the 4th subdomain. Then there are organizations with much longer domains..

I hate to keep tooting my own horn. However, I haven't really seen anybody else discussing this in practice. I did notice a screenshot using a similar structure in one of FireEye's blogs (however, I am unable to find that blog at the moment). I would also imagine OpenDNS is using a structure like what we are discussing, which brings up a use case: passive (domain) databases.
So, firing away with more of my blogs showing various usage of subdomains..

dns tunneling/exfil shown here:
length, whitelist/filtering, and stack counting of any of the 3rd, 4th, 5th, and so on subdomains shown here:
https://blog.perched.io/dns-tunneling-other-hunts-w-rocknsm-bro-elk-52a4486e44d0
Granted, for this we have a nice ML job now. However, again, whitelisting will still come in handy in LARGE environments, as well as for the (rare) registrants that allow subdomains beginning at the 4th level (such as

typo squatting:
https://blog.neu5ron.com/2018/04/typosquatting-detection-with-elk-bro-nsm.html
Again, mostly filtering comes into play here. However, having the ability to do this on various levels/subdomains comes in handy..

Finally, I should say - I think defining these fields for those who want/use these subdomains, without having to implement them everywhere (such as in ingest processors or Beats), could be a fair compromise.

webmat · 2019-09-27T19:39:20Z

Yeah the only use I would recommend a scripted field for is to cleanly display the subdomain in a table view. Not to do aggregations on :-)

Note however that my understanding here is that we're not splitting out each level or labels of the subdomain. We're simply cutting before the registered domain, and all of the subdomain levels below that get shoved into subdomain.

More visually, sub3.sub2.sub1.example.co.uk gets broken down like this:

dns.question.name:              sub3.sub2.sub1.example.co.uk
dns.question.registered_domain:                example.co.uk
dns.question.top_level_domain:                         co.uk
dns.question.subdomain:         sub3.sub2.sub1

Is "subdomain" like this useful for what you have in mind?
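
A minimal Python sketch of the cut described above, assuming a tiny hard-coded suffix set. `KNOWN_SUFFIXES` and `split_question_name` are hypothetical names for illustration; a real implementation would consult the full Public Suffix List rather than this three-entry set.

```python
# Hypothetical, hard-coded suffixes for illustration only; a real
# implementation would use the full Public Suffix List.
KNOWN_SUFFIXES = {"com", "uk", "co.uk"}


def split_question_name(name: str) -> dict:
    """Split a DNS question name at the registered domain boundary."""
    labels = name.rstrip(".").lower().split(".")
    # Scan from the longest candidate suffix to the shortest, so "co.uk"
    # is preferred over "uk".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in KNOWN_SUFFIXES:
            break
    else:
        raise ValueError(f"no known suffix in {name!r}")
    if i == 0:
        raise ValueError(f"{name!r} is itself a public suffix")
    return {
        "name": ".".join(labels),
        "registered_domain": ".".join(labels[i - 1:]),
        "top_level_domain": ".".join(labels[i:]),
        # Every label left of the registered domain, joined with dots and
        # with no trailing period; empty when there is no subdomain.
        "subdomain": ".".join(labels[: i - 1]),
    }


if __name__ == "__main__":
    print(split_question_name("sub3.sub2.sub1.example.co.uk"))
```

All subdomain levels end up in a single string here; nothing is split per label.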

MikePaquette · 2019-09-28T00:24:21Z

Note however that my understanding here is that we're not splitting out each level or labels of the subdomain. We're simply cutting before the registered domain, and all of the subdomain levels below that get shoved into subdomain.

@webmat Yes, my understanding matches exactly that.

For history, one of our longest running ECS issues ever #84 addressed this topic:

I proposed adding *.subdomain field(s) (we had not yet defined all the places where we'd keep domain information) in Clarify use of hostname, subdomain, domain in source/destination #84 (comment) matching this definition.
@andrewkroh questioned its usefulness in Clarify use of hostname, subdomain, domain in source/destination #84 (comment)
@webmat thought it was useful in Clarify use of hostname, subdomain, domain in source/destination #84 (comment)
@andrewkroh further commented on lack of need for this in Clarify use of hostname, subdomain, domain in source/destination #84 (comment)
@ruflin recommended we leave it out but consider re-adding later when needed in Clarify use of hostname, subdomain, domain in source/destination #84 (comment)
Now, nearly one year later, @neu5ron has better articulated its usefulness

I am +1 to adding *.subdomain in dns.question.*

Question: Is there any value in adding *.subdomain anywhere else? I am thinking no.

jamesspi · 2019-09-28T11:04:16Z

@webmat a few thoughts.

Personally, I would love to see the subdomain in there for a number of reasons.

Distinct count of subdomains per registered domain (why does user/process x perform lookups over 50 subdomains for a given registered domain, vs 2000 for user y)
high info content in the ML context (why should we expect users to have to use a scripted field for this?)
certificate dns names (show me all the subdomain certificates generated for this registered domain)

Those are just a few off the top of my head.
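
As a sketch of the first item (distinct subdomains per registered domain), the following Python counts unique subdomain values client-side. The function name `distinct_subdomains` and the sample events are hypothetical; in practice this would more likely be an aggregation run in the datastore, which is not shown here.

```python
from collections import defaultdict


def distinct_subdomains(events) -> dict:
    """Count unique dns.question.subdomain values per registered domain."""
    seen = defaultdict(set)
    for event in events:
        question = event["dns"]["question"]
        subdomain = question.get("subdomain")
        if subdomain:  # skip events with no subdomain populated
            seen[question["registered_domain"]].add(subdomain)
    return {domain: len(subs) for domain, subs in seen.items()}


if __name__ == "__main__":
    # Made-up sample events, shaped like the proposed ECS fields.
    sample = [
        {"dns": {"question": {"registered_domain": "example.co.uk", "subdomain": "www"}}},
        {"dns": {"question": {"registered_domain": "example.co.uk", "subdomain": "exfilpayload.shady"}}},
        {"dns": {"question": {"registered_domain": "example.co.uk", "subdomain": "www"}}},
        {"dns": {"question": {"registered_domain": "example.com"}}},
    ]
    print(distinct_subdomains(sample))
```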

dainperkins · 2019-09-28T13:50:05Z

I feel like we should be categorizing as correctly as possible, and potentially providing field concatenation & ingest for highly useful features (in this case ** dns.question.intra_domain_id)

Should we be breaking TLD out into tld & country code, or at least providing the field for those who want to use it?

dns.question.name: host.sub3.sub2.sub1.example.co.uk
dns.question.registered_domain: example.co.uk
dns.question.top_level_domain: co.uk
dns.question.subdomain: sub3.sub2.sub1
dns.question.hostname: host
dns.question.intra_domain_id: host.sub3.sub2.sub1 **

Entropy for hostname, subdomain, intra_domain_id could all be useful, though likely the highest value would be the intra_domain_id (field name suggestion not being particularly important)

webmat

Ok let's add this field.

There's only one clarification I'd like to make on the field definition before merging. This should make usage of this field unambiguous.

For now let's add it only to DNS. If there's a need for subdomain elsewhere, we can add it as a separate PR.

Thanks everyone for chiming in!

webmat · 2019-09-27T19:55:58Z

schemas/dns.yml

+      type: keyword
+      short: The subdomain of the domain.
+      description: >
+        A subdomain is a hostname under it's parent domain.


First sentence is great as is, let's keep it.

However I would like the description to clarify two details (array v string, and trailing period). I know it will be interpreted differently by different people, if we don't specify. Could you add something like this as a second paragraph, please?

If the subdomain has multiple levels, such as "sub2.sub1.example.com", the subdomain field should contain "sub2.sub1", with no trailing period.

I am +1 to adding *.subdomain in dns.question.*

Question: Is there any value in adding *.subdomain anywhere else? I am thinking no.

sure - using it to cut up domains in urls would be useful for running the same sorts of analytics as running on DNS info... typically tls connections will start with a reference to the original fqdn of the session in the clear iirc

webmat · 2019-10-03T12:52:18Z

Won't be able to merge this one directly, as there are conflicts in generated files. Will resolve via #574

Added question.subdomain field

b4c44f3

Added to question.subdomain field for security use cases such as looking for dns-exfil.

webmat suggested changes Sep 30, 2019

webmat pushed a commit to webmat/ecs that referenced this pull request Oct 1, 2019

Apply PR elastic#561 feedback

46519b7

webmat mentioned this pull request Oct 1, 2019

Add dns.question.subdomain field #574

Merged

webmat closed this in #574 Oct 3, 2019

webmat pushed a commit that referenced this pull request Oct 3, 2019

Add dns.question.subdomain field (#574, #561)

393eafb
