Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "dns" field set #438

Merged
merged 23 commits into from
Aug 21, 2019
Merged

Add "dns" field set #438

merged 23 commits into from
Aug 21, 2019

Conversation

webmat
Copy link
Contributor

@webmat webmat commented Apr 18, 2019

This PR takes contribution #180 by @yugoslavskiy and adjusts it to the latest developments in the ECS repo.


This pull request was worked on for a long time, amidst many other priorities. For the sake of simplicity, I'm nuking the body of the PR to reflect the current thinking, as of the July 23rd push.


  • Packetbeat defines additional sections for opt, authorities and additionals. ECS doesn't include them for now. These Packetbeat fields are to be considered custom fields (they don't need to be removed).
  • We're renaming dns.question.etld_plus_one to dns.question.registered_domain
  • Packetbeat's dns.flags (an object containing boolean fields) is being replaced by an array of keywords, that should be populated by standard DNS flags, such as "AA", "DD", etc. It's more compact to visualize, and can be used to create tag clouds. The new field is dns.header_flags. The name is longer, unfortunately, but we didn't want to introduce a breaking change mid cycle.
  • All fields are intentionally level: extended for now. They can be promoted to core later, if need be.
  • To mitigate the difficulty of using arrays of objects in Kibana, we've added the field dns.resolved_ip, to make the IP addresses easy to get to, and have them the be in the ip datatype.
  • Got rid of phase attribute, it's no longer used.
  • Made gocodegen more verbose on unrecognized types, add support for boolean.
  • Adjust the es_template.py generator to support explicitly declared type=objects parent fields + explicitly defined leaf fields under them (like dns.answers).

Ping @andrewkroh, @MikePaquette, @ruflin

Closes #180

@webmat webmat self-assigned this Apr 18, 2019
@webmat
Copy link
Contributor Author

webmat commented Apr 18, 2019

Here's my take on some of the points above.

core vs extended

I think we should at least mark the dns.answer.* fields as core as well.

etld_plus_one

I'd like to rename it. Here's some ideas. More are welcome:

  • registerable_domain / registered_domain
  • master_domain (some clashes with Win NT networking and pop culture)
  • parent_domain
  • root_domain (I kinda like this one. See this)

opt

@andrewkroh What does it stand for, "optional"?

When are dns.opt.* fields populated? They're almost never populated on my Mac running Pb 7.

@webmat
Copy link
Contributor Author

webmat commented Apr 19, 2019

Array of objects

For any given DNS query, the response may contain more than one answer. However arrays of objects are problematic.

My preference would be to denormalize it like Suricata does, and have one event emitted per query, and one event per answer to that query. The query/answers can always be tied back via an event ID such as a flow ID

Suricata example (has both a DNS-specific ID and a flow ID):

suricata-dns

@ruflin
Copy link
Contributor

ruflin commented Apr 23, 2019

I would have all fields in extended as they are protocol specific.

No nested objects in ECS, so +1 on your idea.

@webmat
Copy link
Contributor Author

webmat commented Apr 23, 2019

Thanks for the review, @ruflin.

Looking at Suricata's dramatically simpler support for DNS, I wonder if we shouldn't leave a lot of these fields out of ECS. Packetbeat can still populate all of them, and they'll be custom fields. Also, I can't easily give some of examples we're missing, because my instance of Packetbeat isn't filling all of these fields... Another sign that they may be too corner case for addition in the schema.

We should find the common denominator of what the core of DNS events are.

I like the "flags" idea as an array field with predefined values. If we go this route, should we actually call it tags, for consistency with the rest of ECS? Or do we keep flags, for consistency with the DNS space?

@webmat webmat requested a review from andrewkroh April 23, 2019 17:59
@yugoslavskiy
Copy link

etld_plus_one

I'd like to rename it. Here's some ideas. More are welcome:

  • registerable_domain / registered_domain
  • master_domain (some clashes with Win NT networking and pop culture)
  • parent_domain
  • root_domain (I kinda like this one. See this)

I think registered_domain is the best option from the list.

There is a Public Suffix List (which is eTLD), which consist of domain names controlled by registrars and could not be registered by internet users. So next level (+1) is something which could be registered. And if we consider looking for events based on this field, it's about registered domain (because it was registered already).

Probably registerable_domain would be useful for registrars or for organisations which provide analytics related to domain names which could be registered (in other words, not registered yet). In other cases most of people work with already registered domains.

master_domain/parent_domain/root_domain — these options are too ambiguous, and could be eisaly mixed up with real root domain, which is completely irrelevant.

@MikePaquette
Copy link
Contributor

We could also use highest_registered_domain to be consistent with the below Elastic ML script fields function, although it is a bit long in the name.

https://www.elastic.co/guide/en/elastic-stack-overview/7.0/ml-configuring-transform.html#ml-configuring-transform8

I would be +1 for registered_domain as well.

@ruflin
Copy link
Contributor

ruflin commented Apr 24, 2019

We have log.flags already in Beats with can contain trucated, multiline so I think it would fit into this concept?

+1 on reducing the number of fields.

@webmat
Copy link
Contributor Author

webmat commented Apr 24, 2019

Excellent thanks for the feedback. I agree with you @ruflin that we could make them all extended for now.

I still have many things I'd like people's opinion on, the exhaustive list is in the body of the issue.

But in summary:

  • More rename suggestions:
    • dns.response_code to dns.status_code
    • dns.question.size to dns.question.length
  • Clarification on the usefulness of, and perhaps adjusting the names of the following:
    • dns.opt.*
    • dns.op_code

I'll ponder some more by taking a closer look at how Suricata and Zeek do it. I'll see if I can put together a more specific proposal that would work for all of them. I'd like to find an approach that can work (at least in part) with both approaches (1 event vs 1 event per answer)

@MikePaquette
Copy link
Contributor

More rename suggestions:
dns.response_code to dns.status_code

+1

dns.question.size to dns.question.length

+1

Clarification on the usefulness of, and perhaps adjusting the names of the following:
dns.opt.*

No comment on usefulness, but suggest changing opt.* option.* if we keep

dns.op_code

+1

@webmat
Copy link
Contributor Author

webmat commented Jul 23, 2019

@andrewkroh @MikePaquette @dainperkins @cwurm The fruit of our recent discussions.

@webmat
Copy link
Contributor Author

webmat commented Jul 23, 2019

I haven't had time to create an issue per thing affected by DNS getting into ECS. But off the top of my head:

  • Packetbeat
  • Suricata Filebeat module
    • Note that the old Suricata DNS format will be difficult to get to one event containing all answers. I recommend adding support for their DNS v2 format in our module (available in Suricata 4.1+), then making this one follow ECS.
  • Zeek Filebeat module
  • CoreDNS Filebeat module
  • Update Sysmon DNS PR

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good to me. I'm just debating on the utility of dns.grouped.domain.

Copy link

@cwurm cwurm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments on question.length, answers_count (both can be easily calculated), and grouped.ip (not a fan of the name).

Looks good otherwise.

Mathieu Martin added 2 commits August 14, 2019 14:08
- removed pre-calculated counts/lengths
- removed grouped.domain
- renamed grouped.ip to resolved_ip
@webmat
Copy link
Contributor Author

webmat commented Aug 15, 2019

@elasticmachine, run elasticsearch-ci/docs

Copy link

@cwurm cwurm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with this as it is. To recap (there've been many changes following our discussions), my understanding is that a "canonical ECS" DNS event would now look like this:

{
  dns: {
    type: "answer",
    id: "40269",
    op_code: "QUERY",
    flags: ["RA", "RD"],
    response_code: "NOERROR",
    question: {
      name: "client.dropbox.com",
      type: "A",
      class: "IN",
      registered_domain: "dropbox.com",
    },
    answers: [
      {
        name: "client.dropbox.com",
        type: "CNAME",
        class: "IN",
        ttl: 11,
        data: "client.dropbox-dns.com"
      },
      {
        name: "client.dropbox-dns.com",
        type: "A",
        class: "IN",
        ttl: 59,
        data: "162.125.11.3"
      }
    ],
    resolved_ip: [162.125.11.3]
  }
}

That looks good to me.

@webmat
Copy link
Contributor Author

webmat commented Aug 15, 2019

Yes, that's precisely how it should look.

And a second example, in case an event source only logs queries and not answers (e.g. as DNS server logs are often configured), the event should look like this:

{
  dns: {
    type: "query",
    id: "40269",
    op_code: "QUERY",
    flags: ["RD"],
    question: {
      name: "client.dropbox.com",
      type: "A",
      class: "IN",
      registered_domain: "dropbox.com",
    }
  }
}

@webmat
Copy link
Contributor Author

webmat commented Aug 15, 2019

We need to circumvent the conflict with Packetbeat's definition for dns.flags (object containing booleans), which conflicts with ECS' array of keywords. It's pains me to set the future in stone with a less good name because of a temporary (in the grand scheme of things) conflict. But we need DNS now, not in Stack v8.0.

Welcome to the compromise land ;-)

So DNS in ECS could use another name for this, to avoid the conflict. Some ideas:

  • dns.flag
    • Con: I think it will be confusing to people, because it's just one letter apart from Packetbeat's field name. So which one's the right one will be difficult to remember.
    • Con: Other flags fields in ECS should be named flags, not flag. We strive for proper pluralization. So this will become an inconsistency vs other uses of flags.
    • Pro: It's really short
    • Pro: It'll be the first choice in the autocomplete, vs dns.flags, so laziness can become the mnemonic to remember this is the right field?
  • dns.header_flags
    • Pro: It's legit DNS terminology.
    • Pro: As soon as people get used to it, it'll be different enough that it'll be easy to remember, I think.
    • Con: It's longer.
  • dns.tags
    • Pro: flags are semantically similar to tags, in their usage.
    • Con: It's not at all in line with the DNS lingo, so it will be confusing to people

I don't think we can simply nest it elsewhere, such as dns.question.flags, as some of the flags are provided on DNS answers (e.g. AA, RA), not on the query.

I'm open to other suggestions for field names.

But for now, dns.header_flags sounds to me like the best compromise.

@dainperkins
Copy link
Contributor

dainperkins commented Aug 15, 2019 via email

@cwurm
Copy link

cwurm commented Aug 15, 2019

+1 on dns.header_flags. Packetbeat 8.0 can drop dns.flags, and then we can switch to it in Stack 9.0 (in AD 2022?).

Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a quick chat with @webmat to clarify some of my questions. I'm good with getting this in as is.

One concerned I raised that needs investigation is why in the template nowhere "nested" shows up because I thought that is need for nested object. But this is not a blocker for this PR and can be followed up with.

@webmat webmat merged commit 95a1d4f into elastic:master Aug 21, 2019
@webmat
Copy link
Contributor Author

webmat commented Aug 21, 2019

Thanks @ruflin! I'll follow up and confirm whether we need to change the datatype for dns.answers from object to nested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants