Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Created the Vulnerability Schema #581

Merged
merged 37 commits into from
Nov 19, 2019
Merged

Created the Vulnerability Schema #581

merged 37 commits into from
Nov 19, 2019

Conversation

peasead
Copy link
Contributor

@peasead peasead commented Oct 7, 2019

The thought is that this would be used for vulnerability scanners or associated projects (Qualys, Nexpose, Nessus, OpenVas, VulnWisperer, etc.) so they can send their data to Elasticsearch with ECS.

Some suggested fields were from #113

Fields created for the vulnerability-* schema:

cvss.score.base
cvss.score.temporal
cvss.score.environmental
category
cve.description
cve.id
severity
status
detected.first
detected.last
scanned.first
scanned.last

@peasead peasead added the enhancement New feature or request label Oct 7, 2019
Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for opening this, @peasead!

Is it fair to say that ingesting one scan/report should generate multiple events that populate vulnerability.*? In other words, each event captures one finding?

If so, should we have a keyword field to associate each event with a report/scan ID?

@peasead
Copy link
Contributor Author

peasead commented Oct 22, 2019

I had over filtered Github notifications. I'm reviewing these comments now. Sorry for the delay.

@joshbressers
Copy link

It could make sense to have an arbitrary ID field. While many of the findings have CVE IDs, not all do. There are other vulnerability identifier that show up every now and then. Here's an example I pulled from Snyk
https://snyk.io/vuln/SNYK-JS-PM2-474345

On that note as well, being able to link to one or more reference URLs would be helpful. In the case of CVE IDs you could easily just link to MITRE or NVD. Other identifiers may be less obvious and it's possible some identifier won't even have a central location.

@dainperkins
Copy link
Contributor

dainperkins commented Oct 30, 2019

Should we ditch the CVE/CVSS specifics as well and leave it as:

vulnerability.score.base
vulnerability.score.temporal
vulnerability.score.environmental
vulnerability.category
vulnerability.description
vulnerability.id

vulnerability.classification e.g. CVSS
vulnerability.enumeration e.g. CVE
vulnerability.url e.g. link to url describing vulnerability

@stiltz
Copy link

stiltz commented Oct 30, 2019

I see this just closed but the only additional comment I have is that most vulnerability scan vendors include a unique vulnerability id for their scanner. For example Qualys has a QID which may include multiple CVE's . I don't see where that fits in here.

@dainperkins
Copy link
Contributor

good idea

vulnerability.scanner.id
vulnerability.scanner.name

or we could use the observer fields to detail scan vendor, etc.

@dainperkins dainperkins reopened this Oct 30, 2019
@webmat
Copy link
Contributor

webmat commented Oct 30, 2019

Yeah I like the idea of supporting an additional arbitrary ID. We need to find a good name for that, though. I don't think vulnerability.arbitrary_id works well ;-)

@peasead
Copy link
Contributor Author

peasead commented Nov 7, 2019

good idea

vulnerability.scanner.id
vulnerability.scanner.name

or we could use the observer fields to detail scan vendor, etc.

Awesome ideas @dainperkins and @stiltz

I took those changes and updated the PR.

@dainperkins
Copy link
Contributor

Am I reading this correctly that the vulnerability fields end up as items nested under host/source/destination/observer, etc?

@peasead
Copy link
Contributor Author

peasead commented Nov 14, 2019

Am I reading this correctly that the vulnerability fields end up as items nested under host/source/destination/observer, etc?

@dainperkins The top level is vulnerability or am I misunderstanding what you're asking?

@dainperkins
Copy link
Contributor

generated/csv/fields.csv showed vulnerability as both top level and nested e.g. client.vulnerability... Wasn't sure if that was on purpose, an option, or mistake.

vulnerability.name: Flash buffer overflow
vulnerability.client: [array of affected hosts]

vs.
vulnerability.name: Flash buffer overflow
host.name: [array of affected hosts]

could just be I was thinking about it differently but for reporting vunerabilities I was thinking the vulnerability info would all be top level, with teh addition of a [host] array or similar to define the vulnerability to vulnerable system mappings

@webmat
Copy link
Contributor

webmat commented Nov 15, 2019

@dainperkins No, just a matter of re-running make. Was puzzled by that too, initially :-)

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a much more detailed review, thanks for working on this @peasead :-)

  • As we already discussed, please run make prior to each commit, so generated files are updated
  • Please add a changelog entry to CHANGELOG.next.md. You can get inspiration from CHANGELOG.md.
  • Please break off long definitions at around 80-90 chars. Double return will result in new paragraphs, but single return will keep the paragraph together in the various generated artifacts

See also a lot of small adjustments provided as comments below.

I really like where this is already, content is top notch! My comments are for minor adjustments mostly.


example: CVE

- name: url
Copy link
Contributor

@webmat webmat Nov 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename this field to reference. We're in the process of establishing a pattern where reference url fields are always named .reference :-)

So far: threat.tactic.reference, threat.technique.reference, and soon package.reference

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧐 you renamed the field above 🙂

Here's what I meant:

  • vulnerability.url should become vulnerability.reference
  • The field you have renamed used to be named vulnerability.enumeration. I'm ok with keeping that original field name, as long as the description is fleshed out a little, like described here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that. I've made the adjustments.

@webmat
Copy link
Contributor

webmat commented Nov 15, 2019

How do we expect people to use the vulnerability field set? My current understanding is that this is meant to import report findings, not to enrich events with vulnerability details. Correct?

If the idea is indeed to import scan findings, should we explore support for two related kinds of documents?

  • document for the report itself (not explicitly addressed right now)
  • documents for each finding of the report (what this PR currently describes)

Or are we good with just documents per findings + the report ID?

cc @joshbressers @stiltz

@ghost
Copy link

ghost commented Nov 18, 2019

Hi @peasead, we have found your signature in our records, but it seems like you have signed with a different e-mail than the one used in your Git commit. Can you please add both of these e-mails into your Github profile (they can be hidden), so we can match your e-mails to your Github profile?

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more things to address, see review comments below.

I would also like clarifications on a few questions:

  • @peasead do you think we need to be able to ingest a document representing the report itself, in addition to each finding? Or are we good with the report details just being referenced via vulnerability.report_id?
  • Do we need vulnerability.status? See longer version of the question here
  • There's a few fields that I would like to know whether we should expect a single value or an array of values, please let me know your thinking about each of them:
    • vulnerability.category
    • vulnerability.scanner.id

Additional note on the array question above. Even if some scanners make a decision to pick the "best match" (or highest severity) when assigning a category or vuln ID to a finding, that's not the standard we have to match. What we have to cater to is if any scanner can associate multiple categories or return multiple CVE IDs for a finding, then we should make these field arrays.

Note that there's no direct support for arrays in ECS right now, but it's coming. So any of those should be arrays, we would simply mention it in the description, and give an example that's actually an array (example: '[CVE-2019-0001]').


example: CVE

- name: url
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧐 you renamed the field above 🙂

Here's what I meant:

  • vulnerability.url should become vulnerability.reference
  • The field you have renamed used to be named vulnerability.enumeration. I'm ok with keeping that original field name, as long as the description is fleshed out a little, like described here.

@peasead
Copy link
Contributor Author

peasead commented Nov 18, 2019

A few more things to address, see review comments below.

I would also like clarifications on a few questions:

  • @peasead do you think we need to be able to ingest a document representing the report itself, in addition to each finding? Or are we good with the report details just being referenced via vulnerability.report_id?

I think that the vulnerability.report_id is sufficient.

  • Do we need vulnerability.status? See longer version of the question here

I think that this is probably scanner/UI dependent and if we removed it, we'd be okay. To whit, I'm not sure other scanners beyond Qualys track this information, so it may be very scanner dependent. I'll remove this.

  • There's a few fields that I would like to know whether we should expect a single value or an array of values, please let me know your thinking about each of them:

    • vulnerability.category
    • vulnerability.scanner.id

I think category and scanner_id would generally be a single value. In the event that they cover more than one category/ID, I believe that there'd be an additional entry, which would remove the need for an array consideration.

Additional note on the array question above. Even if some scanners make a decision to pick the "best match" (or highest severity) when assigning a category or vuln ID to a finding, that's not the standard we have to match. What we have to cater to is if any scanner can associate multiple categories or return multiple CVE IDs for a finding, then we should make these field arrays.

Note that there's no direct support for arrays in ECS right now, but it's coming. So any of those should be arrays, we would simply mention it in the description, and give an example that's actually an array (example: '[CVE-2019-0001]').

While I agree that we don't want to be in the business of assigning a category, we're collecting the data that is being presented by the scanner. So if the scanner says "Firewall", we should take the observer at its word vs. trying to maintain a matrix.

@webmat
Copy link
Contributor

webmat commented Nov 18, 2019

I think category and scanner_id would generally be a single value. In the event that they cover more than one category/ID, I believe that there'd be an additional entry, which would remove the need for an array consideration.

By additional entry here, do you mean a distinct field? In other words, would they have a "main category" and a list of other applicable categories?

Additional note on the array question above. Even if some scanners make a decision to pick the "best match" (or highest severity) when assigning a category or vuln ID to a finding, that's not the standard we have to match. What we have to cater to is if any scanner can associate multiple categories or return multiple CVE IDs for a finding, then we should make these field arrays.

While I agree that we don't want to be in the business of assigning a category, we're collecting the data that is being presented by the scanner. So if the scanner says "Firewall", we should take the observer at its word vs. trying to maintain a matrix.

I think my comment was misunderstood there. I don't want to override or map anything. What I'm saying is much more simple: if any scanner is going to give us an array of categories or multiple CVEs for a finding, I want our field to be an array that can take it all in, as is. This would also mean that in most cases when we have a single category/CVE id, we'd have arrays of 1 item, but that's fine.

@webmat
Copy link
Contributor

webmat commented Nov 18, 2019

I'll take care of the CHANGELOG.next.md conflict at merge time, btw. No need to worry about that. If you fix now, it may complain again after the next PR gets merged 🙂

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 more things I noticed. Sorry for the back & forth here ;-)

And we still need to close on whether .category and .id should be array fields.

short: Fields to describe the vulnerabilty relevant to an event.
description: >
The vulnerability fields describe information about a vulnerabilty that is
relevant to an event.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo first "vulnerabilty" => "vulnerability", then a more philosophical point:

The description of the field set makes it sound like users are expected to enrich "normal" events with vulnerability information. Is that our expectation?

My expectation would be more along the lines of users directly importing scan findings. In other words, this isn't another kind of event being enriched with this info. The event/document coming in is literally one finding from the report, which is getting imported for correlation in the SIEM.

If you agree with this, we should address this directly. Perhaps:

  short: Fields to describe a vulnerability finding.
  description: >
    The vulnerability fields describe information about a vulnerability that was
    reported by a scan.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed typo.

I think adjusting this description makes sense. That said, just as some context: #581 (comment)

@peasead
Copy link
Contributor Author

peasead commented Nov 19, 2019

How do we expect people to use the vulnerability field set? My current understanding is that this is meant to import report findings, not to enrich events with vulnerability details. Correct?

If the idea is indeed to import scan findings, should we explore support for two related kinds of documents?

  • document for the report itself (not explicitly addressed right now)
  • documents for each finding of the report (what this PR currently describes)

Or are we good with just documents per findings + the report ID?

cc @joshbressers @stiltz

To address your question on how the vulnerability.* field-set could be used, I think both importing vulnerability information as well as enrichment. Importing scan findings is obvious, but this information can also be used to enrich if you're tracing an event back to "patient zero", knowing what vulnerabilities could have been leveraged is important to understand adversary:

  • resourcing
  • sophistication
  • reconnaissance
  • capabilities

In the second part of your question regarding documents per findings and report ID, I think that having the report ID is sufficient to guide the analyst to a place to get additional information i.e. the scanner vs. importing the entire scan report. I see a lot of products in the intelligence space that attempt to replicate "all the things" instead of providing a launch point for analysts.

Maybe we could adjust vulnerability.reference to be a link to the scanner report instead of an external reference? <-- I like this idea.

Just my $0.02 from some time in the field.

Copy link
Contributor Author

@peasead peasead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 more things I noticed. Sorry for the back & forth here ;-)

And we still need to close on whether .category and .id should be array fields.

I don't think they should be arrays.

id, the actual CVE #, are assigned 1 at a time (https://cve.mitre.org/about/faqs.html#what_is_cve). I wasn't able to find a vulnerability with more than 1 CVE, so I think the id wouldn't be an array field. I downloaded the entire 2019 CVE database and all of the vulnerabilities only had 1 CVE (id).

category isn't a MITRE created field (like CVE is). I looked at the reference that I'd used in the field-set (from Qualys - https://qualysguard.qualys.com/qwebhelp/fo_portal/knowledgebase/vulnerability_categories.htm) and it looks like everything is 1:1, so I don't know that we'd need an array here either. That said, is there any harm of making category an array to err on the side of caution?

Open to discuss, obviously, if I'm misunderstanding.

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed thoughts, @peasead!

In the second part of your question regarding documents per findings and report ID, I think that having the report ID is sufficient to guide the analyst to a place to get additional information i.e. the scanner vs. importing the entire scan report. I see a lot of products in the intelligence space that attempt to replicate "all the things" instead of providing a launch point for analysts.

  • I'm also good on only capturing the report ID and not capturing "all the things". I just wanted to make sure this was also how you saw it. Looks like we're on the same page. 👍

Maybe we could adjust vulnerability.reference to be a link to the scanner report instead of an external reference? <-- I like this idea.

  • I think we have a need for both URLs.
    • For now I would say .reference should remain a URL that points to general documentation about a thing, in this case the CVE. This is the semantics we've been establishing, so far for "reference URL" fields.
    • I also really love the idea of capturing the URL an analyst would want to visit, in order to view the actual finding in their vulnerability scanner. It's another type of URL we will need in a few places (e.g. endpoint alerts), but we don't have a good name for that field yet. So IMO we leave this out of this PR, and add it as a follow-up PR.

On category and ID as arrays:

My thinking there isn't whether a published vulnerability could have more than one ID or category necessarily. It's rather "can one scanner finding correspond to more than one CVE", or could it have more than one category (orthogonal ones: "Windows" & "Firewall", or a wide problem: "SUSE" + "Debian" + "RHEL" ).

I do think it would be better to go on the safe side here, and the effort is pretty trivial. Elasticsearch actually doesn't care if docs contain a single value vs an array. The reason in ECS we try to flesh this out more explicitly is that the pipelines handling the events or generated language libs like ecs-dotnet have to know, in order to do the right thing.

We could even promote lightweight semantics, where if a scanner has such a thing as "main category" and "other categories", the main category should be first in the array.

To close on this, I'm 99% sure it's useful for category; for ID I'm not sure yet, I'm starting to think not. Perhaps the way scanners are structured means each CVE has its own detection, and a given finding will always only have one CVE? In other words the same piece of software would simply have multiple findings (one per CVE) if applicable...

See PR comment, for a concrete suggestion on what we can do to make category into an array.

or Firewall).
For example (https://qualysguard.qualys.com/qwebhelp/fo_portal/knowledgebase/vulnerability_categories.htm)

example: Firewall
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make this an array, here's what I have in mind. Feel free to adjust:

      description: >
        The type of system or architecture that the vulnerability affects. These may be
        platform-specific (for example, Debian or SUSE) or general (for example, Database
        or Firewall).
        For example (https://qualysguard.qualys.com/qwebhelp/fo_portal/knowledgebase/vulnerability_categories.htm)

        This field must be an array.

      example: '["Firewall"]'

I don't know if we need to expand on main category being first + other categories coming next. We can add that later, if there's confusion.

Copy link
Contributor Author

@peasead peasead Nov 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I'm also good on only capturing the report ID and not capturing "all the things". I just wanted to make sure this was also how you saw it. Looks like we're on the same page. 👍

Roger roger.

Maybe we could adjust vulnerability.reference to be a link to the scanner report instead of an external reference? <-- I like this idea.

  • I think we have a need for both URLs.

    • For now I would say .reference should remain a URL that points to general documentation about a thing, in this case the CVE. This is the semantics we've been establishing, so far for "reference URL" fields.
    • I also really love the idea of capturing the URL an analyst would want to visit, in order to view the actual finding in their vulnerability scanner. It's another type of URL we will need in a few places (e.g. endpoint alerts), but we don't have a good name for that field yet. So IMO we leave this out of this PR, and add it as a follow-up PR.

The more I thought about this, I was coming to this conclusion as well to the thought of scanner.reference; but I agree we can follow-up with an enhancement to the schema when we've had more time for it to bake.

On category and ID as arrays:

My thinking there isn't whether a published vulnerability could have more than one ID or category necessarily. It's rather "can one scanner finding correspond to more than one CVE", or could it have more than one category (orthogonal ones: "Windows" & "Firewall", or a wide problem: "SUSE" + "Debian" + "RHEL" ).

I do think it would be better to go on the safe side here, and the effort is pretty trivial. Elasticsearch actually doesn't care if docs contain a single value vs an array. The reason in ECS we try to flesh this out more explicitly is that the pipelines handling the events or generated language libs like ecs-dotnet have to know, in order to do the right thing.

We could even promote lightweight semantics, where if a scanner has such a thing as "main category" and "other categories", the main category should be first in the array.

To close on this, I'm 99% sure it's useful for category; for ID I'm not sure yet, I'm starting to think not. Perhaps the way scanners are structured means each CVE has its own detection, and a given finding will always only have one CVE? In other words the same piece of software would simply have multiple findings (one per CVE) if applicable...

See PR comment, for a concrete suggestion on what we can do to make category into an array.

I like this and will make the adjustment declaring that this must be an array and formatting the example properly for .category.

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks for all of the adjustments!

The CLA check is failing because git was inadvertently set with a local email. I'm merging anyway, as Andrew has signed the CLA and adjusted his git config with a correct email address.

The email that will show in the merge commit has been used to sign the CLA.

@webmat webmat merged commit 6f8649c into elastic:master Nov 19, 2019
dcode pushed a commit to dcode/ecs that referenced this pull request Apr 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants