Event categorisation fields #242

Merged 6 commits on Dec 7, 2018

@tsg tsg commented Dec 6, 2018

This PR is a minimal change to the current ECS fields that lets us follow the planned path for categorization without requiring breaking changes later on. The hope is that we'll get this in for Beta2. It does the following:

  • Updates the docs for event.action and event.category to be closer to what we are planning, but keeps their definition fairly abstract. The docs don't go as far as showing a list of examples.
  • Changes the event.type definition to be only reserved for now.
  • Adds event.kind and event.outcome, which are fairly non-contentious. I can remove them from the PR if needed to get this in before Beta2.

I plan to follow up with a more complete PR that adds more details.
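To make the field names in this description concrete, here is a minimal sketch in Python of a document that populates them. It is an illustration only: `user-management` is the example the diff gives for event.category and `event` is one of the example kinds, while the action and outcome values, the `doc` variable, and the `flatten` helper are assumptions made for this example and are not defined by the PR.

```python
# Hypothetical sketch: every value below is an illustrative assumption, except
# `user-management` (the diff's example for event.category) and `event`
# (one of the example values listed for event.kind).
doc = {
    "event": {
        "kind": "event",
        "category": "user-management",
        "action": "user-password-change",
        "outcome": "success",
    }
}


def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted field names such as `event.kind`."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Print each field under its dotted ECS-style name.
for field, value in flatten(doc).items():
    print(f"{field} = {value}")
```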

@tsg tsg requested review from ruflin and MikePaquette December 6, 2018 10:45
fields.yml Outdated

This gives information about what type of information the event
contains, without being specific to the contents of the event. Examples
are `event`, `state`, `alarm`.
Contributor

Instead of `event`, perhaps we should use `log` here? For me, all 3 kinds are events (based on ECS).

Contributor Author

@cwurm also mentioned `action` here as an option.

Contributor Author

`log` is not great, btw: in Auditbeat we have events like "process start" that we get from a netlink socket, not from a "log".

Contributor

Yes, I agree "event" is better than "log" for this.

Contributor

I think the definition should be a bit more forceful in that this field must contain one of these values. The goal here is super broad categorization of how this document/event came about. So the values have to be predictable:

  • event: just reporting something that was observed in real time (e.g. syslog message, log entry, event on a bus such as eBPF).
  • state: generating an update on the state of something on a regular schedule (list of running processes, installed packages, current CPU usage).
    • So for example, metrics from Metricbeat should all use event.kind:state.
  • alarm: alerting on the fact that something unwanted happened, or a state changed enough to warrant attention from someone. Could be triggered by a static threshold, ML, a rules engine, etc.

What we want to avoid here is to have people map arbitrary values from their event sources to this field, and we end up with lots of unpredictable values in event.kind.
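The constraint proposed in this comment can be sketched as a small check. This is a minimal illustration in Python, assuming the three values proposed here (`event`, `state`, `alarm`); the thread itself notes that the list was not final. `ALLOWED_EVENT_KINDS` and `check_event_kind` are hypothetical names, not part of ECS or any Beat.

```python
# Hypothetical sketch: the allowed set mirrors the three values proposed in
# this review comment; the thread says the final list may differ.
ALLOWED_EVENT_KINDS = frozenset({"event", "state", "alarm"})


def check_event_kind(doc: dict) -> bool:
    """Return True if event.kind is absent or is one of the allowed values."""
    kind = doc.get("event", {}).get("kind")
    return kind is None or kind in ALLOWED_EVENT_KINDS


# A periodic metrics document, as in the Metricbeat example above.
metric_doc = {"event": {"kind": "state"}}
# An arbitrary value mapped straight from a source, which the comment warns against.
vendor_doc = {"event": {"kind": "vendor-specific-thing"}}

print(check_event_kind(metric_doc))
print(check_event_kind(vendor_doc))
```

Treating a missing event.kind as acceptable is a design choice made for this sketch; a stricter check could reject documents that omit the field.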

Contributor

Of course the corollary is that if people determine that their stuff doesn't fall into one of these 3, we want them to open a GH issue to discuss it, and potentially grow that list.

Contributor Author

I added a warning like for the others. I'm not sure we want to define `event`, `state`, `alarm`, etc. just yet; there will be more than that, and we'd do a poor job if we tried to do it by the freeze date.

Contributor

I think the defined list of values should later go into a separate document. Potentially we'd have one doc per use case, and the values for each use case might differ.

`log` for me does not mean it must come from a file. I'm thinking more of the historical definition: an official record of events during the voyage of a ship or aircraft.

> @cwurm Also mentioned action here as an option.

I think it was `state_restart`. I wanted a way to distinguish between a state that is triggered periodically (e.g. every 1h) and one triggered by a restart of (in this case) Auditbeat. One could imagine other reasons it was sent (e.g. a threshold was breached, or too many changes accumulated). Something like an `event.trigger` field that captures why a document was sent would also work, but maybe that's one field too many.
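As a rough illustration of the idea floated in this comment, the Python sketch below attaches a trigger to otherwise identical state documents. `event.trigger` is only suggested here and is not an ECS field; the trigger values (`schedule`, `restart`), the `processes` key, and `make_state_doc` are made up for this example.

```python
# Hypothetical sketch: `event.trigger` is only floated in this comment and is
# not an ECS field; the trigger names below are invented for illustration.
def make_state_doc(processes: list, trigger: str) -> dict:
    """Build a state document that also records why it was sent."""
    return {
        "event": {"kind": "state", "trigger": trigger},
        "processes": list(processes),
    }


periodic = make_state_doc(["sshd", "cron"], trigger="schedule")
after_restart = make_state_doc(["sshd", "cron"], trigger="restart")

# Both documents share event.kind; only the trigger tells them apart.
print(periodic["event"])
print(after_restart["event"])
```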

@webmat webmat left a comment

All in all, I'd like to get this in.

However, I think we need to clearly warn people that a few of these fields will soon have prescribed content, so they should not use them yet (or should use them at their own risk). event.type is pretty clear on this already. Here are the other ones:

  • event.kind
  • event.category
  • event.outcome

If we don't do this and people start using them widely, then suddenly coming up with the list of prescribed values is a breaking change.

.gitignore Outdated
@@ -1,3 +1,4 @@
.DS_Store
*.pyc
env
*.swp
Contributor

LOL I have this in my global gitignore, so it never came up.

Please make it *.sw?, though 🙏🏼
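The difference between the two patterns can be checked with a short sketch. It uses Python's `fnmatch` as a stand-in for gitignore matching, which is an approximation that holds for simple patterns like these; the file names are made-up examples of Vim swap files, which typically fall back to `.swo`, `.swn`, and so on when `.swp` is already taken.

```python
from fnmatch import fnmatch

# Made-up file names: three Vim swap files and one regular source file.
names = [".main.py.swp", ".main.py.swo", ".main.py.swn", "main.py"]

# Compare what each ignore pattern would catch.
for pattern in ("*.swp", "*.sw?"):
    matched = [n for n in names if fnmatch(n, pattern)]
    print(pattern, matched)
```

Under these assumptions, `*.swp` catches only the first swap file, while `*.sw?` catches all three and still leaves `main.py` alone.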

This contains high-level information about the contents of the event. It
is more generic than `event.action`, in the sense that typically a
category contains multiple actions.
example: user-management
Contributor

The plan with this field is to have a predetermined set of categories (a few dozen of them). Having the field in ECS now sets us up for a sort of breaking change when we come up with that list.

Therefore I think the description can remain as is, but we should add a warning telling people that they're using this field at their own risk, because it will soon be a field that should be populated with prescribed values.

It won't be the end of the world, though. It's not going to break ingestion or anything. It's just that Elastic solutions will expect this field to contain certain values, so if some sources populate this differently, they won't have the best experience...

Contributor Author

warning added.


- name: action
- name: outcome
Contributor

outcome is also meant to be a field with prescribed content. We should add a warning similar to the one I suggest for event.category, more or less: "We're about to come up with a list of prescribed values, so use with caution."

Contributor Author

warning added.

@tsg tsg force-pushed the categorization-minimal branch from 3161db9 to 5d372fb Compare December 6, 2018 19:09

tsg commented Dec 6, 2018

@webmat I think I addressed your comments, please have a look again.

@webmat webmat left a comment

LGTM


webmat commented Dec 6, 2018

Please wait for approval by @MikePaquette and @ruflin before merging. This is a big one.

fields.yml Outdated
@@ -396,35 +396,64 @@
Unique ID to describe the event.
example: 8a4f500d

- name: kind
level: core
Contributor

Sorry, I only spotted this now, but these fields should go into extended for now.

Contributor Author

I moved event.kind and event.outcome to extended.

Contributor

I'll be proposing pushing these to core sometime in the future :-)


ruflin commented Dec 7, 2018

@tsg I think you need to run make again locally to update the generated files.


tsg commented Dec 7, 2018

@ruflin, thanks, I pushed again.

@MikePaquette MikePaquette left a comment

LGTM

@tsg tsg merged commit 6439b8a into elastic:master Dec 7, 2018
5 participants