Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additions to "file" #441

Merged
merged 16 commits into from
May 21, 2019
Merged

Additions to "file" #441

merged 16 commits into from
May 21, 2019

Conversation

webmat
Copy link
Contributor

@webmat webmat commented Apr 24, 2019

This PR addresses/improves a few things in the the "file" field set.

If any of these turns out controversial / complicated, I'll be happy to extract them to a separate PR. The goal here is to get the simple things in.

  • Added file.name (See add file.name  #322)
  • Added file.directory
  • Added a lot of examples
  • Adjusted a few field definitions

Closes #322

@webmat webmat self-assigned this Apr 24, 2019
@webmat webmat requested review from ruflin and cwurm April 24, 2019 18:58
This was referenced Apr 24, 2019
@neu5ron
Copy link

neu5ron commented Apr 24, 2019

BOOM!

schemas/file.yml Outdated
Only relevant when `file.type` is "file".
example: 16384

- name: crtime
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty cryptic for ECS standard I would say? What about created?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could also be creationtime to keep in line with the *time naming schema we have going on.

The stat output calls it Birth (though always empty on Linux since it does not expose the creation time) - but I don't think that's a good name.

It's maybe unfortunate we already have ctime and mtime which are UNIX names that Windows does not have. ctime could be confused as the creation time. If we ever want to make a change we could go for more descriptive names, e.g. created|modified|changed|accessed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was specifically pondering this, as I was working on this PR.

The file field set didn't get much attention before 1.0.0, unfortunately. But I'm not a fan of the existing field names ctime and mtime, because they go against one of the principles we're aiming for with ECS. We try to avoid contractions and acronyms, unless the convention is overwhelmingly in favour of them (e.g. ip, pid).

Seeing that we already had both of these fields, I went with the other 2 standard field names. I went for internal consistency in file. But this is getting on my nerves even more, as we discuss this 😂

As @cwurm points out, ctime vs crtime are confusing. Any article about Linux filesystem timestamps (debugfs, stat) mention this common confusion.

So I'm very open to revisiting these field names. It's too late for now to change ctime and mtime, although we could create 4 new fields with less ambiguous names and mark the first two as deprecated for the life of ECS 1.x, and remove them in 2.0.

Proposed names:

  • file.created: creation time
  • file.accessed: last access time
  • file.modified (deprecates file.mtime): last modification time for the file content
  • file.metadata_modified (deprecates file.ctime): last modification time for the file metadata.

I'm not a big fan of this last one. Ideas welcome.

For now we can push this naming decision to a bit later, focus on the two new fields, and make a note to deprecate mtime/ctime.

schemas/file.yml Outdated

- name: device
level: extended
type: keyword
description: Device that is the source of the file.
example: sda

- name: system
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this belongs here. file is information about a specific file. I see how it could fit it in but stumbled over it.

The other part I would not call it system as I expect us to have a system object in the future related to system metrics. system already exists today in Metricbeat. Alternative names?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file.filesystem?

Copy link
Contributor Author

@webmat webmat Apr 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty strongly in favour of adding this field even though it's not information directly about a given file. It's still useful metadata about the place where a file lives.

I wanted to point to host as very general metadata about an event, but host is a field set, not a field inside one specific field set.

Perhaps we could do the same with filesystem information, and introduce a volume field set?

Volume fields:

  • name
  • filesystem
  • size (bytes)
  • used (bytes)
  • letter (Windows drive letter)
  • kind (ssd, spinning rust)
  • type (platform specific types, such as AWS' gp2, io1 etc)
  • id
  • serial_number
    ...

schemas/file.yml Outdated
description: >
Last time the file attributes or metadata changed.

- name: atime
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess these are the "standard" names that is why we use them here?

@webmat
Copy link
Contributor Author

webmat commented Apr 29, 2019

For now I'll take out file.system and the timestamps, so we can move forward with this PR.

I'll open a PR right away for the 2 timestamps with clearer names. I'd like to introduce the other 2 timestamps sooner than later and deprecate the two original ones. But this is less pressing. We just need to introduce the deprecation well in advance of ECS 2.0, which is quite a ways off in the future :-)

I'll make a note for the volume field set. We haven't had a pressing need for it yet, so it may take a bit longer before I get around to adding it as a field set. Although the more I think about it, the more I think it makes sense.

@webmat
Copy link
Contributor Author

webmat commented Apr 29, 2019

Thanks for the feedback!

I've taken out the new timestamps as well as file.system. Only non-controversial changes are left here, IMO.

Please have a quick look, to see if this PR is ready.

New, more readable timestamps proposed in #445, and #444 opens the discussion for volume (including volume.filesystem).

schemas/file.yml Outdated

- name: ctime
level: extended
type: date
description: Last time file metadata changed.
description: Last time the file attributes or metadata changed.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Linux and macOS, ctime will also be updated when the file contents change (together with mtime).

Copy link
Contributor Author

@webmat webmat Apr 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a documentation tweak about this. I find it my edit a bit clumsy, and I'm not 100% convinced this belongs in ECS.

In a sense this helps explain the relationship between two ECS fields, so I'm ok with it going in.

On the other hand, I feel like we're getting too close to documenting OS quirks & the like, which can be different from one platform to the next.

But in this case I'm ok with this addition. I'll merge tomorrow, unless there's disagreement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cwurm Any other objections about this PR? Does this documentation update work for you?

It's not the best prose, but we can always tweak improve it later if needed :-)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cwurm Any other objections about this PR? Does this documentation update work for you?

Yes, LGTM

On the other hand, I feel like we're getting too close to documenting OS quirks & the like, which can be different from one platform to the next.

True, though I'm not sure we can avoid it entirely. Only UNIX has atime/mtime/ctime/crtime, Windows has only lpCreationTime/lpLastAccessTime/lpLastWriteTime (and lpLastAccessTime is not tracked by default).

So for Windows then, is lpLastWriteTime == mtime (modified in a future version?), with ctime empty (later changed?)? Or is lpLastWriteTime == mtime == ctime? The Auditbeat FIM module must be implementing this in some way, but I don't know how off the top of my head.

Mathieu Martin added 15 commits May 1, 2019 10:58
We had a situation where

- mtime was the Content change, and
- ctime was the Metadata change.

This adjustment's goal is to avoid having people remember these fields with an
incorrect mnemonic, where mtime=Metadata & ctime=Content. This is why I'm downplaying
the usage of the word "metadata".
This reverts commit dc6b3c5.
@webmat webmat force-pushed the file-additions branch from 12aa9fc to b312872 Compare May 1, 2019 15:02
Copy link
Contributor

@MikePaquette MikePaquette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@webmat General recommendation: Don't mix different things into PR which will get them merged much more quickly. The additions to file belong together, but all the changes to the docs I would consider not related. If there would have been 2 PR's the additional fields could have gone in much more quickly as an example.

@webmat
Copy link
Contributor Author

webmat commented May 10, 2019

@ruflin Agreed. I was just trying to streamline my time, before the month of May, where I have 3/4 weeks not on ECS :-)

@webmat webmat merged commit c185061 into elastic:master May 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add file.name
5 participants