Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 2 missing timestamps to file #445

Merged
merged 4 commits into from
May 22, 2019
Merged

Conversation

webmat
Copy link
Contributor

@webmat webmat commented Apr 29, 2019

This is take 2 of something started in #441. We're departing from the internal
consistency of the 4 Linux file timestamp names (atime, ctime, crtime, mtime),
and going for clear and readable names.

Right now this is only introducing the 2 missing fields, created and accessed.

I think it would make sense to rename the other two eventually. But since this would be a breaking
change, we'd have to deprecate them for a while & document how to handle deprecations in ECS, and so on. This can be done as a separate PR.

Adding these 2 fields unapologetically should be straightforward, though :-)

Closes #422

Note for posterity

The first 2 timestamp fields for file -- introduced with ECS 1.0 -- were named in a very Posix-centric way. This PR departs from this approach. We are considering deprecating the initial two fields eventually, in favour of more readable ones.

Old New Notes
ctime ? Future deprecation. ctime is often confused with crtime
mtime modified? Future deprecation.
(missing) created New from this PR. Maps to Posix crtime
(missing) accessed New from this PR. Maps to Posix atime

@webmat webmat self-assigned this Apr 29, 2019
@webmat webmat requested review from ruflin and cwurm April 29, 2019 19:50
@webmat webmat mentioned this pull request Apr 29, 2019
schemas/file.yml Outdated
- name: accessed
level: extended
type: date
description: Last time file content accessed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if a file was renamed? Does this count? Does this correlate with touch foo.txt?

Copy link
Contributor Author

@webmat webmat Apr 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I guess we could detail which actions typically update which timestamp, on Windows and Linux platforms. But I'm sure there's edge cases.

Perhaps a more straightforward approach (and easier to maintain) would be to state which timestamps this maps to, for each platform family?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it different between platforms? Don't know much about the windows ones.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say this should align with the last access time that the different platforms report natively? I assume that's what users would expect.

Linux and macOS track access times by default and show them in the output of e.g. stat (stat -x on macOS). Windows no longer tracks access time by default for performance reasons, but it can be enabled.

Maybe the description should say Last time file was accessed.?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've submitted a small tweak to the description, my initial sentence wasn't great :-)

But I feel like getting into the detail of what happens on renames, touch & the like will not be helpful, and will be somewhat out of place.

I'd vote to leave as is, and update if there seems to be confusion / questions about this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the part I'm more concerned is how this competes / conflicts for example with ctime. It's super easy to add new fields but will be very hard to remove fields again. So if we add a field, we should be very clear what it's used for and how.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I hadn't noticed this last comment, when I responded on the PR itself.

I'm not sure what you meant by "this" in "how this competes".

In #441, we're clarifying a quirk of ctime vs mtime, see this.

The cryptic crtime often confuses people vs ctime. This is why instead we're introducing created, as the field for crtime. So I think this is step 1 in clarifying these timestamps.

Step 2 of clarifying these timestamps will be getting rid of the already existing ctime and mtime. But as already discussed in my comment a few hours ago, this should be done separately. I want do have good support for deprecations.

Since we're improving the field descriptions over in #441 and introducing new ones here, let's pull them all together, to have a complete overview of the proposed changes:

  • mtime: Last time the file content was modified.
  • ctime: Last time the file attributes or metadata changed.
    Note that changes to the file content will update mtime. This implies
    ctime will be adjusted at the same time, since mtime is an attribute
    of the file.
  • created: File creation time.
    Note that not all filesystems store the creation time.
  • accessed: Last time the file was accessed.
    Note that not all filesystems keep track of access time.

Is there anything missing or unclear here?

If you're questioning whether they're worth adding at all, I do think so. Both are fundamental pieces of information with no overlap with the 2 timestamps we already have. The fact that not all filesystems / configurations record creation time or last access time for performance reasons is not a reason for not adding them, IMO. When they're available, I think they're useful.

Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on the introduction of created. For ctime / accessed I see some potential conflicts but I'm ok moving forward if it's clear to everyone else. You mention in your PR description, that some should be deprecated. If there is overlap, we should directly deprecate the old fields when a new replacement one is introduced.

@webmat
Copy link
Contributor Author

webmat commented May 1, 2019

@ruflin @cwurm Any objections left about this PR?

@webmat
Copy link
Contributor Author

webmat commented May 1, 2019

@ruflin About the deprecation: yes, I'm not deprecating anything for now. The affected fields would be the already existing ones, and I'm not touching them here.

I agree that we will need to introduce the 2 new fields for them at the same time we introduce the deprecation of the old ones. I'm keeping this for later, because I'd like to investigate the possibility of having specific attributes & handling for deprecations (rather than just editing the field description). This support for deprecation is out of scope for this PR.

@cwurm
Copy link

cwurm commented May 1, 2019

@ruflin @cwurm Any objections left about this PR?

I'm good, LGTM.

@webmat webmat force-pushed the file-timestamps branch from ec81adc to b73a9e6 Compare May 21, 2019 18:53
@webmat webmat merged commit 836c1ec into elastic:master May 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

file creation and access time
3 participants