A new file generator #336

Merged
merged 37 commits into from
Feb 26, 2019

Conversation

webmat
Contributor

@webmat webmat commented Feb 20, 2019

This pull request introduces a simplified generator for the files derived from the ECS core files in schemas/*. The idea is to read everything, augment all fields with defaults, copy reusable fields to their intended destinations, and eventually lint for problems or add validations; then we save the intermediate in-memory representation as a generated file (see generated/ecs/fields_flat.yml).

From this point, we can trigger a series of generators: either in Python, based on the in-memory dictionary, or in another language, based on this simplified and fully fleshed-out intermediate file.
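
The read, default, and copy steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the schema shape (`name`, `root`, `reusable`, `fields`), the `FIELD_DEFAULTS` values, and both function names are assumptions made for the example, and writing the result to YAML is left out.

```python
import copy

# Hypothetical defaults; the generator's actual default set is not shown here.
FIELD_DEFAULTS = {"level": "extended", "example": ""}


def flatten_schemas(schemas):
    """Return {flat_name: field} for every field, with defaults applied."""
    flat = {}
    for schema in schemas:
        # Fields of a root schema (such as base) get no prefix.
        prefix = "" if schema.get("root") else schema["name"] + "."
        for field in schema["fields"]:
            merged = {**FIELD_DEFAULTS, **field}
            merged["flat_name"] = prefix + field["name"]
            flat[merged["flat_name"]] = merged
    return flat


def copy_reusable(schemas, flat):
    """Copy each reusable field set under every destination it lists."""
    result = dict(flat)
    for schema in schemas:
        own_prefix = schema["name"] + "."
        for destination in schema.get("reusable", []):
            for name, field in flat.items():
                if name.startswith(own_prefix):
                    nested = copy.deepcopy(field)
                    nested["flat_name"] = destination + "." + name
                    result[nested["flat_name"]] = nested
    return result


# Illustrative input only; real definitions live in schemas/*.
schemas = [
    {"name": "base", "root": True,
     "fields": [{"name": "@timestamp", "type": "date", "level": "core"}]},
    {"name": "geo", "reusable": ["client"],
     "fields": [{"name": "city_name", "type": "keyword"}]},
]
flat = copy_reusable(schemas, flatten_schemas(schemas))
```

Every later generator can then work from `flat` alone, which is the point of keeping a single intermediate representation.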

This PR introduces the following generators:

  • The one that saves the intermediary YML representation
  • The schema.csv file. It moves from the root to generated/schema.csv
  • Elasticsearch 6 and 7 sample templates, at generated/elasticsearch/*
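
As a rough sketch of why two templates are needed: Elasticsearch 6 index templates nest mappings under a mapping type name, while Elasticsearch 7 templates are typeless by default. The example below assumes that this is the only difference being handled; the function names, the `ecs-*` index pattern, and the `_doc` type name are illustrative choices, not taken from this PR.

```python
def nested_properties(flat_fields):
    """Turn {'a.b': {'type': 'keyword'}} into nested 'properties' dicts."""
    root = {}
    for flat_name in sorted(flat_fields):
        *parents, leaf = flat_name.split(".")
        node = root
        for part in parents:
            # Create intermediate objects on demand.
            node = node.setdefault(part, {}).setdefault("properties", {})
        node.setdefault(leaf, {})["type"] = flat_fields[flat_name]["type"]
    return root


def sample_template(flat_fields, es_version):
    """Build a sample index template body for the given major ES version."""
    mappings = {"properties": nested_properties(flat_fields)}
    if es_version < 7:
        # ES 6 expects the mappings to sit under a mapping type name.
        mappings = {"_doc": mappings}
    return {"index_patterns": ["ecs-*"], "mappings": mappings}
```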

This PR also introduces a few other things:

  • The ECS version to be used in code generation is now saved in the file version at the root of the repo
  • Some Python tests for the generator
  • Python tests intended to spec ECS itself. The introductory test in this file ensures we don't introduce a bug where the base fields are nested under base.* :-) The file is scripts/tests/test_ecs_spec.py, and should be used for any high level truism we want to ensure about ECS itself. Not for typical corner cases and unit tests.
  • These tests run as part of make check in Travis, or can be called specifically with make test.
  • The new generator is automatically called in the global make generate, but you can run only the new generator with make generator
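
A spec-level test of the kind described above might look like the sketch below. It is not the contents of scripts/tests/test_ecs_spec.py: `FLAT_FIELDS` is a stand-in dictionary invented for the example, where a real test would load the generator's output built from schemas/*.

```python
import unittest

# Stand-in for the generator's flattened output (assumption for this sketch).
FLAT_FIELDS = {
    "@timestamp": {"type": "date"},
    "message": {"type": "text"},
    "host.name": {"type": "keyword"},
}


class TestEcsSpec(unittest.TestCase):
    def test_base_fields_are_at_the_root(self):
        # Base fields must appear at the top level, never as base.*
        nested_under_base = [n for n in FLAT_FIELDS if n.startswith("base.")]
        self.assertEqual(nested_under_base, [])
        self.assertIn("@timestamp", FLAT_FIELDS)
```

A test like this asserts a property of ECS itself rather than a corner case of the generator code.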

TODO before merging

  • Generate different index template for v6 and v7
  • Remove template.json and schema.csv at the root (they're now in generated/*/)
  • Readme: point people to the generated files.
    • gist: they'll all gradually move to generated/*/ from now on, except for docs
  • Set a default object_type for type: object fields? (only one without a default at this time)

Not in scope for this first PR, but should get attention soon:

  • adding validations / linting rules
  • generating asciidoc
  • generating the readme (will be replaced by asciidoc soon anyway, so will not be ported)
  • generating "perfect" beats yml defs
  • generating the Kibana JSON or the Go library
  • generating a sample Kibana index pattern in line with the ES templates
  • generating a Kibana canvas workpad to explore ECS

@webmat webmat changed the title WIP of a new generator for the files. WIP of a new file generator Feb 20, 2019
@webmat webmat force-pushed the generator-memory-repr branch 4 times, most recently from 0fd3351 to 00bdb10 Compare February 22, 2019 04:23
@webmat
Contributor Author

webmat commented Feb 22, 2019

@MikePaquette Ok, the new generator is now able to generate the csv file and the template, including the reusable fields.

In the future I'd like to mostly output the generated files all in one place, in generated/*. But for the purpose of the pull request diffs, I'm also overwriting the old version of the two files, right at the root of the repo.

@MikePaquette
Contributor

@webmat love the new CSV output with the nested fields!
Question: Should we / can we include a version column in the CSV output with the ECS release version?
Seems a bit redundant to have it in every row, but we'd be able to import each CSV file into Elasticsearch and keep statistics and visualizations of ECS over time.

@webmat
Contributor Author

webmat commented Feb 22, 2019

@MikePaquette I'm a bit hesitant about this, because as you say, it will be the same value for every single line.

I would rather recommend tweaking the import process or script to add this value to the destination, every time a new import is made of a new ECS version.

@MikePaquette
Contributor

Thanks @webmat. My preference is that the CSV file should have some indication of what version of ECS it represents, regardless of any processes that consume it downstream. Yes, putting the version on every row seems a bit silly, but it works, and it's less than 300 rows, so not a big deal from a space perspective.

@webmat
Contributor Author

webmat commented Feb 22, 2019

Alright, let's do it, then :-)
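
For illustration, a version column on every row could be produced along these lines. This is a sketch under assumptions: the column headers, the `schema_csv` function name, and the way `version` is passed in are made up for the example, and only the row contents mirror the fields discussed in this thread.

```python
import csv
import io


def schema_csv(flat_fields, version):
    """Render the flat field list as CSV text, repeating the ECS version per row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Hypothetical header names.
    writer.writerow(["Field", "Type", "Level", "Example", "ECS_Version"])
    for name in sorted(flat_fields):
        field = flat_fields[name]
        writer.writerow([
            name,
            field["type"],
            field["level"],
            field.get("example", ""),
            version,
        ])
    return out.getvalue()
```

Because the version travels with each row, files from several ECS releases can be imported side by side without changing the import process.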

@webmat webmat force-pushed the generator-memory-repr branch from 00bdb10 to 9f272a4 Compare February 22, 2019 20:32
Mathieu Martin added 15 commits February 22, 2019 16:24
- Flat field list generated in generated/ecs/fields_flat.yml
- Nested field list (similar to structure of schemas/*.yml) in generated/ecs/fields_nested.yml
- Reusable fields are correctly listed in fields_flat.yml only, at this time.
- Added a separate test file that's for sanity checks. More about the state of the ECS spec than testing code corner cases.
Member

@andrewkroh andrewkroh left a comment

The plan SGTM. I didn't spend much time on the python code, but converging all of the generation code to one place/one language that's fully encapsulated in this repo should be nice.

field['type'],
field['level'],
field.get('example', ''),
version
Member

If this were a @since <version> type of field that indicates when the field was first added to the spec this would be useful to anyone trying to write backwards compatible code.

Contributor Author

@webmat webmat Feb 25, 2019

Good idea. That's not currently the intent, but I'm noting as a future improvement.

The gist of this field specifically is that one can repeatedly import schema.csv into a spreadsheet, once for each ECS version, and then have all versions at their disposal. A request from Mike.

But I love the idea of the introduction version for each field... We'll see how we can work that in.

@webmat webmat changed the title WIP of a new file generator A new file generator Feb 26, 2019
@webmat webmat merged commit ca9d77f into elastic:master Feb 26, 2019
@webmat webmat deleted the generator-memory-repr branch February 26, 2019 18:46
@webmat webmat removed the in progress label Mar 4, 2019
webmat added a commit to webmat/ecs that referenced this pull request Mar 5, 2019
This introduces a simplified generator for files, based on the ECS core files in `schemas/*`.

The idea is to read everything, augment all fields with defaults, copy reusable fields to their intended destinations, and eventually lint for problems or add validations; then we save the intermediate in-memory representation as a generated file (see `generated/ecs/fields_flat.yml`).

From this point, we can trigger a series of generators: either in Python, based on the in-memory dictionary, or in another language, based on this simplified and fully fleshed-out intermediate file.

This PR introduces the following generators:

- The one that saves the intermediary YML representation in `generated/ecs/`
- The schema.csv file. It moves from the root to `generated/schema.csv`
- Elasticsearch 6 and 7 sample templates, at `generated/elasticsearch/*`
- The old schema.csv and template.json have been moved to `generated/legacy/` for the time being (still generated).

This PR also introduces a few other things:

- ECS version to be used in code generation is now saved in the file `version` at the root of the repo
- Some Python tests for the generator
- Python tests intended to spec ECS itself. The introductory test in this file ensures we don't introduce a bug where the base fields are nested under `base.*` :-) The file is `scripts/tests/test_ecs_spec.py`, and should be used for any high level truism we want to ensure about ECS itself. Not for typical corner cases and unit tests.
- These tests run as part of `make check` in Travis, or can be called specifically with `make test`.
- The new generator is automatically called in the global `make generate`, but you can run only the new generator with `make generator` #naming-legend
- New section in the readme, pointing people to the generated files
webmat added a commit that referenced this pull request Mar 5, 2019
Backport of PR #336 to 1.0 branch. Original message:

This introduces a simplified generator for files, based on the ECS core files in `schemas/*`.

The idea is to read everything, augment all fields with defaults, copy reusable fields to their intended destinations, and eventually lint for problems or add validations; then we save the intermediate in-memory representation as a generated file (see `generated/ecs/fields_flat.yml`).

From this point, we can trigger a series of generators: either in Python, based on the in-memory dictionary, or in another language, based on this simplified and fully fleshed-out intermediate file.

This PR introduces the following generators:

- The one that saves the intermediary YML representation in `generated/ecs/`
- The schema.csv file. It moves from the root to `generated/schema.csv`
- Elasticsearch 6 and 7 sample templates, at `generated/elasticsearch/*`
- The old schema.csv and template.json have been moved to `generated/legacy/` for the time being (still generated).

This PR also introduces a few other things:

- ECS version to be used in code generation is now saved in the file `version` at the root of the repo
- Some Python tests for the generator
- Python tests intended to spec ECS itself. The introductory test in this file ensures we don't introduce a bug where the base fields are nested under `base.*` :-) The file is `scripts/tests/test_ecs_spec.py`, and should be used for any high level truism we want to ensure about ECS itself. Not for typical corner cases and unit tests.
- These tests run as part of `make check` in Travis, or can be called specifically with `make test`.
- The new generator is automatically called in the global `make generate`, but you can run only the new generator with `make generator` #naming-legend
- New section in the readme, pointing people to the generated files

* Re-generate files after rebasing
3 participants