Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shorthand subset notation #805

Merged
merged 7 commits into from
Apr 8, 2020

Conversation

marshallmain
Copy link
Contributor

This PR makes the subset notation less cumbersome by including all subfields if the fields key is missing. Instead of writing

field1:
    fields: *

every time you want to include all subfields, you can instead write

field1: {}

This still leaves the ability to specify other options for the field in the dictionary later on without breaking changes.

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea, love it!

I think we should make the check a little more strict, and we can merge this. See below.

# Every field must have a 'fields' key or the schema is invalid
if isinstance(val['fields'], dict):
# A missing fields key is shorthand for including all subfields
if 'fields' not in val or val['fields'] == '*':
Copy link
Contributor

@webmat webmat Mar 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I'd make the check a little more specific, to fail more explicitly if someone makes a typo.

If it's specifically an empty dictionary (rather than the absence of the key "fields"), then it's the shorthand to grab all subkeys.

Otherwise the following silently gets me all of host instead of just host.name:

base:
  fields: "*"
host:
  feels:
    name:
      fields: "*"

I think realistically, users will use this with many field sets (10-15?), so a subtle typo like that could go unnoticed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, how about a list of allowed keys to safeguard against that case instead? I haven't gotten around to it yet but I'd like to allow index as an option for each field and a list of allowed keys would let you do something like

host:
  index: false

to grab all the host fields with the shorthand notation while also excluding the potential mistyped fields.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something more strict like a whitelist of allowed keys in that "subset" structure sounds good, yeah.

I'm not sure I follow what index does here, though. Can you clarify?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I don't know how I missed this, I thought I replied.

index here would correspond to the index property in the elasticsearch mapping. So this would let us specify fields from the schema that we want to exist in documents but don't need to be indexed. This particular use case may or may not be useful but it's something to consider.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When saving a document in Elasticsearch, the raw document always remain complete and untouched.

The behaviour of index:false on field myfield simply does not create a searchable mapping entry for myfield. But if you were to read a document that contains myfield, it would be present in the _source of the doc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I misread your last comment. Sorry for explaining essentially what you just said 😂

Let's keep the whitelist to only accept "fields" for now, if you don't mind.

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@marshallmain marshallmain merged commit 960d7db into elastic:master Apr 8, 2020
@marshallmain marshallmain deleted the shorthand-subset branch April 8, 2020 20:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants