-
Notifications
You must be signed in to change notification settings - Fork 431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shorthand subset notation #805
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea, love it!
I think we should make the check a little more strict, and we can merge this. See below.
# Every field must have a 'fields' key or the schema is invalid | ||
if isinstance(val['fields'], dict): | ||
# A missing fields key is shorthand for including all subfields | ||
if 'fields' not in val or val['fields'] == '*': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I'd make the check a little more specific, to fail more explicitly if someone makes a typo.
If it's specifically an empty dictionary (rather than the absence of the key "fields"), then it's the shorthand to grab all subkeys.
Otherwise the following silently gets me all of host
instead of just host.name
:
base:
fields: "*"
host:
feels:
name:
fields: "*"
I think realistically, users will use this with many field sets (10-15?), so a subtle typo like that could go unnoticed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, how about a list of allowed keys to safeguard against that case instead? I haven't gotten around to it yet but I'd like to allow index
as an option for each field and a list of allowed keys would let you do something like
host:
index: false
to grab all the host fields with the shorthand notation while also excluding the potential mistyped fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something more strict like a whitelist of allowed keys in that "subset" structure sounds good, yeah.
I'm not sure I follow what index
does here, though. Can you clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I don't know how I missed this, I thought I replied.
index
here would correspond to the index
property in the elasticsearch mapping. So this would let us specify fields from the schema that we want to exist in documents but don't need to be indexed. This particular use case may or may not be useful but it's something to consider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When saving a document in Elasticsearch, the raw document always remain complete and untouched.
The behaviour of index:false
on field myfield
simply does not create a searchable mapping entry for myfield
. But if you were to read a document that contains myfield
, it would be present in the _source
of the doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I misread your last comment. Sorry for explaining essentially what you just said 😂
Let's keep the whitelist to only accept "fields" for now, if you don't mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR makes the subset notation less cumbersome by including all subfields if the
fields
key is missing. Instead of writingevery time you want to include all subfields, you can instead write
This still leaves the ability to specify other options for the field in the dictionary later on without breaking changes.