Add the default text analyzer to some fields #680
@webmat I think that |
Would ngram be better for file path and process path? These tend to be longer strings. Wildcard searches against a text field could be slow when searching terabytes of data, for example if a company collects logs from a medium-to-large enterprise network.
This is how we index dns.question.name
|
@peasead Yes, that's exactly the reason I'm considering adding it there :-) @mbudge Agreed, there are better ways to index the path-like fields like path, url etc. Still, I think it's good to add the default
So overall the thinking for the path fields should be interpreted as "progress over perfection". But we'll still deliver the perfection in time ;-) If the fields specific to paths are superior enough, we could even deprecate their |
@mbudge You also bring up another good point on performance. Right now we're adding these to enable efficient detections, mostly for the SIEM alerting engine. Users who want to remove these analyzers, and lose the ability to do these detections, are free to do so. |
@dainperkins What would you think about having the default analyzer (full text search) on |
I think that's an excellent idea. I'll make a PR if you show me what needs to be done :) |
Whoops, thanks for the reminder @dainperkins @webmat do you need me to make a fresh PR with the changes to |
@peasead @dainperkins Meant to respond earlier, sorry I forgot to hit "Comment" 😂 I added them both to this PR directly. |
Roger roger. I'll drop the PR. 👍 |
I consider this PR ready for final review, please voice opinions (esp. disagreement) soon, I'd like to merge tomorrow. If you think additional fields would benefit from this, please voice your opinion. But this shouldn't be considered a blocker for the PR, they will be addressed in follow-up PRs :-) |
looks good to me |
LGTM |
In general, these fields make sense to me. Here are some others that we may want to consider subsequently:

For sure:

- `package.description`

Maybe? (thinking about query patterns):

- `*.registered_domain`
- `service.name`
- `service.node.name`

Agree with some of those. However, I'd like to do subsequent PRs (in or after 1.4) for them. Quick note on values that are dot-separated (domains, hostnames) or even dash-separated (hostnames): the default analyzer doesn't deal well with them. They would require a specially crafted analyzer that breaks them up correctly :-) |
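
To illustrate what "breaks them up correctly" could mean for dot- or dash-separated values, here is a minimal sketch in plain Python. It is not an Elasticsearch analyzer definition: the function name `split_host_like` and the sample hostname are hypothetical, and the splitting rule (dots and dashes only, lowercased) is an assumption about the desired result.

```python
import re


def split_host_like(value):
    """Split a hostname-like value on dots and dashes (assumed rule).

    Each piece is lowercased and empty pieces are dropped.
    """
    return [part for part in re.split(r"[.\-]+", value.lower()) if part]


# Hypothetical hostname, used only for illustration.
tokens = split_host_like("Web-01.Example.com")
print(tokens)
```

A real custom analyzer would need to express an equivalent rule in the index settings; this sketch only shows the tokens such a rule would aim for.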
…#680)

Note: fields that are reused elsewhere are getting the `text` multi-fields in all locations where they're reused as well.

`text` introduced on these fields:

- as.organization.name
- error.stack_trace
- file.path
- file.target_path
- http.request.body
- http.response.body
- organization.name
- os.name
- os.full
- process.executable
- process.name
- process.title
- process.command_line
- process.working_directory
- threat.technique.name
- url.original
- url.full
- user.name
- user.full
- vulnerability.description

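
For readers unfamiliar with multi-fields, the sketch below shows roughly what a `keyword` field with an added `text` multi-field might look like in an Elasticsearch-style mapping. It is an illustration only: the helper `keyword_with_text` is hypothetical, only three of the fields above are used, and the exact options in the generated templates (for example `ignore_above` or `norms`) are not shown because they are not stated in this PR.

```python
import json

# A small subset of the fields listed above, chosen for illustration.
FIELDS = ["file.path", "process.command_line", "user.name"]


def keyword_with_text(field_names):
    """Build nested mapping properties where each leaf is a keyword
    field carrying a `text` multi-field (assumed shape)."""
    root = {}
    for name in field_names:
        *parents, leaf = name.split(".")
        node = root
        # Walk or create the object hierarchy for the dotted name.
        for part in parents:
            node = node.setdefault(part, {"properties": {}})["properties"]
        node[leaf] = {"type": "keyword", "fields": {"text": {"type": "text"}}}
    return root


mapping = {"properties": keyword_with_text(FIELDS)}
print(json.dumps(mapping, indent=2))
```

With a mapping of this shape, the exact value stays queryable at `file.path`, while the analyzed variant is addressed as `file.path.text`.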
This implements many fields mentioned in #570.
Note: I'm not adding `host.name`, mentioned in #570, because the default analyzer doesn't split on `-`. So I'm not sure it's worth adding. Please let me know if I'm missing something.

In going over the fields, I've identified a few more that I think may be interesting to add to this PR. Please voice your opinions. I'm happy to hold off or add now:
@neu5ron @randomuserid @rw-access @MikePaquette @peasead