Questions re: Message fields #223

Closed
MikePaquette opened this issue Dec 4, 2018 · 5 comments

@MikePaquette
Contributor

“message” field might be better named as “raw_message” to denote that it’s untouched. But why repeat that in “event.original”? Or “log.original”? (Seems like lots of duplication here.)

No. 6 of 16. This question was asked by a new ECS user who is familiar with mapping IT events to data models and use cases in other schemas. These questions are being posted as GitHub issues because a) they may offer valuable insights, and b) we expect that many new users will have similar questions.

@MikePaquette MikePaquette added the question Further information is requested label Dec 4, 2018
@MikePaquette
Contributor Author

In ECS, the message field is not defined to hold the entire raw message, but rather the message field of the log, or a string that best represents a description or summary of the event.

That said, there is some clarification that could be applied to these definitions.

@MikePaquette MikePaquette self-assigned this Dec 10, 2018
@webmat
Contributor

webmat commented Dec 10, 2018

In the past, we've seen a mix of meanings for the field message. For example, Logstash processing a plaintext event would put the whole string in message as is, making it sound like message is the place for the original value.

It's also been common that people overwrite message with the most relevant part of the message. For example, after parsing a syslog header and extracting these details out to other fields, replace message with the rest of the actual message, in effect "cleaning up" the field.

ECS has to decide on one of these two approaches. We've decided that the "cleaned up" message belongs in the most canonical field, message, and that the original message belongs elsewhere, if users decide to keep the original around.
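The split described above can be sketched in Python. This is only an illustration under stated assumptions: the regular expression covers a simplified RFC 3164-style header, the function name `to_ecs` is hypothetical, and the choice of `host.hostname`, `process.name`, `process.pid`, and `log.syslog.priority` as targets for the extracted details is one possible mapping, not something this thread prescribes.

```python
import re

# Simplified RFC 3164-style header: <PRI>MMM dd hh:mm:ss HOST TAG[PID]: text
# (an assumption for this sketch; real syslog input varies a lot).
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<ts>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s"
    r"(?P<tag>[^\s:\[]+)(?:\[(?P<pid>\d+)\])?:\s*"
    r"(?P<text>.*)$",
    re.DOTALL,
)


def to_ecs(raw_line, keep_original=True):
    """Return an ECS-style dict with a cleaned-up `message`.

    The untouched input goes to `event.original` only if the caller
    chooses to keep it. The timestamp is matched but not converted here.
    """
    doc = {"message": raw_line}  # fallback: nothing could be parsed out
    m = SYSLOG_RE.match(raw_line)
    if m:
        # Replace message with what is left after the header is extracted.
        doc["message"] = m.group("text")
        doc["host"] = {"hostname": m.group("host")}
        doc["process"] = {"name": m.group("tag")}
        if m.group("pid"):
            doc["process"]["pid"] = int(m.group("pid"))
        doc["log"] = {"syslog": {"priority": int(m.group("pri"))}}
    if keep_original:
        doc["event"] = {"original": raw_line}
    return doc
```

A line that does not match the header pattern keeps its full text in `message`, so the field is always populated.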

With this said, I agree we probably need to spend time on event.original vs log.original. event.original is meant to be the "most" raw version of the original event, since log.original's description allows for some preliminary modifications (newlines removed, encoding). Therefore I wonder if log.original is really necessary, when we have event.original.

@Randy-312

I would propose that message also have .keyword values so that we can also build visualizations on the contents of message fields.

Today, we are identifying where we have multi-line issues (i.e. stack traces) by using the following search, based on .keyword.
{
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
        {
          "prefix": {
            "message.keyword": "\tat"
          }
        },
        {
          "prefix": {
            "message.keyword": " at"
          }
        },
        {
          "prefix": {
            "message.keyword": " at"
          }
        },
        {
          "prefix": {
            "message.keyword": " at"
          }
        },
        {
          "prefix": {
            "message.keyword": " at"
          }
        }
      ]
    }
  }
}
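
A query of this shape can also be generated from a list of prefixes rather than written by hand. The following is a minimal sketch, assuming Python; the helper name `prefix_should_query` and its parameters are hypothetical, and the prefixes passed in the usage note are examples only.

```python
def prefix_should_query(prefixes, field="message.keyword"):
    """Build a bool query that matches if the field starts with any prefix."""
    return {
        "query": {
            "bool": {
                # At least one of the prefix clauses has to match.
                "minimum_should_match": 1,
                "should": [{"prefix": {field: p}} for p in prefixes],
            }
        }
    }
```

For example, `prefix_should_query(["\tat", " at"])` yields one `prefix` clause per entry, which can be serialized with `json.dumps` and sent as a search body.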

@Randy-312

The template around message should ALSO be adjusted to apply to only the first 256 characters (or is it 1024?) of the message field, so that we do NOT negatively affect the Elasticsearch ingestion rate.
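
One way to approximate this is a `keyword` multi-field with `ignore_above`. The sketch below, in Python, only builds the mapping body; the function name `message_mapping` is hypothetical and the `text` type for `message` is an assumption. Note that `ignore_above` does not truncate: values longer than the limit are simply not indexed in the keyword sub-field, so long messages would be absent from `message.keyword` rather than shortened to their first 256 characters.

```python
import json


def message_mapping(ignore_above=256):
    """Return a mapping body with a size-limited keyword sub-field on message."""
    return {
        "mappings": {
            "properties": {
                "message": {
                    "type": "text",  # assumed type for this sketch
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            # Longer values are skipped, not truncated.
                            "ignore_above": ignore_above,
                        }
                    },
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(message_mapping(1024), indent=2))
```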

@djptek
Contributor

djptek commented Nov 16, 2021

This has been resolved; see RFC: #1469

closing

@djptek djptek closed this as completed Nov 16, 2021