-
Notifications
You must be signed in to change notification settings - Fork 431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce 'nodejs' field set #480
Conversation
All good questions where we don't have a definitive answer yet but we should come to a conclusion. Let me open Pandoras box. I think units do not belong in the field name but should be part of the field type. We added them to the field name in Metricbeat as it was the simplest solution back then. I would rather have support for elastic/elasticsearch#31244 This would make field names shorter, what unit was ingested or queried would not matter as Elasticsearch would standardise it. The problem is we don't have elastic/elasticsearch#31244 yet but I think we should push for it. Assuming we add units now, we will run into a conflict later. Let's assume we have How does the user know today which unit it is in? Obviously docs plays a part here but I think we can go further and either get support for it in the index pattern in Kibana or do something similar like the SIEM app which takes the defined fields and then exposes it on their own: https://github.com/elastic/kibana/tree/master/x-pack/plugins/siem/server/utils/beat_schema/8.0.0 Now let's get into abbreviations. I unfortunately agree with both ways. In general we should avoid abbreivations but there are exceptions to it. For me the exception are where the abbreviation is so well know that it can't be misinterpreted (or at least I hope it can't). Assuming we get bytes and duration type in Elasticsearch, half of the abbreviations in the current list will not be needed anymore. The other part is around metric types, like avg, count, min, max, pct, norm. (I'm ignoring request, message, and connection which I would argue today should not be in the list. These fields are basically a summary of multiple metrics and we build an object out of it to describe the boundaries. Again I think it would be great to have a type for this in Elasticsearch that would enforce on how max, min, avg etc. should be stored. I remember there have been many discussions around this in the past but I think we never got to a full conclusion. I think apm-server (@graphaelli ) already today uses some standard on how to store such metrics? Something like:
Until we have a type in Elasticsearch, my suggestion would be to store it as an object and use abbreviations here. This also ties partially in the unit part above. If we include the unit in the name, this would become even longer. If we don't, things stay simpler. Assuming we will get a metric type in Elasticsearch, it will become interesting if a metric type can also have a unit? Not sure I fully follow your last example. What would be the unit be for |
@ruflin I tend to agree with everything you've said.
The unit for "nodejs.memory.heap.objects_used" is "object". I think in cases like this where there's no associated magnitude (as there is with "nanosecond", "kilobyte", etc.), there's no need for an explicit unit. Units would be used by the UI to render the values differently, accept alternative query formats, etc. For something like "objects_used", it's just a number. If we do go down the route of removing units from names, then I think it would be fine for the metric name to be So if we did all that, here's how the Node.js metric names would look:
These aligns well with the JVM metrics that the APM Java agent is already publishing:
The Go agent's metrics aren't aligned, but they can be changed later. |
I agree with continuing to keep units out of field names as much as possible, documenting them for now and eventually having better support for units in Elasticsearch. Count metrics such as "objects" and "handles" should absolutely remain without a unit, too. I'm also good with standardizing on some more shortened names for metrics such as avg, pct, min, max. I think they're all pretty standard, and won't confuse anyone. Plus I don't think anyone will really insist on having On the other hand, I'm still absolutely not convinced that we should add a multitude of language/runtime names in ECS, however. Even if different runtimes have different sets of important metrics (e.g. eventloop in Node, more focus on heap/non-heap in Java). I have limited time today to make my case, perhaps we should find a time to discuss live :-) But I do think we could define one general purpose field set, that encapsulates all the metrics commonly seen. Then each language populates what's relevant to them. And of course we can add fields to identify the runtime, rather than having this identification be done (and limited to) a field nesting. Benefits of having one nesting for runtime metrics:
I'm sure more ideas will come to mind later, but let's start with these. I'd like to know what are the benefits of having one field set per runtime, can you elaborate on that? |
In addition, seeing a paste of one set of metrics / one event for each runtime would help make the discussion more concrete. @axw Could you provide a gist of that? |
The biggest benefit of having a field per runtime I think is progress. We need a place for them now and today. I could see us adding these fields now in some way to ECS (we should discuss on how this works) and figure out in the next x months how we can generalise it and potentially have it unified for ECS 2.0. I rather have everyone using now a standard that we change later and have a migration path then not having a standard. We should discuss in more detail "how" we put it in ECS, for example as experimental, but I see this out of scope of this PR. Keep in mind, this PR is against a branch and not master (on purpose), so we can delay this discussion. In any case we should still introduce some runtime fields for storing the runtime name, language etc. Follow up issue? @axw I like your proposal. +1 on |
What I'm hoping to get from having a few metric samples provided as part of this discussion, is insight into how much work it would actually be, if we tried to boil this down to one single field set. In my limited understanding (until I see the requested data samples), I would expect a few fields to be used across the board, then some fields used only in some cases (some for event loop runtimes, some for threading heavy runtimes, etc). And I think that's totally fine. If most of these subsets are orthogonal, it doesn't really make it complicated to merge all this one one field set. I can totally accept that implementing a common field set in Mb and APM is out of scope for the work that needs to happen now. But if we want to get this standardized for 8.0, we need to start having the standardization discussion now, not weeks before 8.0 ;-) Do you guys think we can do both? Start the standardization discussion, while you guys align Metricbeat & APM in the short term? Note that having the standardization discussion now may even surface problems that wouldn't be apparent, when working in one field set per runtime. |
I think to see if we can standardise it we first need to have first all the fields we want to have standardised in one place. That is also what this brach is for. So my suggestion for moving forward:
@axw Any chance you could open a second PR against this branch that summarises our discussion around naming and formatting of metrics? |
Yeah I don't mind merging whatever's helpful for experimentation in the But we'll need to address #480 (comment) before merging to master. |
@axw Could you update the PR that it aligns with our discussion around naming so we can get it in? |
I think of the names as containing an implicit unit type and value intent of |
- drop units from names - make "active" a sub-field of "requests" and "handles" - set eventloop.delay.avg's format to duration
Updated, PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, from my perspective we can merge this as is an continue :-)
@Qard Do you see some fields we have in at the moment which would not match your description?
Well, for example using |
The problem is that I'll merge this as is. APM agents might not report metrics initially consistent with this list, as we're going to be making new releases with the runtime metrics pretty soon. We'll keep working on this for eventual consistency. |
| nodejs.eventloop.delay.avg | ||
| The average event loop delay for the reporting period. | ||
|
||
Event loop delay is sampled every 10 milliseconds. Delays shorter than 10 milliseconds may not be observed, for example if a blocking operation starts and ends within the same sampling period. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: The "10 milliseconds" is a configurable number. In the Elastic APM Node.js agent we have currently hard-coded it to 10 ms (which is also the default in Node.js), but the Node.js API we use the extract this information can be configured to any other duration.
Should we make a PR to update this? And if so, what should we write instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah I was perhaps being a bit too specific about how the APM agent implementation works. Yes we should probably update this.
How about this?
Event loop delay is periodically sampled, e.g. every 10 milliseconds, but this may vary by source. Delays shorter than the sampling period may not be observed, for example if a blocking operation starts and ends within the same sampling period.
(this discussion doesn't fit under this Node.js issue, but since it happened here, I'll include this here) I just had a discussion with @exekias, @ruflin and @axw. Beats already have a convention to add "bytes" and time unit suffixes (wasn't documented, that's being fixed). There's a big wish to move away from that, to encoding the information in field metadata or similar once ES supports that. Even if we get field level meta data or a field type, generic inputs like Prometheus, might not be able to support that. Bottom line is, there's a convention and many many fields following it, and since we don't have a different solution to the problem of encoding units, we should rather stick to the current convention until we decide on an alternative convention and have a migration plan to that. So, we will postfix time based metrics with their unit and "bytes" on the bytes based metrics for now. We will not include these into ECS for now, but just align across agents outside ECS. The java metrics are already out. We will need to eventually deprecate those and move to the right suffices. |
FYI: The Node.js agent currently also add a |
Node.js-specific metrics pulled from elastic/apm-agent-nodejs#1021
Relates to #474
I have some questions of my own.
ECS has one existing field in one field set for a duration:
event.duration
, which has no unit suffix. There is an open proposal to add uptime fields which also lack suffixes. Does it make sense to include "ns" as the suffix innodejs.eventloop.delay.avg.ns
? It's a bit of an eyesore, and is inconsistent withevent.duration
... but how else does a user know what the unit is?The ECS guidelines state that one should "avoid abbreviations when possible". On the other hand, in the Beats dev guide standardised names specifies using "avg" instead of "average". Which way do we go?
nodejs.memory.heap.allocated.bytes
andnodejs.memory.heap.used.bytes
work well because they have "bytes" at the end, but for some other non-standard unit, what should we do? For example, suppose we were talking about allocated/freed objects:nodejs.memory.heap.objects_used
? Then we're not ending the phrase with a unit.nodejs.memory.used_objects
? Then we don't have a common prefix for auto-completion.