[APM] Searching for aligned ECS field storing the meta type of an event #243

Closed
simitt opened this issue Dec 6, 2018 · 8 comments
Labels
discuss question Further information is requested

Comments

@simitt
Contributor

simitt commented Dec 6, 2018

@ruflin, @graphaelli and I were discussing offline what the proper field would be to indicate that an event is an APM event.
In the future we might want to query events across different shippers, e.g. combine filebeat metrics with apm metrics. It would therefore be useful to have a cross-team aligned field for this type of information, i.e. one whose values would be apm, metricbeat, filebeat, etc.

There were different thoughts around existing ECS fields:

  • agent.type: Unclear if generic apm information should be stored in the agent key.
  • event.type: If used, then this should rather contain the information about which APM event type it is: span, transaction, metric, or error.
  • event.module: If used, then this seems to correlate rather to information collected by apm agents, e.g. a mysql query, or an outgoing http request etc.
  • event.dataset: Is APM one big module? Not sure on how that would fit.
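To make the four candidates concrete, here is a minimal sketch of a single APM event carrying each of them. Every value below is an assumption chosen for illustration (for example `agent.type: apm` and `event.module: mysql`); none of it is an agreed ECS mapping. The `flatten` helper is hypothetical and only turns the nested document into the dotted field names used in this thread.

```python
# Hypothetical APM event showing where each candidate field would sit.
# All values are illustrative assumptions, not an agreed ECS mapping.
candidate_event = {
    "agent": {"type": "apm"},
    "event": {
        "type": "transaction",         # which APM event type it is
        "module": "mysql",             # what the agent instrumented
        "dataset": "apm.transaction",  # data structure identifier
    },
}


def flatten(doc, prefix=""):
    """Flatten nested dicts into dotted field names, e.g. 'event.type'."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


fields = flatten(candidate_event)
for name in sorted(fields):
    print(name, "=", fields[name])
```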
@simitt simitt added question Further information is requested discuss labels Dec 6, 2018
@ruflin
Contributor

ruflin commented Dec 7, 2018

A few thoughts and ideas around this:

  • UI: Each UI requires specific data structures which are defined by event.dataset. For example the "Host Overview" in the infrastructure UI requires the datasets system.cpu, system.memory, system.network (and a few more). As soon as at least one of these datasets is available for a host, we can show data. The same should be true for APM. As soon as event.dataset: apm.transaction is available we can show some basic data.
  • Aggregations: What the user will do is run aggregations, be it on host.name or container.id. Assuming all Elasticsearch documents use ECS, each bucket will have one or multiple event.dataset values available. If one of the apm datasets is in a bucket, we can link to the APM UI. I would assume the filter in the APM UI would happen on the same field + value as the bucket. So if host.name: foo it would open the APM UI with the filter host.name: foo.
  • Logging UI: This one is a bit special as it can show most data. Some basic view in the Logging UI could even work for metrics by just showing the raw JSON, so this link should probably always be available. The difference is that based on event.dataset the Logging UI makes a decision on how to visualise each event.
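The aggregation idea in the list above can be sketched as an Elasticsearch request body plus a small helper that inspects the response. The aggregation names (`by_host`, `datasets`), the `apm.` prefix convention, and the sample response are assumptions made for illustration; only the `terms` aggregation itself is standard query DSL.

```python
# Request body: bucket documents by host.name, then list the
# event.dataset values present in each bucket.
request_body = {
    "size": 0,
    "aggs": {
        "by_host": {
            "terms": {"field": "host.name"},
            "aggs": {"datasets": {"terms": {"field": "event.dataset"}}},
        }
    },
}


def hosts_with_apm_data(response, prefix="apm."):
    """Return host names whose bucket contains a dataset starting with prefix."""
    hosts = []
    for bucket in response["aggregations"]["by_host"]["buckets"]:
        datasets = [d["key"] for d in bucket["datasets"]["buckets"]]
        if any(name.startswith(prefix) for name in datasets):
            hosts.append(bucket["key"])
    return hosts


# Hand-written response in the shape a terms aggregation returns (assumed data).
sample_response = {
    "aggregations": {
        "by_host": {
            "buckets": [
                {
                    "key": "foo",
                    "doc_count": 10,
                    "datasets": {
                        "buckets": [
                            {"key": "system.cpu", "doc_count": 6},
                            {"key": "apm.transaction", "doc_count": 4},
                        ]
                    },
                },
                {
                    "key": "bar",
                    "doc_count": 3,
                    "datasets": {
                        "buckets": [{"key": "system.memory", "doc_count": 3}]
                    },
                },
            ]
        }
    }
}

apm_hosts = hosts_with_apm_data(sample_response)
```

For each host returned, a UI could then link to the APM UI filtered on the same `host.name` value as the bucket.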

Each solution / UI should register which event.dataset values it supports. The same is true for dashboards: Each dashboard requires one or multiple event.dataset to be available to work.
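A minimal sketch of what such a registration could look like, assuming a plain in-memory mapping. The registry name, the UI keys, and the lookup function are hypothetical; the dataset values are the ones mentioned in this comment.

```python
# Hypothetical registry: each UI or dashboard lists the event.dataset
# values it knows how to render.
SUPPORTED_DATASETS = {
    "infrastructure_host_overview": {"system.cpu", "system.memory", "system.network"},
    "apm": {"apm.transaction"},
}


def available_uis(datasets_in_bucket):
    """Return the UIs with at least one supported dataset in the bucket, sorted by name."""
    present = set(datasets_in_bucket)
    return sorted(
        ui for ui, supported in SUPPORTED_DATASETS.items() if supported & present
    )


uis = available_uis(["system.cpu", "apm.transaction"])
```

Matching on "at least one dataset" follows the rule described above for the Host Overview; a dashboard that needs all of its datasets would use a subset check instead.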

In general I think the shipper is much less important than the data structure itself. For system.cpu metrics it should not matter if it was ingested by Metricbeat, Functionbeat or an APM agent.

@simitt How does it currently look if APM ingests CPU / Memory / Disk metrics? Will this be 1 event or multiple events? If it's one, I'm wondering if 1 event can belong to multiple datasets?

@MikePaquette
Contributor

One slightly-related thought: If the APM server can enrich the event on the way in, it can set observer.type: "APM Server" to provide a possible single-field filter for all APM-related events.
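As a rough sketch of this idea: the enrichment function below is hypothetical and is not how APM Server works internally; it only shows the shape of the enriched document and of a single-field filter. The `term` query assumes `observer.type` is mapped as a keyword field.

```python
def enrich_with_observer(event, observer_type="APM Server"):
    """Return a copy of the event with observer.type set; other observer fields are kept."""
    enriched = dict(event)
    enriched["observer"] = {**event.get("observer", {}), "type": observer_type}
    return enriched


# Single-field filter matching every event enriched this way.
apm_filter = {"query": {"term": {"observer.type": "APM Server"}}}

original = {"event": {"dataset": "apm.transaction"}}
enriched = enrich_with_observer(original)
```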

image

@simitt
Contributor Author

simitt commented Dec 11, 2018

Using the observer for describing information around the APM Server is an interesting thought. However, I am not sure we can consider the apm server an observer, see #238 (comment).

Generally, the APM Server is only one component of the APM solution, and I am more focused on whether there is any recommendation on which field to use for setting apm as a type.

@ruflin thanks for the details around event.dataset. We don't use this terminology anywhere in APM, and from an APM perspective I don't see additional value in adding it. That said, if there is an agreement that this field should be used across all products, I don't see anything stopping us from using it. Just for clarification, is this a proposed solution already (and can I follow this somewhere) or was this a starting thought on how we could align?

@simitt
Contributor Author

simitt commented Dec 11, 2018

APM currently doesn't ingest CPU or memory metrics.

@ruflin
Contributor

ruflin commented Dec 12, 2018

@simitt Was a starting thought, nothing specific yet. We can add dataset to apm at any time later.

@dainperkins
Contributor

On this note, I just ran into a user with metricbeat & winlogbeat: the infra dashboard shows hosts for both but only details for the metricbeat *nix hosts. It seems a unified namespace for infrastructure and app metrics would be a good thing, with everything falling under a single metrics top level, or two (APM + HPM).

Basically:
SPM: CPU | Network | Memory etc. (e.g. metric or winlogbeats & snmp via logstash)
APM: broken down into things like application | web page | user | data query?

Will try and grab the metricbeat / winlogbeats / and some snmp mibs for comparison / standardization if no one else has looked at it

@ruflin
Contributor

ruflin commented Mar 22, 2019

@dainperkins That would be great. So far we tried to mostly take what Metricbeat creates as the foundation but it's not in ECS yet.

@jamiehynds
Contributor

Closing due to inactivity, but we can re-open or create a new issue for APM mappings if required.


6 participants