metric: add static labeling capability to metric metadata #142570
Labels
C-enhancement
T-observability
Motivation
Currently, there is no way to create and use statically-labeled metrics in CRDB. We do have some ability to have dynamically managed labels at runtime, but we do not emit such metrics by default as they typically have customer-controlled cardinality and are therefore opt-in.
Static labels are desirable when we want to split a particular metric across one or more predetermined dimensions that make the information easy to consume. A common example is SQL query counts split by query type (SELECT, INSERT, etc.). Today these metrics exist with names like `sql.insert.count` and `sql.select.count`. Ideally, we should record these metrics under the `sql.count` name and split them over a `type` label that is preset to `select`, `insert`, etc. This would enable the user to aggregate the metrics to get a total count, and to view them split across the type dimension if desired.
Proposed design
One of the challenges with introducing such a feature is maintaining backwards compatibility with existing metric names, and also with our internal TSDB persistence layer. A running cluster will have metrics saved under `sql.select.count`, and if we simply switch to a new naming scheme we would omit these new metrics or be forced to rewrite them, which we'd like to avoid.
The idea is to introduce static labeling along with a "legacy metric name" that lets us transition an existing metric into a statically labeled version while preserving the old name for compatibility with TSDB. Here's what such metadata could look like:
In this implementation, TSDB would use the `LegacyStaticName` (`sql.select.started.count`) while Prometheus output would contain the label: `sql.started.count{query_type=select}`.
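To make the split between the two names concrete, here is a minimal, self-contained Go sketch. It is not CockroachDB's actual metric metadata type: the `Metadata` struct, the `StaticLabels` field, and the `TSDBName` and `LabeledName` helpers are hypothetical names chosen for illustration, and only `LegacyStaticName` comes from the description above. The sketch quotes label values as the Prometheus text format does, and it does not attempt any metric-name sanitization.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Metadata is a hypothetical, simplified stand-in for metric metadata.
// Only LegacyStaticName is named in the issue; the other fields are
// assumptions made for this sketch.
type Metadata struct {
	Name             string            // labeled name, e.g. "sql.started.count"
	StaticLabels     map[string]string // label values fixed at definition time
	LegacyStaticName string            // pre-existing name kept for TSDB
}

// TSDBName returns the name used for persistence: the legacy name when
// one is set, otherwise the regular name.
func (m Metadata) TSDBName() string {
	if m.LegacyStaticName != "" {
		return m.LegacyStaticName
	}
	return m.Name
}

// LabeledName renders the name followed by its static labels, with keys
// sorted so the output is deterministic.
func (m Metadata) LabeledName() string {
	if len(m.StaticLabels) == 0 {
		return m.Name
	}
	keys := make([]string, 0, len(m.StaticLabels))
	for k := range m.StaticLabels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, m.StaticLabels[k]))
	}
	return m.Name + "{" + strings.Join(pairs, ",") + "}"
}

func main() {
	m := Metadata{
		Name:             "sql.started.count",
		StaticLabels:     map[string]string{"query_type": "select"},
		LegacyStaticName: "sql.select.started.count",
	}
	fmt.Println(m.TSDBName())    // name written to TSDB
	fmt.Println(m.LabeledName()) // name exposed with labels
}
```

Keeping the legacy name as a field on the same metadata value means a single metric definition can feed both outputs, so existing TSDB data stays readable without a rewrite.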
In addition, we may want to expose the labeled metrics on a new endpoint like `/metrics` and allow customers to opt in to this, while retaining the legacy names on `/_status/vars`.
Unsolved problems
One critical challenge that's not yet solved is how to enforce groupings of metrics that should be part of the same label set. We could do nothing and offload this responsibility to the DB engineer, but we might benefit from some structure that allows for easy maintenance of groups of metrics together in order to make it easy to keep track of which label sets are contained within which metric name.
With the implementation above, it's unclear how someone would know which static labels `sql.started.count` is split by, or how to prevent mistakes where more metrics are registered into the same label set by accident.
Jira issue: CRDB-48424