Add Guidance wrt Labelling to Naming and Rules Best Practices #2691
base: main
Changes from 16 commits
@@ -0,0 +1,41 @@

---
title: Labels
sort_rank: 2
---

The label conventions presented in this document are not required
for using Prometheus, but can serve as both a style-guide and a collection of
best practices. Individual organizations may want to approach some of these
practices, e.g. naming conventions, differently.

## Labels

Prometheus labels can come both from the target itself (the labels on the metrics it exposes) and from
[relabeling in discovery](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).

By default, Prometheus attaches two target labels to every scraped series:

- `job`
  - The `job` label is a default target label, set from the `job_name` of the scrape config. It identifies the series scraped by the same scrape job, typically one kind of service or exporter.
conallob marked this conversation as resolved.
  - Specify `job` in PromQL expressions so that they match only the series you intend, rather than unrelated series that share the same metric name. This matters most when a single Prometheus monitors many systems, for example as a shared platform for several teams.
Member
It might be the intention of the user to not distinguish between unrelated metrics, for example when aggregating across jobs. So I'd turn this into some positive statement instead (use "specify" instead of "not specified"). Multi system seems vague and I don't think multi tenancy has anything to do with this.
Suggested change
Author
I agree that the user may want to aggregate across all, or a subset of jobs. But to do so, they should be explicit with the
"Multi-tenant systems" may not be the best term, but I'm referring to a Prometheus run as a platform for multiple teams (e.g. by a DevEx or Platform Engineering team), to prevent every team running their own siloed Prometheus stack. In such a setup, all PromQL expressions should be scoped with a Or framed another way, in such a centralised stack, always write
  - WARNING: When using `without`, be careful not to strip out the `job` label accidentally.
Member
It might be intentional, so I think this needs to be conditional.
- `instance`
  - The `instance` label is set by default to the `<host>:<port>` part of the target's URL that was scraped.
conallob marked this conversation as resolved.
Member
Don't you need a similar warning for "instance", depending on usage?
Author
While For certain use cases that require using multiple layers of rules (e.g. in a multi-region, multi-layered tree of Prometheus servers), you may want to strip out I've added a warning that stripping
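As a sketch of where the two default target labels come from, the scrape configuration below uses a hypothetical job name (`node`), hypothetical target addresses, and a hypothetical `env` label. With it, Prometheus would attach `job="node"` to every series from these targets and, by default, set `instance` to each target's `<host>:<port>`. The relabeling rule shows a target label that is added during discovery rather than exposed by the target itself.

```yaml
scrape_configs:
  # `job_name` becomes the `job` label on every series scraped by this job.
  - job_name: node
    static_configs:
      # Each address becomes the default `instance` label,
      # e.g. instance="host-a.example.com:9100" (hypothetical hosts).
      - targets:
          - "host-a.example.com:9100"
          - "host-b.example.com:9100"
    relabel_configs:
      # Hypothetical: attach a constant `env` label to every target.
      # With no source_labels, the default regex (.*) matches the empty string,
      # so the default `replace` action writes `replacement` into `target_label`.
      - target_label: env
        replacement: production
```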
### General Labelling Advice

Use labels to differentiate the characteristics of the thing that is being measured:

- `api_http_requests_total` - differentiate request types: `operation="create|update|delete"`
- `api_request_duration_seconds` - differentiate request stages: `stage="extract|transform|load"`

Do not put the label names in the metric name, as this introduces redundancy
and will cause confusion if the respective labels are aggregated away.
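A minimal recording-rule sketch of the problem, assuming `api_http_requests_total` carries the `operation` label described above and is scraped by a hypothetical job called `api`. The second rule aggregates `operation` away; because the label name was never part of the metric name, the recorded name still describes the result. A name such as `api_http_requests_by_operation_total` would misdescribe the aggregated series. The rule names follow the `level:metric:operations` pattern used for recording rules.

```yaml
groups:
  - name: example-api-requests   # hypothetical group name
    rules:
      # Per-instance, per-operation request rate; `operation` stays a label.
      - record: instance_operation:api_http_requests:rate5m
        expr: rate(api_http_requests_total{job="api"}[5m])
      # `operation` aggregated away: the metric name still describes the
      # result, because it never mentioned `operation` in the first place.
      - record: instance:api_http_requests:rate5m
        expr: sum without (operation) (instance_operation:api_http_requests:rate5m{job="api"})
```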

CAUTION: Remember that every unique combination of key-value label
pairs represents a new time series, which can dramatically increase the amount
of data stored. Do not use labels to store dimensions with high cardinality
(many different label values), such as user IDs, email addresses, or other
unbounded sets of values.
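As a rough worked example with hypothetical numbers: for a single metric name, the number of time series is bounded by the product of the number of distinct values $\lvert V_i \rvert$ of each of its $k$ labels. An `operation` label with 3 values on a metric scraped from 10 instances gives at most 30 series; adding a hypothetical `user_id` label with 10,000 distinct values raises that bound to 300,000 series for that one metric.

```math
N_{\text{series}} \le \prod_{i=1}^{k} \lvert V_i \rvert,
\qquad 3 \times 10 = 30,
\qquad 3 \times 10 \times 10\,000 = 300\,000
```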

@@ -19,6 +19,9 @@ This page documents proper naming conventions and aggregation for recording rule
Keeping the metric name unchanged makes it easy to know what a metric is and
easy to find in the codebase.

IMPORTANT: The `job` label scopes a PromQL expression to a specific service or exporter. It is **strongly** recommended that you
always specify it, so that your PromQL expressions are scoped to the system you are monitoring.

To keep the operations clean, `_sum` is omitted if there are other operations,
as `sum()`. Associative operations can be merged (for example `min_min` is the
same as `min`).

@@ -27,6 +30,18 @@ If there is no obvious operation to use, use `sum`. When taking a ratio by
doing division, separate the metrics using `_per_` and call the operation
`ratio`.
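A minimal sketch of the `_per_` / `ratio` naming convention, assuming hypothetical `requests_total` and `request_failures_total` counters that carry a `path` label and are scraped by a job called `myjob`. The first two rules record per-instance, per-path rates; the third divides them and names the result with `_per_` and the `ratio` operation. Every selector is scoped with a `job` matcher.

```yaml
groups:
  - name: example-ratio-rules   # hypothetical group name
    rules:
      # Per-instance, per-path request rate, scoped to the hypothetical job "myjob".
      - record: instance_path:requests:rate5m
        expr: rate(requests_total{job="myjob"}[5m])
      # Per-instance, per-path failure rate.
      - record: instance_path:request_failures:rate5m
        expr: rate(request_failures_total{job="myjob"}[5m])
      # Failure ratio: metric names joined with `_per_`, operation named `ratio`.
      - record: instance_path:request_failures_per_requests:ratio_rate5m
        expr: instance_path:request_failures:rate5m{job="myjob"} / instance_path:requests:rate5m{job="myjob"}
```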

## Labels

NOTE: Omitting a label matcher in a PromQL expression is functionally equivalent to specifying `label=~".*"`: it matches every value of that label, including series that do not have the label at all.
conallob marked this conversation as resolved.
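To illustrate, the two alerting rules below differ only in the `job` matcher. The first has none, so it behaves like `up{job=~".*"} == 0` and fires for a down target of any job scraped by this Prometheus; the second is scoped to a single job. The alert names and the `node` job are hypothetical.

```yaml
groups:
  - name: example-availability-alerts   # hypothetical group name
    rules:
      # No `job` matcher: equivalent to up{job=~".*"} == 0, so this fires
      # for a down target of *any* job scraped by this Prometheus server.
      - alert: AnyTargetDown
        expr: up == 0
        for: 5m
      # Scoped with `job`: fires only for targets of the hypothetical "node" job.
      - alert: NodeExporterDown
        expr: up{job="node"} == 0
        for: 5m
```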
* In both recording rules and alerting expressions, always specify a `job` label matcher, so that expressions do not accidentally match series from other jobs.
Member
I think the need to specify
Author
Can you elaborate on why Afaik,
  This is especially important in a shared, multi-team Prometheus, where the same metric names may be exported by different jobs or by the
Member
I don't think multi-tenant has anything to do with job and instance labels.
Author
As above, "Multi-tenant systems" may not be the best term, but I'm referring to a Prometheus run as a platform for multiple teams (e.g. by a DevEx or Platform Engineering team), to prevent every team running their own siloed Prometheus stack. In such a setup, all PromQL expressions should be scoped with a Or framed another way, in such a centralised stack, always write
  same job (e.g. `node_exporter`) in multiple, distinct deployments.

* Always specify a `without` clause with the labels you are aggregating away.
  This is to preserve all the other labels such as `job`, which will avoid
  conflicts and give you more useful metrics and alerts.
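The difference between `without` and `by` can be sketched with a recording rule, reusing the `api_http_requests_total` metric and `operation` label from the labelling guide with a hypothetical job called `api`. A `without` clause lists only the labels being removed, so labels you did not think about, including `job`, survive the aggregation.

```yaml
groups:
  - name: example-without   # hypothetical group name
    rules:
      # Aggregates away only `operation`; `job`, `instance` and any other
      # target labels are preserved on the recorded series.
      - record: instance:api_http_requests:rate5m
        expr: sum without (operation) (rate(api_http_requests_total{job="api"}[5m]))
      # By contrast, a `by` clause such as
      #   sum by (instance) (rate(api_http_requests_total{job="api"}[5m]))
      # keeps only `instance` and drops `job` (and every other label)
      # from the result.
```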

## Aggregation

* When aggregating up ratios, aggregate up the numerator and denominator

@@ -40,10 +55,6 @@ Instead keep the metric name without the `_count` or `_sum` suffix and replace
the `rate` in the operation with `mean`. This represents the average
observation size over that time period.
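For example, assuming a hypothetical `request_latency_seconds` summary or histogram with a `path` label, scraped by a job called `myjob`, the mean observation size over five minutes is the rate of `_sum` divided by the rate of `_count`. The recorded name drops both suffixes and uses `mean5m` as the operation.

```yaml
groups:
  - name: example-latency-rules   # hypothetical group name
    rules:
      # Average request latency over the last 5m: rate of the sum of
      # observations divided by the rate of the number of observations.
      - record: instance_path:request_latency_seconds:mean5m
        expr: rate(request_latency_seconds_sum{job="myjob"}[5m]) / rate(request_latency_seconds_count{job="myjob"}[5m])
```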

* Always specify a `without` clause with the labels you are aggregating away.
conallob marked this conversation as resolved.
  This is to preserve all the other labels such as `job`, which will avoid
  conflicts and give you more useful metrics and alerts.

## Examples

_Note the indentation style with outdented operators on their own line between