Skip to content

prometheus: add Registry.DisableHelpTextValidation opt-in#2006

Open
spor3006 wants to merge 1 commit into
prometheus:mainfrom
spor3006:fix/1820-relax-help-text-validation
Open

prometheus: add Registry.DisableHelpTextValidation opt-in#2006
spor3006 wants to merge 1 commit into
prometheus:mainfrom
spor3006:fix/1820-relax-help-text-validation

Conversation

@spor3006
Copy link
Copy Markdown

What

Registry.Gather currently returns an error when two collectors emit metrics with the same fully-qualified name but slightly different help text:

collected metric foo {…} has help "X." but should have "X"

The most common trigger is two libraries — or two versions of the same library — registering compatible metrics into the same registry. A trivial difference like a help string gaining or losing a trailing period across a client_golang upgrade turns a metadata mismatch into a hard /metrics outage for the whole binary (see #1820).

The Prometheus server itself is lenient about help-text drift on scrape, so this strict check makes inter-library interop harder than it needs to be.

Design

This PR is strictly additive to the public API and leaves the default behavior unchanged. It adds one new method:

reg := prometheus.NewRegistry()
reg.DisableHelpTextValidation()

When the toggle is enabled, processMetric aggregates later metrics into the family the first metric created and keeps the first help text seen. Other validation (type checks, suffix collisions, dim-hash consistency at Register time, pedantic per-Desc checks) is unchanged.

Why opt-in rather than relaxing the default

The strict default has value: it surfaces accidental help-text changes during development and testing. Silently overwriting a refactored help string under the user is a debuggability footgun. Forcing callers to consciously opt in to the lenient semantics keeps that safety net for everyone who isn't fighting the interop problem.

Pedantic registries

Calling DisableHelpTextValidation on a Registry returned by NewPedanticRegistry has no effect. The pedantic caller has opted into the strictest set of checks; mixing that with lenient help validation would be internally contradictory. The override is computed once per Gather under the existing RLock:

helpTextValidationDisabled := r.helpTextValidationDisabled && !r.pedanticChecksEnabled

What this PR does NOT change

  • Gatherers.Gather (cross-Gatherer merge) and checkDescConsistency (pedantic per-Desc check) are intentionally left strict. They cover different concerns from processMetric's per-fqName check, and changing them is out of scope.
  • Register() still rejects mismatched dim-hashes (which includes help text) at registration time for checked collectors. The toggle only affects what Gather does with metrics that nevertheless reach processMetric with mismatched help — in practice, the unchecked-collector path.

Tests

Four new tests in prometheus/registry_test.go, all stable under go test -count=10 -race:

Test What it pins
TestGatherRejectsHelpTextMismatchByDefault Backwards compat: default Registry still errors on help mismatch.
TestGatherAcceptsHelpTextMismatchWhenDisabled Regression for #1820: with the toggle on, Gather succeeds and aggregates.
TestGatherWithDisabledHelpTextValidationKeepsFirstHelp Pins the documented "first help text wins" guarantee using a single collector for deterministic emit order.
TestPedanticRegistryOverridesHelpTextValidationToggle Pedantic mode ignores the toggle.

Two unexported test helpers are added:

  • helpTextCollector — unchecked collector emitting one metric with configurable fqName/help/label, for the two-collectors-into-one-registry scenarios.
  • orderedHelpTextCollector — single unchecked collector emitting multiple metrics with deterministic source order, for the "first wins" assertion (two separate collectors would race because the registry fans collection out across goroutines).

Verification

  • go test ./prometheus/... — passes (existing + new).
  • go test -race ./prometheus/... — passes.
  • go vet ./prometheus/... — clean.
  • gofmt — clean.

Note: prometheus/collectors and TestGoCollector* continue to fail on this branch and on main because Go 1.25 added/removed runtime metrics (htmlmetacontenturlescape, httpcookiemaxnum, urlmaxqueryparams) that aren't in the expectedMetrics fixture. Pre-existing, unrelated to this fix.

Trade-offs considered

  • Relax the default instead. Rejected on the first iteration after a self-review: a silent default-behavior change to a registry-level contract risks breaking callers who rely on the existing error as a configuration-bug detector, and removes a useful safety net during local development.
  • Add a registry-package Opts struct / new constructor (NewLenientRegistry). Rejected as heavier API surface for a single boolean. A method on *Registry matches the existing NewPedanticRegistry-style pattern: one entry point for the strict mode, one method to opt into the lenient mode.
  • Log a warning instead of silent acceptance. Rejected because the prometheus package does not have a logger in scope here, and threading one through purely for this case would be a much larger change. Documented as the explicit trade-off in the godoc.
  • Also relax Gatherers.Gather / checkDescConsistency. Out of scope. The motivating use case is single-registry interop; cross-Gatherer merging is a separate user contract. Happy to file a follow-up if maintainers think the same reasoning applies there.

Closes #1820.

/cc @bwplotka @kakkoyun @vesari

When two collectors emit metrics with the same fully-qualified name but
slightly different help text — most commonly two libraries (or two
versions of the same library) registering compatible metrics into the
same registry, e.g. a help string gaining or losing a trailing period
across a client_golang upgrade — Registry.Gather currently returns

  collected metric foo {…} has help "X." but should have "X"

turning a metadata difference into a hard /metrics outage for the whole
binary. The Prometheus server itself is lenient about help-text drift on
scrape, so this strict check makes interop between libraries harder than
it needs to be.

Add an opt-in escape hatch:

  reg := prometheus.NewRegistry()
  reg.DisableHelpTextValidation()

When disabled, processMetric aggregates later metrics into the family
that the first metric created, keeping the first help text. The default
behavior is unchanged — strict help-text validation remains the
recommended setting because it surfaces accidental help-text changes
early. Pedantic registries unconditionally override the toggle so that
a pedantic caller never silently gets lenient semantics.

Implementation:

  * Registry gains a private helpTextValidationDisabled field.
  * Gather() snapshots the effective toggle (toggle && !pedantic) under
    the existing RLock so all processMetric calls in one Gather see a
    consistent value.
  * processMetric takes the flag as a new parameter; the rest of its
    behavior (type checks, suffix collisions, pedantic desc-consistency
    checks) is unchanged.
  * Gatherers.Gather (cross-Gatherer merge) and checkDescConsistency
    (pedantic per-Desc check) are deliberately left strict — they cover
    different concerns and changing them is out of scope.

Tests:

  * TestGatherRejectsHelpTextMismatchByDefault locks in
    backwards-compatibility: the strict error is still produced when
    the toggle has not been flipped.
  * TestGatherAcceptsHelpTextMismatchWhenDisabled is the regression
    test for the issue — same fixture as above, with the toggle
    enabled, Gather succeeds.
  * TestGatherWithDisabledHelpTextValidationKeepsFirstHelp uses a
    single collector that emits metrics in deterministic source order
    to pin down the documented "first help text wins" guarantee.
  * TestPedanticRegistryOverridesHelpTextValidationToggle verifies
    that flipping the toggle on a pedantic Registry has no effect.

All four are stable under `go test -count=10 -race`.

Fixes prometheus#1820

Signed-off-by: Sparshal Kothari <41056517+spor3006@users.noreply.github.com>
@spor3006 spor3006 force-pushed the fix/1820-relax-help-text-validation branch from 3d3b81a to c0f198b Compare May 16, 2026 09:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Metric help text validation causes issues with multiple versions

1 participant