promhttp: don't panic when instrumenting with non-exemplar observers#2005
Open
spor3006 wants to merge 1 commit into
Open
promhttp: don't panic when instrumenting with non-exemplar observers#2005spor3006 wants to merge 1 commit into
spor3006 wants to merge 1 commit into
Conversation
InstrumentHandlerDuration and InstrumentHandlerCounter unconditionally
type-asserted their observer/counter to ExemplarObserver / ExemplarAdder
when an exemplar was provided. This panicked at request time if the
caller passed a SummaryVec — a valid prometheus.ObserverVec whose
underlying *summary does not implement ObserveWithExemplar, because
summaries cannot carry exemplars in the Prometheus exposition format.
Switch to the safe-cast + fallback pattern already used by
Timer.ObserveDurationWithExemplar (prometheus/timer.go): if the
observer/counter does not implement the exemplar interface, drop the
exemplar and record the value with the plain Observe/Add path. No
public API change.
Adds three regression tests:
- TestMiddlewareAPI_SummaryWithExemplars: exact panic reproduction
from the upstream issue, through InstrumentHandlerDuration.
- TestObserveWithExemplar_NonExemplarObserverFallsBack: unit-level
contract for the helper.
- TestAddWithExemplar_NonExemplarAdderFallsBack: same, for the
counter helper.
Fixes prometheus#1258
Signed-off-by: Sparshal Kothari <41056517+spor3006@users.noreply.github.com>
3928966 to
eb1ce2c
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
InstrumentHandlerDurationandInstrumentHandlerCounterpanic at request time when theirObserverVec/CounterVecis backed by a type that does not implementExemplarObserver/ExemplarAdder. The most common trigger is aSummaryVec, which is a validprometheus.ObserverVecbut whose underlying*summarydoes not implementObserveWithExemplar— summaries cannot carry exemplars in the Prometheus exposition format.Reproduction from #1258:
panics on the first request with:
Why
The unsafe type assertion was inconsistent with the rest of the codebase.
Timer.ObserveDurationWithExemplar(prometheus/timer.go:72) has handled the same situation correctly for years using a safe-cast + fallback:The two helpers in
promhttp/instrument_server.go(observeWithExemplar,addWithExemplar) didobs.(prometheus.ExemplarObserver)without theokcheck, turning a documented exemplar-not-supported case into a runtime panic in the request path.Changes
observeWithExemplar/addWithExemplaruse the safe-cast + fallback pattern. If the observer/counter implements the exemplar interface, the exemplar is attached; otherwise the exemplar is silently dropped and the value is recorded with the plainObserve/Addpath. No public API change.Timer.ObserveDurationWithExemplarfor consistency.[BUGFIX]entry under Unreleased.Tests
Three regression tests in
instrument_server_test.go:TestMiddlewareAPI_SummaryWithExemplars— exact issue reproduction:InstrumentHandlerDuration(summaryVec, …, WithExemplarFromContext(…))+ one HTTP request. Fails onmainwith the panic; passes here.TestObserveWithExemplar_NonExemplarObserverFallsBack— unit-level contract for the helper.TestAddWithExemplar_NonExemplarAdderFallsBack— same for the counter helper.All three are stable under
go test -count=10 -race.Verification
go test ./prometheus/promhttp/...— passes (existing + new).go test -race ./prometheus/promhttp/...— passes.go vet ./prometheus/promhttp/...— clean.gofmt— clean.Note:
go test ./prometheus/collectors/...fails on this branch and onmainbecause Go 1.25 added runtime metrics (htmlmetacontenturlescape,httpcookiemaxnum,urlmaxqueryparams) that aren't in theexpectedMetricsfixture. Pre-existing, unrelated to this fix.Trade-offs considered
promhttphas no logger. Silent fallback matchesTimer.ObserveDurationWithExemplar. The new behavior is documented in the godoc.*summaryimplementExemplarObserver. Rejected: the Prometheus exposition format does not support exemplars on summaries, so accepting them would either silently drop at write time (worse — bug moves to a less visible site) or fabricate a different code path. Keeping the interface as the contract is the right design.Closes #1258.
/cc @bwplotka @kakkoyun @vesari