fix: Skip landing page registration on NewLandingPage error #2937

Open
carterpewpew wants to merge 1 commit into kubernetes:main from carterpewpew:fix/newlandingpage-error-not

Conversation

@carterpewpew carterpewpew commented Apr 27, 2026

What this PR does / why we need it:

When web.NewLandingPage() returns an error, the returned handler is nil and must not be used. The previous code logged the error but still called mux.Handle("/", landingPage), registering a nil *web.LandingPageHandler. Because an http.Handler interface that holds a nil pointer is itself non-nil in Go, the mux accepts it silently. The first GET / then panics when ServeHTTP dereferences the nil receiver (e.g. reading routePrefix).

This change:

  • Adds a newLandingPage function parameter to buildTelemetryServer() and buildMetricsServer() so the landing page factory is injected per call. Production code passes web.NewLandingPage; tests can inject a failing factory without mutable global state or data races.
  • Returns the mux after logging, without registering the / handler, when the factory returns an error. The rest of the server (metrics, health, pprof, etc.) continues to work and the process does not crash on the root path if landing page construction fails.
  • Adds focused tests for both the happy path (landing page registered, / returns 200) and the error path (landing page skipped, / returns 404, other routes like /metrics and /healthz still work). All tests are safe to run with t.Parallel() and pass go test -race.

How does this change affect the cardinality of KSM: does not change cardinality

Which issue(s) this PR fixes: N/A

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: carterpewpew
Once this PR has been reviewed and has the lgtm label, please assign dgrisonnet for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If kube-state-metrics contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage label (indicates an issue or PR lacks a `triage/foo` label and requires one) Apr 27, 2026
@k8s-ci-robot
Contributor

Welcome @carterpewpew!

It looks like this is your first PR to kubernetes/kube-state-metrics 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kube-state-metrics has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@github-project-automation github-project-automation Bot moved this to Needs Triage in SIG Instrumentation Apr 27, 2026
@k8s-ci-robot k8s-ci-robot added the size/M (denotes a PR that changes 30-99 lines, ignoring generated files) and cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) labels Apr 27, 2026
@carterpewpew carterpewpew force-pushed the fix/newlandingpage-error-not branch from 6b039b3 to 8f01c12 April 27, 2026 04:05
@carterpewpew carterpewpew changed the title from "Skip landing page registration on NewLandingPage error" to "fix: Skip landing page registration on NewLandingPage error" Apr 27, 2026
@mrueg mrueg requested a review from Copilot May 12, 2026 10:58

Copilot AI left a comment

Pull request overview

This PR prevents kube-state-metrics from registering a nil landing page handler when web.NewLandingPage() returns an error, avoiding a runtime panic on requests to / while keeping other endpoints functional.

Changes:

  • Return early (after logging) from buildTelemetryServer() and buildMetricsServer() when landing page creation fails, skipping mux.Handle("/", landingPage).
  • Add tests for the landing page success path and a regression-style test illustrating the nil-handler panic scenario.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| pkg/app/server.go | Returns the mux without registering / when landing page construction fails. |
| pkg/app/server_test.go | Adds landing page tests and a regression-style panic test for a nil landing page handler. |

Comment thread pkg/app/server_test.go Outdated
Comment on lines +993 to +1004
func TestNilLandingPageHandlerPanicsOnRequest(t *testing.T) {
	t.Parallel()
	// Simulate the unfixed bug: web.NewLandingPage returns (nil, err) and
	// the nil *LandingPageHandler is registered on the mux without an early
	// return. In Go, a nil concrete pointer assigned to an interface is
	// non-nil, so mux.Handle succeeds. But ServeHTTP dereferences the nil
	// receiver (h.routePrefix), causing a panic.
	var nilHandler *web.LandingPageHandler

	mux := http.NewServeMux()
	mux.Handle("/", nilHandler)

Comment thread pkg/app/server_test.go
@k8s-ci-robot k8s-ci-robot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) and removed the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) May 13, 2026
@carterpewpew carterpewpew force-pushed the fix/newlandingpage-error-not branch from c31cc82 to d39d4f3 May 13, 2026 13:29
@mrueg mrueg requested a review from Copilot May 13, 2026 14:08

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

pkg/app/server_test.go:1063

  • Same issue as above: overriding the global newLandingPage can run concurrently with other t.Parallel() tests in this package and lead to data races/flakes. Use synchronization to guard overrides (or refactor to inject the landing page factory per server build) so tests remain safe under parallel execution and go test -race.
func TestBuildMetricsServerLandingPageError(t *testing.T) {
	original := newLandingPage
	t.Cleanup(func() { newLandingPage = original })
	newLandingPage = func(_ web.LandingConfig) (*web.LandingPageHandler, error) {
		return nil, fmt.Errorf("injected landing page error")
	}

Comment thread pkg/app/server.go Outdated
Comment on lines +78 to +79
// Overridable in tests to simulate NewLandingPage failures.
var newLandingPage = web.NewLandingPage
Comment thread pkg/app/server_test.go Outdated
Comment on lines +1007 to +1012
func TestBuildTelemetryServerLandingPageError(t *testing.T) {
	original := newLandingPage
	t.Cleanup(func() { newLandingPage = original })
	newLandingPage = func(_ web.LandingConfig) (*web.LandingPageHandler, error) {
		return nil, fmt.Errorf("injected landing page error")
	}
When web.NewLandingPage returns an error, the previous code logged it
but still registered a nil *LandingPageHandler for "/", which can panic
on GET / because ServeHTTP dereferences the receiver.

Accept the landing page factory as a parameter in buildTelemetryServer
and buildMetricsServer so tests can inject a failing factory without
mutable global state or data races. Return the mux without registering
the root handler when the factory fails.

Add tests that inject a failing factory and assert "/" returns 404 while
other routes (/metrics, /healthz) continue to serve 200. Add happy-path
tests that verify "/" returns 200 under normal conditions.

Signed-off-by: Jathavedhan M <jathavedhan.m@ibm.com>
@carterpewpew carterpewpew force-pushed the fix/newlandingpage-error-not branch from d39d4f3 to b12fbce May 13, 2026 14:21
@mrueg mrueg requested a review from Copilot May 13, 2026 14:33

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread pkg/app/server.go
Labels

cncf-cla: yes (indicates the PR's author has signed the CNCF CLA)
needs-triage (indicates an issue or PR lacks a `triage/foo` label and requires one)
size/L (denotes a PR that changes 100-499 lines, ignoring generated files)

Projects

Status: Needs Triage

Development

Successfully merging this pull request may close these issues.

3 participants