Skip to content

fix: prevent duplicate cluster-scoped metrics with --namespaces #2923

Open
vigneshakaviki wants to merge 1 commit into kubernetes:main from vigneshakaviki:fix/duplicate-cluster-scoped-metrics

Conversation

vigneshakaviki commented Apr 13, 2026

Summary

Fixes #2878

When --namespaces restricts collection to specific namespaces, cluster-scoped resources (nodes, PVs, namespaces, clusterroles, etc.) still got one store per namespace. Because the list/watch call for a cluster-scoped resource ignores the namespace parameter, each store watched every object, producing N duplicate copies of every metric (where N is the number of specified namespaces).

This caused Prometheus warnings like:

Error on ingesting samples with different value but same timestamp

Root Cause

buildStores() iterates over b.namespaces and creates one store per namespace. This is correct for namespace-scoped resources (pods, services, deployments, etc.) but wrong for cluster-scoped resources (nodes, PVs, namespaces, clusterroles, etc.), which have no namespace dimension.
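
To illustrate the root cause, here is a minimal, self-contained Go sketch. It is not the kube-state-metrics code: object, listNodes, listPods, buildPerNamespace, and countSeries are hypothetical stand-ins, and each store is modelled as a plain slice. It only shows why a per-namespace loop over a lister that ignores its namespace argument yields one copy of every object per namespace.

```go
package main

import "fmt"

// object is a simplified stand-in for a Kubernetes object.
type object struct {
	Name      string
	Namespace string // empty for cluster-scoped objects
}

// listNodes mimics a cluster-scoped list call: the namespace argument is
// accepted but has no effect on the result.
func listNodes(all []object, namespace string) []object {
	return all
}

// listPods mimics a namespace-scoped list call: only objects in the
// requested namespace are returned.
func listPods(all []object, namespace string) []object {
	var out []object
	for _, o := range all {
		if o.Namespace == namespace {
			out = append(out, o)
		}
	}
	return out
}

// buildPerNamespace creates one store (here just a slice) per namespace,
// the way a per-namespace loop does.
func buildPerNamespace(namespaces []string, all []object, list func([]object, string) []object) [][]object {
	stores := make([][]object, 0, len(namespaces))
	for _, ns := range namespaces {
		stores = append(stores, list(all, ns))
	}
	return stores
}

// countSeries counts how often each object name is exposed across all stores.
func countSeries(stores [][]object) map[string]int {
	counts := map[string]int{}
	for _, s := range stores {
		for _, o := range s {
			counts[o.Name]++
		}
	}
	return counts
}

func main() {
	namespaces := []string{"default", "kube-system"}
	nodes := []object{{Name: "minikube"}}
	pods := []object{
		{Name: "web", Namespace: "default"},
		{Name: "coredns", Namespace: "kube-system"},
	}

	// The node is exposed once per namespace; each pod is exposed once.
	fmt.Println("nodes:", countSeries(buildPerNamespace(namespaces, nodes, listNodes)))
	fmt.Println("pods: ", countSeries(buildPerNamespace(namespaces, pods, listPods)))
}
```

With two namespaces, the node is counted twice while each pod is counted once, which matches the behaviour described above.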

Fix

Added buildClusterScopedStores(), which always creates a single store with NamespaceAll, bypassing the per-namespace loop. All cluster-scoped resource build functions now use it:

  • nodes
  • persistentvolumes
  • storageclasses
  • namespaces
  • clusterroles
  • clusterrolebindings
  • mutatingwebhookconfigurations
  • validatingwebhookconfigurations
  • volumeattachments
  • certificatesigningrequests
  • ingressclasses

Namespace-scoped resources continue using buildStoresFunc as before.
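
The shape of the fix can be sketched as follows. This is a simplified model under stated assumptions, not the code in internal/store/builder.go: the store and builder types here are hypothetical, no reflector is started, and namespaceAll is redefined locally with the empty-string value that NamespaceAll has in the Kubernetes API.

```go
package main

import "fmt"

// namespaceAll mirrors the empty-string value of NamespaceAll; it is
// redefined here so the sketch has no external dependencies.
const namespaceAll = ""

// store is a hypothetical stand-in for a metrics store. It only records
// which namespace its watcher was created for.
type store struct {
	watchedNamespace string
}

// builder is a simplified model of the store builder.
type builder struct {
	namespaces []string
}

// buildStores models the per-namespace path: one store per configured
// namespace. This stays in use for namespace-scoped resources.
func (b *builder) buildStores() []store {
	stores := make([]store, 0, len(b.namespaces))
	for _, ns := range b.namespaces {
		stores = append(stores, store{watchedNamespace: ns})
	}
	return stores
}

// buildClusterScopedStores models the new helper: exactly one store that
// watches everything, regardless of how many namespaces are configured.
func (b *builder) buildClusterScopedStores() []store {
	return []store{{watchedNamespace: namespaceAll}}
}

func main() {
	b := &builder{namespaces: []string{"default", "kube-system"}}
	fmt.Println("namespace-scoped stores:", len(b.buildStores()))
	fmt.Println("cluster-scoped stores:  ", len(b.buildClusterScopedStores()))
}
```

The point of the design is that the store count for cluster-scoped resources no longer depends on len(b.namespaces).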

Local Verification (minikube, --namespaces=default,kube-system --resources=nodes)

Before (stock binary) — duplicate metrics

Each node metric appears twice (once per namespace) with identical values:

$ curl -s http://localhost:18080/metrics | grep "^kube_node_info{"
kube_node_info{node="minikube",kernel_version="6.17.0-20-generic",os_image="Debian GNU/Linux 12 (bookworm)",container_runtime_version="docker://29.2.0",kubelet_version="v1.35.0",kubeproxy_version="deprecated",provider_id="",pod_cidr="10.244.0.0/24",system_uuid="2e7c71de-c33c-4303-bc33-ee8bdb38a39b",internal_ip="192.168.49.2"} 1
kube_node_info{node="minikube",kernel_version="6.17.0-20-generic",os_image="Debian GNU/Linux 12 (bookworm)",container_runtime_version="docker://29.2.0",kubelet_version="v1.35.0",kubeproxy_version="deprecated",provider_id="",pod_cidr="10.244.0.0/24",system_uuid="2e7c71de-c33c-4303-bc33-ee8bdb38a39b",internal_ip="192.168.49.2"} 1

$ curl -s http://localhost:18080/metrics | grep "^kube_node_created{"
kube_node_created{node="minikube"} 1.770696357e+09
kube_node_created{node="minikube"} 1.770696357e+09

After (patched binary) — no duplicates

Each node metric appears exactly once:

$ curl -s http://localhost:18080/metrics | grep "^kube_node_info{"
kube_node_info{node="minikube",kernel_version="6.17.0-20-generic",os_image="Debian GNU/Linux 12 (bookworm)",container_runtime_version="docker://29.2.0",kubelet_version="v1.35.0",kubeproxy_version="deprecated",provider_id="",pod_cidr="10.244.0.0/24",system_uuid="2e7c71de-c33c-4303-bc33-ee8bdb38a39b",internal_ip="192.168.49.2"} 1

$ curl -s http://localhost:18080/metrics | grep "^kube_node_created{"
kube_node_created{node="minikube"} 1.770696357e+09

Namespace-scoped resources unaffected

Pods are still correctly filtered per namespace with zero duplicates:

$ curl -s http://localhost:18080/metrics | grep "^kube_pod_info{" | grep -oP 'namespace="[^"]*"' | sort | uniq -c
      1 namespace="default"
      7 namespace="kube-system"
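
The grep checks above can also be automated. The following Go sketch is an illustrative helper, not part of this PR: seriesKey and duplicateSeries are hypothetical names, and the parsing is deliberately naive (it treats everything up to the last closing brace as the series identity and does not handle every corner of the exposition format). It reports any series that appears more than once in a /metrics payload.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// seriesKey returns the metric name plus label set of a sample line,
// dropping the value (and any timestamp) that follows.
func seriesKey(line string) string {
	if i := strings.LastIndex(line, "}"); i >= 0 {
		return line[:i+1]
	}
	if i := strings.IndexAny(line, " \t"); i >= 0 {
		return line[:i]
	}
	return line
}

// duplicateSeries returns every series that occurs more than once in the
// given exposition text, mapped to its number of occurrences.
func duplicateSeries(exposition string) map[string]int {
	counts := map[string]int{}
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Skip blank lines and HELP/TYPE comments.
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		counts[seriesKey(line)]++
	}
	// Keep only the series that were seen at least twice.
	for key, n := range counts {
		if n < 2 {
			delete(counts, key)
		}
	}
	return counts
}

func main() {
	// Sample input taken from the "before" output above.
	before := `kube_node_created{node="minikube"} 1.770696357e+09
kube_node_created{node="minikube"} 1.770696357e+09
`
	for key, n := range duplicateSeries(before) {
		fmt.Printf("%d x %s\n", n, key)
	}
}
```

Run against the patched binary's output, duplicateSeries should return an empty map.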

Test Plan

  • All existing unit tests pass except TestCronJobStore, which already fails on main
  • Build compiles cleanly
  • End-to-end verified on minikube as shown above

…mespaces

When using --namespaces to filter by specific namespaces, cluster-scoped
resources (nodes, PVs, namespaces, clusterroles, etc.) were creating one
store per namespace. Since cluster-scoped resources ignore the namespace
parameter, each store watched all objects, producing N duplicate copies
of every metric (where N = number of specified namespaces).

Add buildClusterScopedStores() that always creates a single store with
NamespaceAll, and use it for all cluster-scoped resource build functions:
nodes, persistentvolumes, storageclasses, namespaces, clusterroles,
clusterrolebindings, mutatingwebhookconfigurations,
validatingwebhookconfigurations, volumeattachments,
certificatesigningrequests, and ingressclasses.

Fixes kubernetes#2878

Signed-off-by: Vignesh <kumarvignesh295@gmail.com>
k8s-ci-robot added the do-not-merge/invalid-commit-message label (indicates that a PR should not merge because it has an invalid commit message) on Apr 13, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: vigneshakaviki
Once this PR has been reviewed and has the lgtm label, please assign dgrisonnet for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) on Apr 13, 2026
k8s-ci-robot added the needs-triage label (indicates an issue or PR lacks a `triage/foo` label and requires one) on Apr 13, 2026
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If kube-state-metrics contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Contributor

Welcome @vigneshakaviki!

It looks like this is your first PR to kubernetes/kube-state-metrics 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kube-state-metrics has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

github-project-automation (bot) moved this to Needs Triage in SIG Instrumentation on Apr 13, 2026
k8s-ci-robot added the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) on Apr 13, 2026
@k8s-ci-robot
Contributor

Invalid commit message issues detected

Invalid commit messages

Keywords which can automatically close issues and hashtag(#) mentions are not allowed.

  • 95bccf0 fix: prevent duplicate metrics for cluster-scoped resources with --namespaces
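
For contributors who want to catch this locally before pushing, a rough pre-check could look like the Go sketch below. The two regular expressions are assumptions written for illustration; they are not the bot's actual patterns, and commitMessageProblems is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// closeKeyword is an assumed approximation of "keywords which can
// automatically close issues": close/fix/resolve variants followed by an
// optionally prefixed issue number such as #2878 or kubernetes#2878.
var closeKeyword = regexp.MustCompile(`(?i)\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)[\s:]+[\w./-]*#\d+`)

// issueRef is an assumed approximation of a hashtag mention: a '#'
// followed by digits anywhere in the message.
var issueRef = regexp.MustCompile(`#\d+`)

// commitMessageProblems returns a human-readable description of each
// assumed rule that the commit message violates.
func commitMessageProblems(msg string) []string {
	var problems []string
	if m := closeKeyword.FindString(msg); m != "" {
		problems = append(problems, fmt.Sprintf("issue-closing keyword: %q", m))
	}
	if m := issueRef.FindString(msg); m != "" {
		problems = append(problems, fmt.Sprintf("hashtag mention: %q", m))
	}
	return problems
}

func main() {
	messages := []string{
		"fix: prevent duplicate metrics for cluster-scoped resources with --namespaces",
		"Fixes kubernetes#2878",
	}
	for _, msg := range messages {
		fmt.Printf("%q -> %v\n", msg, commitMessageProblems(msg))
	}
}
```

Under these assumed rules the subject line is accepted (a bare "fix:" prefix is not followed by an issue number), while the "Fixes kubernetes#2878" trailer in the commit body is flagged. The issue reference can stay in the PR description instead.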

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Copilot AI left a comment

Pull request overview

Fixes duplicate metric series emitted for cluster-scoped resources when --namespaces is set by ensuring cluster-scoped collectors create only a single store/reflector (watching NamespaceAll) instead of one per provided namespace.

Changes:

  • Updated cluster-scoped resource store builders to use a new buildClusterScopedStores() helper.
  • Added buildClusterScopedStores() which always creates exactly one store using NamespaceAll, avoiding per-namespace duplication.

Comment thread internal/store/builder.go
Comment on lines +526 to +539
metricFamilies = generator.FilterFamilyGenerators(b.familyGeneratorFilter, metricFamilies)
composedMetricGenFuncs := generator.ComposeMetricGenFuncs(metricFamilies)
familyHeaders := generator.ExtractMetricFamilyHeaders(metricFamilies)

store := metricsstore.NewMetricsStore(
	familyHeaders,
	composedMetricGenFuncs,
)
if b.fieldSelectorFilter != "" {
	klog.InfoS("FieldSelector is used", "fieldSelector", b.fieldSelectorFilter)
}
listWatcher := listWatchFunc(b.kubeClient, v1.NamespaceAll, b.fieldSelectorFilter)
b.startReflector(expectedType, store, listWatcher, useAPIServerCache, objectLimit, b.kubeClient)
return []cache.Store{store}
Comment thread internal/store/builder.go
Comment on lines +516 to +519
// buildClusterScopedStores creates a single store for cluster-scoped resources
// (e.g., nodes, PVs, namespaces, clusterroles). Unlike buildStores, this
// always watches all objects regardless of the --namespaces flag, preventing
// duplicate metrics when multiple namespaces are specified.

Development

Successfully merging this pull request may close these issues.

Cluster scope metrics is exposed duplicately when using --namespaces cli option
