Skip to content

Issues foundation hardening: dependency graph, vocabulary, structure#828

Merged
nadaverell merged 9 commits into
feat/issues-uifrom
feat/issues-hardening
May 30, 2026
Merged

Issues foundation hardening: dependency graph, vocabulary, structure#828
nadaverell merged 9 commits into
feat/issues-uifrom
feat/issues-hardening

Conversation

@nadaverell
Copy link
Copy Markdown
Contributor

@nadaverell nadaverell commented May 30, 2026

Pure-refactor pass on the Issues stack from the architecture review — no behavior change (the precision/recall + public-contract work lives in #811). Tightens the seams the reviewer flagged so the layer is a clean foundation for the Applications / Packages surfaces to build on. Stacked on #811 (feat/issues-ui); retarget to main once that merges.

Dependency graph

  • pkg/resourceid (new, leaf): ResourceKey + GroupForBuiltinKind moved out of pkg/audit. subject/issues no longer depend on audit for identity; audit re-exports (callers unchanged). Audit is no longer the identity foundation.
  • pkg/conditions (new, leaf): the transient-reason vocabulary + the genuine-failure carve-out moved out of pkg/packages. issues/k8s depend on the neutral leaf, not on package-inventory; the two transient narrowings (Flux detector + CRD noise-floor) collapse to one shared IsInProgressForIssues.
  • types.go depuredFilters/CEL moved to options.go; the domain model (Issue/Severity/Source/Category/Ref) is now a pure leaf with no CEL import.
  • IssueID → generic subject.StableID (subject produces subject identity; issue-id = subject+category is composed by the issues layer). ResolveSubject lost its unused podMeta param.

Vocabulary + dedup

  • Problem type → Detection (gopls semantic rename) — kills the Problem/Issue synonym; helper structs + Detect* funcs correctly untouched.
  • category GroupCategoryGroup (collided with Issue.Group = API group).
  • One issues.SeverityRank — deleted two byte-identical clones (server + MCP).
  • One conditions.FindFalseCondition — collapsed three copies (CAPI closure, Flux readyFalseCondition, the issues reader).

Structure (god-file splits, matching the clean issues/ files)

  • problems.godetection.go + split by detector into capi.go / gitops.go, with a layer-framing doc: this is the detector layer (emits raw Detections); internal/issues is the layer above (classify/group); dashboard + MCP also consume it — the generalized successor to the v0 "problems" feature, not a parallel surface.
  • issues.go (657 lines) splitcompose.go / source_conditions.go / normalize.go / dedupe.go / filters.go.
  • navigateToResourceList extracted in App.tsx (Audit + Issues had a verbatim-identical navigate callback).

Not included (deliberate)

  • A5 (one Detection constructor for the Age/Duration quartet): skipped — the 24 sites have genuine variation (separate age/duration, no-age HPA/CronJob, age-only Node, +crash/owner pods); a single constructor is a leaky abstraction and the clean subsets already have one.
  • R2 (Provider → per-source interfaces) and R10 (Affected → kind-keyed map, a wire change): deferred as judgment-calls, not clear wins.

go build ./..., make test, and make tsc all green.


Note

Medium Risk
Large file moves and renames across issues/k8s/pkg with stated no-behavior-change intent; mistaken drift in compose/dedupe or condition suppression would affect live issue lists and IDs.

Overview
Foundation refactor for the Issues pipeline: split the monolithic issues.go / problems.go into focused files (compose, dedupe, filters, normalize, source_conditions; detection + capi / gitops), and move compose options/CEL types to options.go so domain types stay a leaf.

Vocabulary and shared libs: k8s.ProblemDetection; category rollup GroupCategoryGroup (vs Issue.Group API group); subject.IssueIDsubject.StableID with identity keys from new pkg/resourceid (ResourceKey, GroupForBuiltinKind, audit re-exports); transient/false-condition logic centralized in pkg/conditions (FindFalseCondition, IsInProgressForIssues) replacing duplicated readers and internal/issues/conditions.go; GitOps Flux path uses shared FindFalseCondition instead of local readyFalseCondition.

Dedup: exported issues.SeverityRank replaces identical server/MCP helpers; grouping uses it for sort/representative choice.

Small product fix: /.well-known/oauth-protected-resource and oauth-authorization-server return 404 so MCP clients don’t treat SPA HTML as OAuth metadata.

Reviewed by Cursor Bugbot for commit c2cd097. Bugbot is set up for automated code reviews on this repo. Configure here.

- Rename category `type Group` → `CategoryGroup` (collided with Issue.Group = API group).
- Export `issues.SeverityRank`; delete the byte-identical `composeSeverityRank`
  (server) + `mcpComposeSeverityRank` (mcp) clones, route both at it.
- `subject.IssueID(scope,key,category)` → generic `subject.StableID(scope,key,discriminator)`
  — subject produces subject identity; issue-id (subject+category) is composed by
  the issues layer. Byte layout unchanged (no re-key).
- Drop the unused `podMeta metav1.Object` param from `subject.ResolveSubject`
  (made the API look Pod-coupled; it wasn't used).
- Document `PodOwnerResolver`'s RS→Deployment hash-strip as a label-free
  APPROXIMATION (wrong for Argo Rollout / CRD-owned ReplicaSets) — use the
  cache-aware resolver when a cache is available.
The domain model (Issue/Severity/Source/Category/Ref) no longer imports
internal/filter; the CEL alias + Filters live in options.go. HTTP/MCP own CEL
compilation; the data types stay a pure leaf.
The `Problem` struct was a near-synonym of `Issue` (the classified/grouped public
type), forcing readers to learn an arbitrary distinction between two layers of one
pipeline. Renamed the transport-neutral detector-output type to `Detection` (it IS
a raw detection a detector emits), consumed by the issues classifier + dashboard +
MCP. Semantic rename via gopls — the `Problem` FIELD in NodeProblem/HPAProblem/
CronJobProblem helper structs and the Detect*Problems function names are correctly
untouched; no wire/JSON change. Resolves the Problem/Issue vocabulary collision.
Audit should not be the identity foundation. Moved ResourceKey +
GroupForBuiltinKind into a dependency-free leaf pkg/resourceid; pkg/audit now
re-exports them (existing bp.ResourceKey callers unchanged), and pkg/subject +
internal/issues depend on resourceid directly instead of pkg/audit.
…n logic

Package inventory (pkg/packages) shouldn't own generic condition-state semantics
for issues/GitOps/CRDs. Moved the transient-reason set + the genuine-failure
carve-out (ArtifactFailed/ChartNotReady) into a dependency-free leaf
pkg/conditions, exposing IsTransientConditionReason (health display) and
IsInProgressForIssues (transient MINUS genuine failures, for issue detectors).

- pkg/packages re-exports IsTransientConditionReason (existing callers unchanged).
- internal/issues drops its local genuineFailureReason map + pkg/packages dep,
  uses conditions.IsInProgressForIssues.
- internal/k8s Flux detector's hand-rolled isGitOpsInProgressReason replaced by
  conditions.IsInProgressForIssues — the Flux detector and the CRD noise-floor
  now share ONE in-progress predicate instead of two lists that could drift (A2).
…seCondition)

Collapsed three copies of "walk status.conditions, find a False entry" — the
verbatim closure in DetectCAPIProblems, the narrow readyFalseCondition in the
Flux detector, and internal/issues.FindFalseCondition — into a single
conditions.FindFalseCondition (v1beta2-fallback aware, condTypes-parameterized).
internal/issues/conditions.go deleted; the CAPI + Flux detectors + the generic
CRD fallback all route through the one reader. Added pkg/conditions tests.
problems.go was the v0 all-in-one "problems" feature; in the issues architecture
its real role is the DETECTOR LAYER — it reads the cache and emits []Detection
(raw findings) that internal/issues classifies + groups, and that the dashboard +
MCP also consume. The name and 1024-line god-file structure hid that.

- Renamed problems.go → detection.go (matches the Detection type; P1 follow-through).
- Split by detector to match the clean single-purpose files in internal/issues:
  detection.go (Detection type + the core DetectProblems workload/pod/service/etc
  detector + shared helpers), capi.go (DetectCAPIProblems), gitops.go
  (DetectGitOpsProblems + Argo/Flux helpers). Siblings health.go/missing_refs.go/
  scheduling.go were already separate detectors.
- Added a layer-framing doc on Detection: detectors → classify/group (issues) →
  render; this is the bottom tier, not a parallel surface to issues.
Mirrors the detector-layer split: one 657-line file → cohesive single-purpose
files matching the rest of the package.
- compose.go: Provider/ComposeStats + Compose/ComposeWithStats (the pipeline).
- source_conditions.go: the generic CRD .status.conditions source + curated-kind
  exclusion + transient suppression.
- normalize.go: Detection→Issue (fromProblem/classifyIssue/resolveGroup).
- dedupe.go: evidence-level dedup (scheduling-over-problem, workload-over-child).
- filters.go: severity/kind + cluster-scoped-access filters + SeverityRank.
No logic change; pure reorganization to cut the high-conflict god-file.
…avigate

The Audit and Issues render sites carried verbatim-identical inline
onNavigateToResource callbacks (kindToPlural → setSelectedResource → rebuild
searchParams → navigate to /resources/<plural>). Extracted one
navigateToResourceList useCallback; both pass it. (Topology/Workload/Helm
navigate sites differ and are left as-is.)
@nadaverell nadaverell requested a review from hisco as a code owner May 30, 2026 17:14
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit c2cd097. Configure here.

Comment thread internal/k8s/gitops.go
continue
}
_, reason, msg, since, ok := conditions.FindFalseCondition(obj, "Ready")
if !ok || conditions.IsInProgressForIssues(reason) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flux detector silently suppresses three new failure reasons

Medium Severity

detectFluxProblems replaced its old narrow isGitOpsInProgressReason (7 reasons) with conditions.IsInProgressForIssues (10 reasons), silently adding Creating, Issuing, and Waiting as suppressed in-progress reasons. Flux objects with these condition reasons were previously surfaced as issues and will now be dropped. The comment still says "NARROW in-progress set" but the set is wider than before. In a PR that claims "no behavior change," this widens the suppression window for the Flux detector.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit c2cd097. Configure here.

@nadaverell nadaverell merged commit c2cd097 into feat/issues-ui May 30, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant