Skip to content

fix: scope SDD spec reviewer to task diff#1595

Open
YOMXXX wants to merge 1 commit into
obra:devfrom
YOMXXX:fix/sdd-spec-reviewer-diff-scope
Open

fix: scope SDD spec reviewer to task diff#1595
YOMXXX wants to merge 1 commit into
obra:devfrom
YOMXXX:fix/sdd-spec-reviewer-diff-scope

Conversation

@YOMXXX
Copy link
Copy Markdown

@YOMXXX YOMXXX commented May 21, 2026

What problem are you trying to solve?

Issue #1538 reports that spec compliance reviewers in subagent-driven-development can take 20-33 minutes because the spec reviewer prompt tells the reviewer to read the implementation code but does not provide a git range or file scope. In current dev, code-quality-reviewer-prompt.md receives BASE_SHA and HEAD_SHA, while spec-reviewer-prompt.md does not; that leaves the spec reviewer to discover changed files by crawling the broader repository.

This PR addresses the diff-scope part of #1538 only. It does not change model selection guidance; #1496 is already exploring the broader subagent model-selection problem.

What does this PR change?

Adds a Git Range to Review section to skills/subagent-driven-development/spec-reviewer-prompt.md, including BASE_SHA, HEAD_SHA, and explicit git diff --stat / git diff commands. The prompt now tells the spec reviewer to read only files in that diff and explicitly lists broad codebase reads outside the diff range as a thing not to do.

It also adds a static regression test for this prompt shape and wires that test into the Claude Code skill test runner.

Is this change appropriate for the core library?

Yes. This is a general-purpose correction to the core SDD review prompt. It applies to any project using subagent-driven-development, does not add a third-party dependency, does not add a new harness, and does not introduce project-specific behavior.

What alternatives did you consider?

  1. Change model selection guidance so spec reviewers use cheaper models. I did not include that here because feat: add subagent-model-reconciliation skill #1496 already covers model selection more broadly, and bundling it would make this PR less reviewable.
  2. Add a persistent validator/team-mode flow. subagent-driven-development: phantom completion fix + persistent validator pool (Team Mode) #578 already explores that much larger design; this PR intentionally keeps the standard SDD flow and only gives the existing spec reviewer the same kind of git range the code quality reviewer already receives.
  3. Require per-requirement citations. That could help phantom completion, but it is a different behavioral change and is also explored in subagent-driven-development: phantom completion fix + persistent validator pool (Team Mode) #578. This PR only scopes reviewer discovery to the task diff.

Does this PR contain multiple unrelated changes?

No. The prompt edit and the static test both cover one issue: the spec compliance reviewer needs an explicit task diff boundary.

Existing PRs

I searched all open and closed PRs for:

  • 1538 slow review agents spec reviewer git range
  • spec reviewer diff scope BASE_SHA HEAD_SHA
  • subagent-driven-development spec compliance reviewer git diff
  • subagent review model selection spec reviewer

No PR dedicated to adding a git diff range to spec-reviewer-prompt.md was found. #578 is a much broader persistent validator/team-mode proposal. #613 hardens review-skipping gates. #1496 addresses model reconciliation. This PR is narrower than all three.

Environment tested

Harness (e.g. Claude Code, Cursor) Harness version Model Model version/ID
Local shell / Claude Code test files Claude Code CLI 2.1.141 present; tests run via bash on macOS N/A Static regression tests only

New harness support (required if this PR adds a new harness)

Not applicable. This PR does not add or modify harness support.

Evaluation

Initial prompt/context: user asked me to patrol existing PRs, identify high-priority issues, and continue with the highest-value actionable issue. I selected #1538 because it describes a concrete observed SDD slowdown and current dev still shows the asymmetry: code quality reviewer has a git range, spec reviewer does not.

Baseline before the change:

bash tests/claude-code/test-subagent-driven-development-static.sh

Result before editing spec-reviewer-prompt.md: 7 failures. The prompt lacked ## Git Range to Review, BASE_SHA, HEAD_SHA, the two git diff commands, the changed-file scope rule, and the outside-diff prohibition.

After the change:

bash tests/claude-code/test-subagent-driven-development-static.sh

Result: 7/7 assertions passed.

Additional verification:

bash tests/claude-code/test-worktree-path-policy.sh
bash -n tests/claude-code/test-subagent-driven-development-static.sh tests/claude-code/run-skill-tests.sh tests/claude-code/test-worktree-path-policy.sh
git diff --cached --check && git diff --check

All commands exited 0. I also tried bash tests/claude-code/run-skill-tests.sh --test test-subagent-driven-development-static.sh; it failed before reaching the test because macOS lacks GNU timeout by default. I did not change the runner for that pre-existing local environment issue because it is unrelated to #1538.

Live adversarial agent sessions after the change: 0. This is a narrow prompt-template guard with a RED/GREEN static regression test; I am not claiming multi-session behavioral pressure-test coverage.

Rigor

  • If this is a skills change: I used superpowers:writing-skills and completed adversarial pressure testing (paste results below)
  • This change was tested adversarially, not just on the happy path
  • I did not modify carefully-tuned content (Red Flags table, rationalizations, "human partner" language) without extensive evals showing the change is an improvement

I used the superpowers:writing-skills process for the RED/GREEN shape of this edit: identify the observed baseline failure from #1538, write a failing static regression test before changing the prompt, then make the minimum prompt change needed for the regression to pass. I did not complete live adversarial pressure testing; the unchecked boxes above are intentional rather than omitted.

Human review

  • A human has reviewed the COMPLETE proposed diff before submission

The complete staged diff was shown before submission. The human partner previously instructed me to treat shown diffs as reviewed for these PRs unless they say otherwise.

@YOMXXX
Copy link
Copy Markdown
Author

YOMXXX commented May 21, 2026

Reviewer note

This is a narrow follow-up from the issue triage for #1538. The PR only addresses the spec-reviewer diff-scope gap: code-quality-reviewer-prompt.md already receives BASE_SHA / HEAD_SHA, but spec-reviewer-prompt.md did not, which forces the reviewer to discover changed files by crawling the repo.

What changed: add a Git Range to Review section to the spec reviewer prompt, require git diff --stat / git diff, and explicitly scope reads to files in that diff. A static regression test locks this prompt shape in.

What this intentionally does not change: model selection guidance, persistent validator/team mode, citations, or review gate semantics. Those are covered by broader related PRs (#1496, #578, #613) and would make this PR much harder to review.

Verification: the new static regression failed on current dev before the prompt edit and passes after it. test-worktree-path-policy.sh, shell syntax checks, and diff whitespace checks also pass. The full run-skill-tests.sh --test ... wrapper was not used on my macOS machine because it depends on GNU timeout, which is not installed by default; the underlying static test itself passes directly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant