Skip to content

feat: v0.8.67 constitution-first setup — model-assisted onboarding, runtime posture, cleanup#3861

Merged
Hmbown merged 132 commits into
mainfrom
claude/v0.8.67-constitution-setup-174rj9
Jul 2, 2026
Merged

feat: v0.8.67 constitution-first setup — model-assisted onboarding, runtime posture, cleanup#3861
Hmbown merged 132 commits into
mainfrom
claude/v0.8.67-constitution-setup-174rj9

Conversation

@Hmbown

@Hmbown Hmbown commented Jun 30, 2026

Copy link
Copy Markdown
Owner

v0.8.67 — constitution-first setup, plus an overnight quality pass

Cargo.toml stays at 0.8.66; no tag/publish/release here.

The release: constitution-first setup

First launch is a ritual, not a config form — choose a model, let it help draft the constitution it will live under, read and ratify. Nothing becomes law until you confirm.

  • Setup wizard: unified state model, /setup entry, resume/skip/retry, provider-readiness card, runtime-posture card (work mode separated from approval policy), secret-free verification report + doctor integration.
  • User constitution: guided structured creator, model-assisted drafting behind an untrusted-JSON → sanitize → bound → preview → explicit ratify gate, keep-existing completion for updating users, /constitution as the primary management surface.
  • Cleanup: WHALE.md discovery retired (ignored + migration-only), legacy flash auto-router / dormant tab subsystem / stale project_doc loader / dead fleet+MCP+execpolicy+QA helpers removed.

Constitution as mechanism (new)

Repo-local .codewhale/constitution.json protected_invariants may now carry path globs + action (ask|block), compiled into non-bypassable, tighten-only write holds in the tool gate — the law mechanically stops a violating write, with a receipt naming the invariant. The user-global constitution stays advisory prose and never reaches enforcement.

Blockers

Overnight quality pass

  • Approval prompt redesigned: calm/compact by default, red reserved for genuinely critical actions; agent tool properly classified.
  • Startup/render perf (v0.8.67: Launch is slow; profile and remove startup inefficiency #3757): boot janitors off the first-frame path, single-scan thread recovery, no per-boot task-store rewrite, @mention no longer re-walks per keystroke, startup milestone tracing.
  • Both model drafters (fleet + constitution) moved off the event loop — the UI never freezes while drafting.
  • v0.8.67: Ship configured-provider route manager for /provider and /model #3830 missing-auth route switch now opens /provider at that provider's key entry.
  • UX copy: no more misleading green failure toasts, debug/jargon leaks, or issue numbers in --help; actionable empty states.
  • Website: law-thesis hero + animated terminal demo, /constitution and /models pages, community hub + non-goals, docs dark mode.

Verification

cargo test -p codewhale-tui 5686 passed / 0 failed · codewhale-config 342 passed · cargo fmt --check · git diff --check · jq locales · RUSTFLAGS="-D warnings" --no-run clean · release build clean · scripts/v0867-setup-qa.sh 9/9 · web/ npm run build 40 pages clean.

A five-lens adversarial review (34 agents) ran over the branch; every confirmed finding was fixed (including a self-caught safety-floor gap).

🤖 Generated with Claude Code

claude added 3 commits June 30, 2026 22:44
Add the safety layer under every v0.8.67 setup step: atomic file writes
(temp + rename, 0o600) and a SetupTransaction that stages multiple file
writes and either fully applies or fully rolls back to each file's
pre-commit state, so a partial failure never leaves a half-written file.
Preview reports intended writes without touching disk; a dropped (uncommitted)
transaction changes nothing.

Also add redact_secrets(), a dependency-free backstop that masks
secret-bearing keyed assignments (api_key/token/password/...) and bare
opaque-token prefixes so config text echoed into reports, logs, or
diagnostics never leaks credentials.

Tests cover atomic replace, owner-only perms, preview-writes-nothing,
commit-applies-all, rollback on partial failure (restore + remove created
files), and redaction across TOML/JSON/env shapes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016pe1SXnGCNdjqFPA2DFcYf
Add the single setup-state record every v0.8.67 step (#3404-#3412) reads
and writes, persisted as an atomic JSON sidecar (setup_state.json) under
$CODEWHALE_HOME via the transactional persistence layer.

The model defines the shared vocabulary so /setup, doctor, and the context
report never invent their own meanings:

- SetupStep ids (language, provider_model, trust_sandbox, tools_mcp,
  hotbar, remote_runtime, constitution, verification).
- StepStatus enum (not_started/recommended/optional/deferred/in_progress/
  verified/needs_action/failed/skipped) + per-step StepEntry carrying a
  required flag, a safe summary (never secrets), and the writing lane.
- Constitution-first fields: choice, checkpoint_completed_for, language,
  source, validity, preview_hash, preview_version, runtime_posture_source.

Readiness is derived, not persisted: first_run_ready (language verified,
provider/model verified-or-needs-action, posture reviewed, explicit
constitution choice) and update_ready (constitution checkpoint complete for
the lane). derive_inherited() builds a safe inherited state for existing
users with no sidecar so an update never looks like a broken fresh install,
while still re-arming the once-per-version constitution checkpoint. A
missing or corrupt record falls back to None (derive from config) rather
than forcing the wizard.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016pe1SXnGCNdjqFPA2DFcYf
…derer (#3793)

Add the guided global constitution's data model and renderer. The normal
output is structured data persisted under $CODEWHALE_HOME
(constitution.json), not a blank Markdown editor.

- UserConstitution: about, working_style, priorities, autonomy_preference,
  bounded free-prose notes, language. All fields optional and
  forward-compatible.
- render_body(): a pure, deterministic function of the struct; render_block()
  wraps it as the model-facing <codewhale_user_constitution> prose block.
- preview_hash(): stable FNV-1a fingerprint over the body, independent of the
  home path, so a preview matches its saved form.
- bounded(): caps free prose and list items in length and count and drops
  blanks; freeform is advisory and never parsed as enforceable policy.
- AutonomyPreference renders as guidance explicitly labeled as NOT changing
  approval policy, sandbox, shell, network, trust, MCP permissions, or
  default mode. There is no code path from this module to runtime config;
  applying posture stays owned by #3406.
- load_from()/save_to() classify Missing/Empty/Unreadable/Invalid/Loaded into
  ConstitutionValidity for the setup-state record; save persists the bounded
  form atomically.

A guard test asserts the persisted file carries no runtime-control keys.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016pe1SXnGCNdjqFPA2DFcYf
The preview_hash_is_independent_of_source_path test discarded the
#[must_use] Option returned by render_block, which fails the CI test build
under -D warnings (surfaced first on windows-latest). Bind and assert on the
result, which also strengthens the test: rendering with a source path embeds
that path in the block yet leaves the body-derived preview hash unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016pe1SXnGCNdjqFPA2DFcYf
Hmbown added 2 commits June 30, 2026 16:57
Load the structured user-global constitution from CODEWHALE_HOME and render it as its own model-facing system block using UserConstitution::render_block(). Missing or empty files stay silent, while invalid or unreadable files are skipped with prompt-layer warnings instead of poisoning the prompt.

The rendered block uses the stable user-global source label rather than a device-specific home path, and lands after the base/project context layer but before volatile environment data.

Tests cover valid injection and invalid-file fallback.
Add a localized /setup modal shell backed by the shared SetupState model, including step navigation, status rendering, inherited-state derivation, and the v0.8.67 constitution checkpoint action. Interactive launches for existing users now auto-open the checkpoint when onboarding is complete and the versioned checkpoint is still incomplete; headless/noninteractive paths are untouched.

Choosing bundled/default constitution records constitution_checkpoint_completed_for = 0.8.67 and creates no custom constitution file. Cancel only closes the modal; staged skip/retry state remains in memory unless an explicit commit event is emitted.

Refs #3404.

Refs #3794.

Tests: cargo fmt --all -- --check

Tests: git diff --check

Tests: cargo test -p codewhale-tui --bin codewhale-tui --locked setup -- --nocapture

Tests: cargo test -p codewhale-tui --bin codewhale-tui --locked localization::tests -- --nocapture

Tests: RUSTFLAGS="-D warnings" cargo test -p codewhale-tui --bin codewhale-tui --locked --no-run
Hmbown added 2 commits June 30, 2026 17:27
Scope the SetupWizardView state accessor to tests so the non-test TUI binary stays clean under CI's workspace clippy and release smoke gates with -D warnings.
Add the /constitution command surface with status, preview, repo-local law, AGENTS explanation, setup-step routing, and bundled/default selection actions.

Respect setup-state bundled/deferred/expert override choices when loading the user-global constitution block so stale custom law is not injected after a bundled/default selection. Also keep the once-per-version setup checkpoint out of --skip-onboarding/headless PTY harnesses.

Tests run:

- cargo fmt --all -- --check

- git diff --check

- jq empty crates/tui/locales/en.json crates/tui/locales/zh-Hans.json

- cargo test -p codewhale-tui --bin codewhale-tui --locked constitution -- --nocapture

- cargo test -p codewhale-tui --bin codewhale-tui --locked localization::tests -- --nocapture

- cargo test -p codewhale-tui --test qa_pty --locked -- --nocapture

- RUSTFLAGS="-D warnings" cargo test -p codewhale-tui --bin codewhale-tui --locked --no-run

- cargo clippy --workspace --all-features --locked -- -D warnings -A clippy::uninlined_format_args -A clippy::too_many_arguments -A clippy::unnecessary_map_or -A clippy::collapsible_if -A clippy::assertions_on_constants

Refs #3806 #3794 #3404
Hmbown added 2 commits June 30, 2026 18:02
Add compact provider/model and runtime posture cards to the constitution-first setup shell. Enter records safe SetupState summaries for inherited provider/model readiness or confirmed runtime posture without changing provider credentials, default model, approval policy, sandbox, shell, trust, or network config.

The runtime posture card explicitly keeps enforced config separate from constitution guidance, and localized copy covers the new card rows and status messages in English and zh-Hans.

Tests run:

- cargo fmt --all -- --check

- git diff --check

- jq empty crates/tui/locales/en.json crates/tui/locales/zh-Hans.json

- cargo test -p codewhale-tui --bin codewhale-tui --locked setup -- --nocapture

- cargo test -p codewhale-tui --bin codewhale-tui --locked localization::tests -- --nocapture

- cargo test -p codewhale-tui --test qa_pty --locked -- --nocapture

- RUSTFLAGS="-D warnings" cargo test -p codewhale-tui --bin codewhale-tui --locked --no-run

- cargo test -p codewhale-config --lib

- cargo clippy --workspace --all-features --locked -- -D warnings -A clippy::uninlined_format_args -A clippy::too_many_arguments -A clippy::unnecessary_map_or -A clippy::collapsible_if -A clippy::assertions_on_constants

Refs #3405 #3406 #3736
Stop loading project and global WHALE.md files as prompt context. The loader now treats them as ignored migration markers, keeps them in cache invalidation, and surfaces the migration warning through /constitution and context report diagnostics without reading the legacy body.

Update configuration docs and regression tests to describe AGENTS.md plus .codewhale/constitution.json as the supported migration path.

Refs #3798. Also advances #3806 and #3411 diagnostics.

Verified locally: cargo fmt --all -- --check; git diff --check; cargo test -p codewhale-tui --bin codewhale-tui --locked project_context -- --nocapture; cargo test -p codewhale-tui --bin codewhale-tui --locked constitution -- --nocapture; cargo test -p codewhale-tui --bin codewhale-tui --locked context_report -- --nocapture; RUSTFLAGS="-D warnings" cargo test -p codewhale-tui --bin codewhale-tui --locked --no-run; cargo test -p codewhale-config --lib; cargo test -p codewhale-tui --test qa_pty --locked -- --nocapture; cargo clippy --workspace --all-features --locked -- -D warnings -A clippy::uninlined_format_args -A clippy::too_many_arguments -A clippy::unnecessary_map_or -A clippy::collapsible_if -A clippy::assertions_on_constants.
Hmbown added 3 commits June 30, 2026 18:45
Wire the final setup verification step to the shared SetupState model so /setup report records a reviewed status, shows first-run/update readiness, and points to the next setup action.

Expose the same setup-state vocabulary through doctor and doctor --json, deriving inherited state when setup_state.json is absent and honoring persisted state when it exists.

Tests: cargo fmt --all -- --check; git diff --check; jq empty crates/tui/locales/en.json crates/tui/locales/zh-Hans.json; cargo test -p codewhale-tui --bin codewhale-tui --locked setup -- --nocapture; cargo test -p codewhale-tui --bin codewhale-tui --locked doctor_setup -- --nocapture; cargo test -p codewhale-tui --bin codewhale-tui --locked localization::tests -- --nocapture; RUSTFLAGS="-D warnings" cargo test -p codewhale-tui --bin codewhale-tui --locked --no-run; cargo test -p codewhale-config --lib; cargo test -p codewhale-tui --test qa_pty --locked -- --nocapture; cargo clippy --workspace --all-features --locked -- -D warnings -A clippy::uninlined_format_args -A clippy::too_many_arguments -A clippy::unnecessary_map_or -A clippy::collapsible_if -A clippy::assertions_on_constants

Refs #3411
Add a contextual G action to the constitution setup step that scaffolds a balanced structured user-global constitution, records GuidedCustom/source/validity/hash/version in SetupState, and persists constitution.json together with setup_state.json through SetupTransaction.

Keep runtime posture untouched; the generated autonomy preference is model guidance only and does not alter approval, sandbox, shell, network, trust, or MCP permissions.

Tests: cargo fmt --all -- --check; git diff --check; jq empty crates/tui/locales/en.json crates/tui/locales/zh-Hans.json; cargo test -p codewhale-tui --bin codewhale-tui --locked setup -- --nocapture; cargo test -p codewhale-tui --bin codewhale-tui --locked localization::tests -- --nocapture; RUSTFLAGS="-D warnings" cargo test -p codewhale-tui --bin codewhale-tui --locked --no-run; cargo test -p codewhale-config --lib; cargo test -p codewhale-tui --test qa_pty --locked -- --nocapture; cargo clippy --workspace --all-features --locked -- -D warnings -A clippy::uninlined_format_args -A clippy::too_many_arguments -A clippy::unnecessary_map_or -A clippy::collapsible_if -A clippy::assertions_on_constants

Refs #3793. Refs #3806.
Document /constitution as the primary personal constitution surface, with structured user-global constitution data rendered to prose, repo-local law as optional project policy, AGENTS.md as project instructions, and memory/handoffs below those layers.

Mark the full base-prompt Markdown override as expert-only and keep WHALE.md as deprecated migration-only guidance. Update README translations and the web constitution page so the docs map has localized coverage.

Tests: git diff --check; npm --prefix web run check:docs; npm --prefix web run lint; cargo fmt --all -- --check

Refs #3811. Refs #3412. Refs #3806.
Hmbown added 3 commits June 30, 2026 19:42
Issue #3859 reported that the running-command hint sounded like Bash job control and set the wrong expectation for long foreground shell waits.

Rename the user-facing Ctrl+B affordance around moving the active shell wait into /jobs, update localized help copy and keybinding docs, and keep the model-facing detach result pointed at exec_shell_wait for follow-up inspection.

Tests run:

- cargo fmt --all -- --check

- git diff --check

- jq empty crates/tui/locales/en.json crates/tui/locales/es-419.json crates/tui/locales/ja.json crates/tui/locales/pt-BR.json crates/tui/locales/vi.json crates/tui/locales/zh-Hans.json

- cargo test -p codewhale-tui --bin codewhale-tui --locked exec_cell_renders_live_shell_output_before_final_output -- --nocapture

- cargo test -p codewhale-tui --bin codewhale-tui --locked test_exec_shell_foreground_can_move_to_background -- --nocapture

- cargo test -p codewhale-tui --bin codewhale-tui --locked localization::tests -- --nocapture

- RUSTFLAGS="-D warnings" cargo test -p codewhale-tui --bin codewhale-tui --locked --no-run

- cargo test -p codewhale-config --lib
Issue #3841 identified four ignored full-engine mock LLM placeholders that only ended in unreachable!("ignored") while the real blocker is the missing Engine Arc<dyn LlmClient> seam.

Delete the placeholders and keep the blocker as README/module documentation so the integration test reports real executable coverage instead of dormant future-test outlines.

Tests run:

- cargo fmt --all -- --check

- git diff --check

- cargo test -p codewhale-tui --test integration_mock_llm --locked -- --nocapture

- cargo check -p codewhale-tui --all-targets --locked

- RUSTFLAGS="-D warnings" cargo test -p codewhale-tui --bin codewhale-tui --locked --no-run

- cargo test -p codewhale-config --lib

- rg "#\[ignore\]|unreachable!\(\"ignored\"\)|engine_full_" crates/tui/tests/integration_mock_llm.rs crates/tui/tests/README.md
Issue #3840 identified project_doc.rs as a replaced project-instruction loader path whose only live consumer was find_git_root for project_context.rs.

Move the git-root helper beside its remaining project-context callers, remove the unused module wiring, and delete the stale discovery/loading API plus its self-contained tests. Project-context keeps the active AGENTS.md/constitution loading and symlink-safety coverage.

Tests run:

- cargo fmt --all -- --check

- git diff --check

- cargo test -p codewhale-tui --bin codewhale-tui --locked project_context -- --nocapture

- cargo test -p codewhale-tui --bin codewhale-tui --locked project_doc -- --nocapture

- cargo check -p codewhale-tui --all-targets --locked

- RUSTFLAGS="-D warnings" cargo test -p codewhale-tui --bin codewhale-tui --locked --no-run

- cargo test -p codewhale-config --lib

- rg "project_doc::|mod project_doc|read_project_docs|load_from_workspace|ProjectDoc|DOC_FILENAMES|format_instructions|discover_paths" crates/tui/src crates/tui/tests
@claude

claude Bot commented Jul 2, 2026

Copy link
Copy Markdown

Reviewing PR #3861 (v0.8.67 constitution-first setup)

Working — large PR (100 files, +19.6k/-6.8k across 121 commits vs origin/main). Shallow clone blocked git diff origin/main...HEAD (no merge base locally); using per-file git diff origin/main -- <path> instead. Running deep-dive passes over the diff.

  • Gather context (repo guidance, diff stat, changed files)
  • Understand request (code review per custom instructions)
  • Review correctness bugs (setup wizard, constitution drafting, repo_law state machine)
  • Review provider/model/route safety (EPIC v0.8.65 EPIC: Separate provider facts, model facts, offerings, and route resolution #2608 invariant)
  • Review dead-code cleanup + engine/tool changes
  • Review security (secrets, tool-gate enforcement, destructive-command detection)
  • Review test coverage for new modules
  • Post final review summary

View job run

Approval-gate hunt finding: a tool present in the model-facing catalog
(tool_def = Some) but absent from the execution registry — or when the
registry itself is None — fell through every approval branch in the tool
gate. approval_required stayed false, read_only stayed false, no
auto-review classification ran, and it executed with no gate. The
not-available block only fired when tool_def was None.

The block now keys on whether the tool actually has an execution path
(a registry spec, or being MCP/code/js/search) rather than on catalog
presence, so an unrunnable tool is blocked as not-available instead of
running unguarded. Such a tool has no execution path anyway, so this is
not a regression for any runnable tool.

Tests:
- cargo test -p codewhale-tui --bin codewhale-tui --locked engine (273) + turn_loop (40)
- cargo clippy --workspace (CI flags) clean
@claude

claude Bot commented Jul 2, 2026

Copy link
Copy Markdown

Claude finished @Hmbown's task in 2m 12s —— View job


Reviewing PR #3861 (v0.8.67 constitution-first setup)

5 parallel deep-dive passes running over the diff (setup/constitution correctness, provider/model/route safety, engine/tool cleanup, security enforcement, test coverage).

  • Gather context (repo guidance, diff stat, changed files)
  • Understand request (code review per custom instructions)
  • Review correctness bugs (setup wizard, constitution drafting, repo_law state machine)
  • Review provider/model/route safety (EPIC v0.8.65 EPIC: Separate provider facts, model facts, offerings, and route resolution #2608 invariant)
  • Review dead-code cleanup + engine/tool changes
  • Review security (secrets, tool-gate enforcement, destructive-command detection)
  • Review test coverage for new modules
  • Post final review summary

…ack, count image tokens

A correctness hunt over the compaction / token-accounting pipeline found
several real defects; these three are the safe, high-value fixes (the
trigger-unit reconciliation and unbounded-summary-prefix items are noted
for careful review since they change WHEN compaction fires).

- Emergency/preflight compaction dropped all working-set pins. The
  pin-aware manual path passes the pinned message indices + top paths to
  compact_messages_safe, but recover_context_overflow passed None/None —
  and on large windows that emergency path is the one that actually
  fires, so pinned errors, patches, and the files the user is editing
  could be summarized away mid-turn. It now computes and passes the same
  pins/paths as the manual path.
- The cache-aligned summary request replays a non-contiguous message
  subsequence (pinned messages removed from the middle), which can
  violate strict role-ordering (a non-transient InvalidInput). create_summary
  only fell back to the formatted summary on context-window errors, so a
  role-ordering error aborted compaction entirely and let context keep
  growing. It now falls back on ANY cache-aligned failure.
- Inline images (ContentBlock::ImageUrl) estimated to 0 tokens in every
  estimator, undercounting the budget and risking overflow in image-heavy
  sessions. They now carry a conservative flat estimate (erring high
  compacts slightly early rather than overflowing).

Tests:
- cargo test -p codewhale-tui --bin codewhale-tui --locked compaction (68 passed; new inline_image_estimates_nonzero_tokens)
- RUSTFLAGS="-D warnings" --no-run clean; cargo clippy (CI flags) clean
@claude

claude Bot commented Jul 2, 2026

Copy link
Copy Markdown

Claude finished @Hmbown's task in 2m 19s —— View job


Reviewing PR #3861 (v0.8.67 constitution-first setup)

Working — 5 parallel deep-dive passes running over the diff (setup/constitution correctness, provider/model/route safety, engine/tool cleanup, security enforcement, test coverage). Shallow clone has no merge-base with origin/main; using per-file git diff origin/main -- <path> instead.

  • Gather context (repo guidance, diff stat, changed files)
  • Understand request (code review per custom instructions)
  • Review correctness bugs (setup wizard, constitution drafting, repo_law state machine)
  • Review provider/model/route safety (EPIC v0.8.65 EPIC: Separate provider facts, model facts, offerings, and route resolution #2608 invariant)
  • Review dead-code cleanup + engine/tool changes
  • Review security (secrets, tool-gate enforcement, destructive-command detection)
  • Review test coverage for new modules
  • Post final review summary

…e guard, durable rename

Four session-persistence correctness fixes from the hunt:

- Backtrack (Esc-Esc) truncated api_messages by counting role=="user"
  messages, but tool results are stored as role=="user" too — so in any
  turn with tool calls the cut landed mid-turn on a tool_result, leaving a
  dangling assistant tool_use with no matching result (a transcript the
  provider rejects) and diverging the visible history from what the model
  sees. It now counts only messages that yield a HistoryCell::User (the
  same predicate apply_loaded_session uses), via a testable
  backtrack_api_cut_index helper.

- The persistence actor tracked checkpoint-write and checkpoint-clear as
  independent flags and always cleared-then-wrote in the drain, so a
  Checkpoint and a ClearCheckpoint coalesced in one batch would clear
  latest.json and immediately re-write the stale checkpoint — undoing the
  clear and resurrecting a finished turn as a false crash-recovery on next
  launch. Now last-writer-wins: a fresh checkpoint drops a pending clear
  and vice versa.

- /sessions prune deleted every session older than the cutoff with no
  exclusion, so it could delete the active, just-resumed session (whose
  updated_at is stale until its first post-resume save) out from under the
  live TUI. Added prune_sessions_older_than_keeping and passed the active
  session id, mirroring the boot-janitor guard added earlier.

- write_atomic synced the temp file data before rename but never fsynced
  the parent directory after, so a power loss right after the rename could
  lose the new directory entry — silently dropping a crash-recovery
  checkpoint the subsystem exists to preserve. Now best-effort fsyncs the
  parent dir after persist.

Tests:
- backtrack_cut_index_skips_tool_result_user_messages (new)
- session_manager 45, persistence 15, backtrack 22, write_atomic 3 passing
- cargo clippy --workspace (CI flags) clean
@claude

claude Bot commented Jul 2, 2026

Copy link
Copy Markdown

Claude finished @Hmbown's task in 2m 10s —— View job


Reviewing PR #3861 (v0.8.67 constitution-first setup)

4 parallel deep-dive passes running over the diff (setup/constitution correctness, provider/model/route safety, tool-gate/security, dead-code + test coverage).

  • Gather context (repo guidance, diff stat, changed files)
  • Understand request (code review per custom instructions)
  • Review correctness bugs (setup wizard, constitution drafting, repo_law state machine)
  • Review provider/model/route safety (EPIC v0.8.65 EPIC: Separate provider facts, model facts, offerings, and route resolution #2608 invariant)
  • Review dead-code cleanup + engine/tool changes + test coverage
  • Review security (secrets, tool-gate enforcement, destructive-command detection)
  • Post final review summary

The approval prompt is a bottom-anchored inline band (#3799), not the old
centered full-screen card; an orphaned doc comment from the removed
card-rect helper still described it as a centered takeover. Doc only.

Tests: build only (comment change)
@claude

claude Bot commented Jul 2, 2026

Copy link
Copy Markdown

Claude finished @Hmbown's task in 2m 30s —— View job


Reviewing PR #3861 (v0.8.67 constitution-first setup)

Working — large PR (100 files, +19,846/-6,819 across 125 commits vs origin/main). Running deep-dive passes over the diff per repo review guidance.

  • Gather context (repo guidance, diff stat, changed files)
  • Understand request (code review per custom instructions)
  • Review correctness bugs (setup wizard, constitution drafting, repo_law state machine)
  • Review provider/model/route safety (EPIC v0.8.65 EPIC: Separate provider facts, model facts, offerings, and route resolution #2608 invariant)
  • Review dead-code cleanup + engine/tool changes
  • Review security (secrets, tool-gate enforcement, destructive-command detection)
  • Review test coverage for new modules
  • Post final review summary

The Anthropic Messages path and the OpenAI Responses path each decoded
every raw bytes_stream() chunk individually with
String::from_utf8_lossy(&chunk). reqwest yields bytes at arbitrary
TCP/HTTP2 frame boundaries, so any multi-byte UTF-8 character — CJK,
emoji, accented letter — split across two reads had its tail bytes turned
into U+FFFD in one chunk and its lead bytes into U+FFFD in the next,
permanently garbling the character in both visible text and tool-call
arguments. For a DeepSeek-centric, CJK-heavy user base this triggers
routinely (e.g. "你好" rendering as "你�").

Both paths now buffer raw bytes and decode only COMPLETE lines via a
shared take_sse_line helper — the '\n' delimiter is ASCII and can never
fall inside a multi-byte sequence, so the split char is reassembled
intact. This mirrors the already-correct chat.rs main path.

Tests:
- take_sse_line_preserves_multibyte_split_across_reads (splits '好' mid-char),
  take_sse_line_returns_none_without_newline
- client::tests 148, streaming 148 passing; cargo clippy --workspace (CI flags) clean
@claude

claude Bot commented Jul 2, 2026

Copy link
Copy Markdown

Reviewing PR #3861 (v0.8.67 constitution-first setup)

Working — large PR (100 files, +19,901/-6,829 across 126 commits vs origin/main). Running deep-dive passes over the diff per repo review guidance.

  • Gather context (repo guidance, diff stat, changed files)
  • Understand request (code review per custom instructions)
  • Review correctness bugs (setup wizard, constitution drafting, repo_law state machine)
  • Review provider/model/route safety (EPIC v0.8.65 EPIC: Separate provider facts, model facts, offerings, and route resolution #2608 invariant)
  • Review dead-code cleanup + engine/tool changes
  • Review security (secrets, tool-gate enforcement, destructive-command detection)
  • Review test coverage for new modules
  • Post final review summary

View job run

Two more streaming-parse fixes in the main chat SSE path (both no-ops for
the primary DeepSeek path, which sends [DONE]):

- The pending line_buf was flushed only on an SSE blank-line boundary, so a
  provider that ends the stream straight after its last `data:` line — or
  whose final line lacks a trailing newline — had its final delta (last
  tokens, finish_reason, and usage) silently dropped. A post-loop flush now
  processes any leftover byte_buf line + line_buf as a final frame. It is
  gated on a new saw_done flag so post-[DONE] trailing bytes are never
  reprocessed; for DeepSeek ([DONE]) and blank-line-terminated streams the
  flush is a no-op.
- Multiple `data:` fields within one SSE event were concatenated with no
  separator, yielding `{…}{…}` that fails JSON parsing and drops the whole
  frame. They are now joined with '\n' per the SSE spec. No-op for the
  common single-data-line event (line_buf empty when the first is pushed).

Verification: build + cargo clippy --workspace (CI flags) clean; client::tests
148, chat 100, parse_sse 3 passing (no regression). No dedicated end-to-end
streaming test — the crate has no MockServer SSE-stream harness, and both
changes are provably no-ops for the primary [DONE]/blank-line-terminated
paths, only recovering the currently-dropped final frame.
@claude

claude Bot commented Jul 2, 2026

Copy link
Copy Markdown

Claude finished @Hmbown's task in 3m 5s —— View job


Reviewing PR #3861 (v0.8.67 constitution-first setup)

5 parallel deep-dive passes running over the diff (setup/constitution correctness, provider/model/route safety, engine/tool cleanup, security enforcement, TUI surfaces + test coverage).

  • Gather context (repo guidance, diff stat, changed files)
  • Understand request (code review per custom instructions)
  • Review correctness bugs (setup wizard, constitution drafting, repo_law state machine)
  • Review provider/model/route safety (EPIC v0.8.65 EPIC: Separate provider facts, model facts, offerings, and route resolution #2608 invariant)
  • Review dead-code cleanup + engine/tool changes
  • Review security (secrets, tool-gate enforcement, destructive-command detection)
  • Review test coverage for new modules
  • Post final review summary

…, dead timeout

Four MCP transport hardening fixes from a trust-boundary hunt (the MCP
server is third-party; its data flows into the model and the gate):

- SSRF (high): the server-supplied `endpoint` SSE event accepted an
  absolute cross-origin URL, and SseTransport::send then POSTed to it with
  the Bearer/OAuth headers attached — the endpoint host was never re-vetted
  after the one connect-time network-policy check. A malicious MCP server
  could redirect the client's authenticated POSTs to 169.254.169.254 or a
  localhost admin port. resolve_endpoint_url now requires the resolved
  endpoint to be same-origin (scheme+host+port) with the connect URL;
  relative endpoints are same-origin by construction.
- UTF-8 corruption: the SSE loop decoded each raw network chunk with
  String::from_utf8_lossy, garbling any multi-byte char split across reads
  (same class of bug just fixed provider-side, here in the MCP transport).
  It now buffers raw bytes and decodes only complete event blocks via a
  byte-oriented separator finder.
- OOM DoS: the SSE frame buffer grew without bound if a server never emitted
  a separator. Capped at MAX_SSE_FRAME_BYTES (8 MiB).
- Dead timeout: the HTTP client set reqwest's TOTAL .timeout() from
  connect_timeout (10s), silently capping every request at 10s and making
  the per-server execute_timeout/read_timeout dead. Now connect_timeout
  bounds the connect phase and read_timeout is the total backstop.

Tests:
- resolve_endpoint_accepts_relative_and_same_origin,
  resolve_endpoint_rejects_cross_origin_ssrf,
  find_sse_event_separator_bytes_matches_str_and_survives_multibyte
- mcp suite 134 passing; cargo clippy --workspace (CI flags) clean
@claude

claude Bot commented Jul 2, 2026

Copy link
Copy Markdown

Reviewing PR #3861 (v0.8.67 constitution-first setup)

Working — large PR (100 files, +20,092/-6,843 across 128 commits vs origin/main). Running deep-dive passes over the diff per repo review guidance.

  • Gather context (repo guidance, diff stat, changed files)
  • Understand request (code review per custom instructions)
  • Review correctness bugs (setup wizard, constitution drafting, repo_law state machine)
  • Review provider/model/route safety (EPIC v0.8.65 EPIC: Separate provider facts, model facts, offerings, and route resolution #2608 invariant)
  • Review dead-code cleanup + engine/tool changes
  • Review security (secrets, tool-gate enforcement, destructive-command detection)
  • Review test coverage for new modules
  • Post final review summary

View job run

Completes the transport-read hardening: a misbehaving or malicious MCP
server could OOM the client at transport-read time (before any
transcript-level spillover), via two unbounded reads:

- Streamable-HTTP used response.text() to slurp the entire body into
  memory. It now streams chunk-by-chunk through read_body_capped and
  aborts past MAX_MCP_RESPONSE_BYTES (16 MiB).
- The stdio transport used the unbounded read_line, so a server emitting a
  newline-free multi-GB "line" would OOM. A new read_line_capped uses
  fill_buf/consume and aborts once a line exceeds the same cap without a
  newline.

Tests:
- reads_a_line_and_reports_eof, aborts_on_newline_free_line_over_cap
- mcp suite 136 passing; cargo clippy (CI flags) clean
@claude

claude Bot commented Jul 2, 2026

Copy link
Copy Markdown

Claude finished @Hmbown's task in 1m 44s —— View job


Reviewing PR #3861 (v0.8.67 constitution-first setup)

4 parallel deep-dive passes running over the diff (setup/constitution correctness, provider/model/route safety, engine/tool-gate security, client/MCP + test coverage).

  • Gather context (repo guidance, diff stat, changed files)
  • Understand request (code review per custom instructions)
  • Review correctness bugs (setup wizard, constitution drafting, repo_law state machine)
  • Review provider/model/route safety (EPIC v0.8.65 EPIC: Separate provider facts, model facts, offerings, and route resolution #2608 invariant)
  • Review dead-code cleanup + engine/tool changes
  • Review security (secrets, tool-gate enforcement, destructive-command detection)
  • Review test coverage for new modules
  • Post final review summary

…omic writes

A hunt over the file-write/patch tools found real, non-adversarial
corruption of the user's source files. Fixes (the gate cross-check
confirmed repo_law's path parsing has no bypass — the weakness was
content-level, exactly these):

- apply_patch dropped the file's trailing newline on EVERY patch and
  flipped CRLF files to LF: it round-tripped content through
  str::lines() (which strips both the trailing \n and each \r) and
  reassembled with join("\n"). Now reassemble_preserving_newlines detects
  the base file's dominant terminator (CRLF vs LF) and trailing-newline
  state and reproduces it, so a patched file keeps its exact line-ending
  style. (The old tests only asserted content.contains(...), never the
  trailing byte, so this went uncaught.)
- apply_patch's fuzz default was MAX_FUZZ (50), not the schema-documented
  3, so a hunk with no fuzz argument could silently apply up to 50 lines
  from its stated position — landing in the wrong region of a file with
  repeated blocks and reporting success. Default is now DEFAULT_FUZZ = 3
  (ceiling stays 50 for explicit opt-in).
- write_file, edit_file, apply_patch, and fim_edit wrote the user's real
  source with plain fs::write (truncate-then-write), so an ENOSPC/EIO/crash
  mid-write left the original truncated. All four — and apply_patch's
  rollback — now use the atomic temp+fsync+rename write_atomic path that
  the session/config/artifact writers already use.

Tests:
- reassemble_preserving_newlines_keeps_style, apply_patch trailing-newline
  and apply_patch_preserves_crlf_line_endings regressions
- apply_patch 47, tools::file 61, fim 4 passing; cargo clippy --workspace (CI flags) clean
@claude

claude Bot commented Jul 2, 2026

Copy link
Copy Markdown

Claude finished @Hmbown's task in 1m 58s —— View job


Reviewing PR #3861 (v0.8.67 constitution-first setup)

4 parallel deep-dive passes running over the diff (setup/constitution correctness, provider/model/route safety, tool-gate security + dead-code cleanup, engine/turn-loop + test coverage). Shallow clone has no merge-base with origin/main; using per-file git diff origin/main -- <path> instead.

  • Gather context (repo guidance, diff stat, changed files)
  • Understand request (code review per custom instructions)
  • Review correctness bugs (setup wizard, constitution drafting, repo_law state machine)
  • Review provider/model/route safety (EPIC v0.8.65 EPIC: Separate provider facts, model facts, offerings, and route resolution #2608 invariant)
  • Review dead-code cleanup + engine/tool changes
  • Review security (secrets, tool-gate enforcement, destructive-command detection)
  • Review test coverage for new modules
  • Post final review summary

The bytes_stream()-based read_body_capped introduced a CI-only failure in
streamable_http_stale_session_reconnects_and_retries_tool_call: that test
drives a raw TCP server using connection-close framing, and the
chunk-by-chunk stream read raced with the peer shutdown differently than
.text() under macOS/Windows CI timing (it passed locally).

Restore .text() for the read (identical connection framing to before) and
enforce the OOM bound with a Content-Length pre-check that rejects an
over-large declared body before it is read into memory. The SSE frame-buffer
cap and stdio line cap — the primary streaming OOM vectors — are unchanged.
Removed the now-unused read_body_capped helper.

Tests: streamable_http_stale_session passes 4/4; mcp suite 136; clippy clean
@claude

claude Bot commented Jul 2, 2026

Copy link
Copy Markdown

Reviewing PR #3861 (v0.8.67 constitution-first setup)

Working — large PR (100 files changed, +20,280/-6,854 across 131 commits vs origin/main). Running deep-dive passes over the diff per repo review guidance.

  • Gather context (repo guidance, diff stat, changed files)
  • Understand request (code review per custom instructions)
  • Review correctness bugs (setup wizard, constitution drafting, repo_law state machine)
  • Review provider/model/route safety (EPIC v0.8.65 EPIC: Separate provider facts, model facts, offerings, and route resolution #2608 invariant)
  • Review dead-code cleanup + engine/tool changes
  • Review security (secrets, tool-gate enforcement, destructive-command detection)
  • Review test coverage for new modules
  • Post final review summary

View job run

These hold local-only planning notes (unreleased roadmap) and scratch files
that must not land in git history.
@claude

claude Bot commented Jul 2, 2026

Copy link
Copy Markdown

Claude finished @Hmbown's task in 14m 51s —— View job


Reviewing PR #3861 (v0.8.67 constitution-first setup)

Review complete — 5 parallel deep-dives over the diff (git diff origin/main HEAD, 169 files, +20.3k/-6.9k), then the most severe findings were independently re-verified by direct code reading before posting.

  • Gather context (repo guidance, diff stat, changed files)
  • Understand request (code review per custom instructions)
  • Review correctness bugs (setup wizard, constitution drafting, repo_law state machine)
  • Review provider/model/route safety (EPIC v0.8.65 EPIC: Separate provider facts, model facts, offerings, and route resolution #2608 invariant)
  • Review dead-code cleanup + engine/tool changes
  • Review security (secrets, tool-gate enforcement, destructive-command detection)
  • Review test coverage for new modules
  • Post final review summary

Findings (most severe first)

1. [CRITICAL, verified] Interactive-origin destructive commands are not held in YOLO/Bypass mode, contradicting the PR's own blocker-#3883 claim.
crates/tui/src/tui/auto_review.rs:343-358safety_floor() only force-holds Destructive/Secret actions for RunOrigin::Background | RunOrigin::Headless. RunOrigin::Interactive is exempt. detached_start (turn_loop.rs:1728, spec.starts_detached_for(...)) is false for an ordinary synchronous exec_shell call — i.e. RunOrigin::Interactive is the default path a model uses to run a command, not an edge case. Separately, crates/tui/src/tools/shell.rs:2451 wraps its own SafetyLevel::Dangerous check in if !context.auto_approve, which is skipped entirely under YOLO. I confirmed every destructive/device-destroyer test in auto_review.rs (background_device_and_filesystem_destroyers_are_held_by_safety_floor, destroyer_check_resists_prefix_quote_and_pipe_evasions, etc., lines 851-930) constructs RunOrigin::Background — none exercises RunOrigin::Interactive + ApprovalMode::Bypass. Net effect: a foreground YOLO-mode exec_shell call running rm -rf ~/, dd to a device, mkfs, or shred is not held for review by anything — directly contradicting the PR body's "genuinely destructive/publish-like/device-destroying commands ... still hold in every mode." (git push/publish-like commands are safe — ToolActionKind::Publish is held regardless of origin — this is specific to the Destructive/Secret arm.)
Fix this →

2. [HIGH, verified] exec_shell bypasses repo-law write holds entirely.
crates/tui/src/repo_law.rs:31: const WRITE_TOOLS = ["write_file", "edit_file", "apply_patch", "fim_edit"]. repo_law_plan_decision (line 49-51) returns None for any tool not in this list, and exec_shell isn't in it, despite being the most obviously write-capable tool. The module's own doc comment names this exact class of bug (fim_edit was previously missing and had to be added). A repo with {"paths": ["crates/protocol/**"], "action": "block"} can still be written to via exec_shell with printf x > crates/protocol/wire.rs, sed -i, or rm, with no prompt or block — contradicting "non-bypassable... mechanically stops a violating write."
Fix this →

3. [HIGH, verified] MCP Streamable HTTP OOM guard bypassable via chunked transfer-encoding.
crates/tui/src/mcp/streamable_http.rs:121-128 only checks response.content_length(), which reqwest returns None for any chunked/undeclared-length response — a normal server configuration, not exotic. When absent, the guard is skipped and response.text() (line 129) buffers the full body with no cap. This is inconsistent with the genuinely byte-bounded fixes in the same PR for sse.rs (MAX_SSE_FRAME_BYTES streaming cap) and stdio.rs (read_line_capped). A malicious/compromised MCP server can OOM the client over Streamable HTTP simply by omitting Content-Length.
Fix this →

4. [MEDIUM, verified] Four shipped locales silently fall back to English for the entire new setup/constitution wizard.
Key counts confirm the gap: en.json 562 keys, zh-Hans.json 550, but es-419.json 428, pt-BR.json 416 (ja.json/vi.json are similarly behind — all four only got +7/-7 unrelated edits in this diff). None of the ~120+ new SetupWizard*/Approval* keys exist in those four locales, so those users see raw English strings for the first-run ritual. localization.rs's shipped_first_pack_has_no_missing_core_messages test can't catch this — it only flags a key if tr() returns the literal {id:?} debug placeholder, which never happens once en.json defines the key (fallback masks it). Consider a test that diffs each shipped locale's key set against en.json.

5. [MEDIUM, plausible] repo_law uses nearest-file-wins, not a cross-directory merge — a subdirectory's constitution can silently drop a parent repo's invariants. project_context.rs:323-348 (discover_repo_constitution) returns as soon as it finds the first .codewhale/constitution.json walking upward to the git root; it never merges in an ancestor's stronger rules, and a parse failure at the nearest file returns None for the whole lookup rather than falling back to a valid parent. This is consistent with the module's documented "fail open, never a poisoned gate" philosophy for a single file, but means opening a workspace at/below a subdirectory with its own (weaker) constitution silently loses the root's protection — worth confirming this is the intended scope of "tighten-only."

6. [LOW] PR-description mismatch: crates/tui/src/model_registry.rs was not deleted. It's still live (used by model_picker.rs/model_profile.rs); only two already-dead helpers (serving_provider, seeded_model_ids) were trimmed. Not a functional issue, just worth knowing when evaluating the "cleanup" claims.

7. [LOW] WHALE.md "migration" is warning-only, not an actual migration. project_context.rs drops WHALE.md content from ctx.instructions and emits a static WHALE_IGNORED_WARNING telling the user to hand-copy it elsewhere — no code path writes its content into AGENTS.md/constitution.json. Matches "ignored," not "migration-only" as phrased in the PR description; the replacement mechanism (repo_law.rs) itself is fine.

Areas that look solid

  • Provider/model routing invariant (EPIC v0.8.65 EPIC: Separate provider facts, model facts, offerings, and route resolution #2608): thoroughly clean. The model_routing.rs deletion removes genuinely dead, already-superseded flash-router code; the surviving inventory-based resolver never infers provider from a wire-id prefix. The new custom-provider form (provider_picker.rs) correctly persists only api_key_env (never the raw key), and normalize_custom_provider_id rejects names colliding with built-in providers.
  • execpolicy chained-command matching (command_segments/command_is_chained) correctly prevents a trusted prefix from auto-approving a chained destructive suffix, and is well tested.
  • apply_patch.rs atomic writes (temp file + sync_all + rename in the target dir) are genuinely atomic and not symlink-attackable; reassemble_preserving_newlines correctly handles LF/CRLF.
  • sse.rs fixes are real: byte-bounded streaming read, UTF-8 split-char correctness, and a same-origin check on the server-supplied endpoint event closing an SSRF hole.
  • Constitution ratify gate / user_constitution.rs / persistence.rs: thorough test coverage of atomic write/rollback, secret redaction, corrupt-JSON handling, and untrusted-draft sanitization; the ratify gate itself cannot be bypassed.
  • Subagent error-chain surfacing (bug: Codex sub-agents fail with Responses API request failed #3884): well tested — chain, status, and body all asserted, not just happy path.
  • Dead-code removal of the dormant tab subsystem, project_doc.rs, execpolicy/amend.rs, and approval_cache.rs all check out clean — no dangling references found.

Not run in this review (no build/test tool access in this session): cargo test, cargo fmt --check, cargo clippy. The verification log in the PR thread covers these locally; findings above are from direct source reading, not test execution.

@Hmbown Hmbown merged commit 9617281 into main Jul 2, 2026
20 checks passed
Hmbown pushed a commit that referenced this pull request Jul 2, 2026
Resolves the conflicts with the #3861 constitution-first-setup merge:

- @mention completion: main landed the #3757 candidate cache (one
  needle-independent walk serves every keystroke of a non-path-like
  mention token, ranked in memory with a 4s TTL). Kept it as the fast
  path, adapted to the MentionCompletionKey cache form, and kept this
  branch's #3899 background walk for exactly the needles the candidate
  cache bypasses (path-like and browser) — the two fixes compose:
  non-path needles rarely walk at all, and when a walk is needed it no
  longer blocks the UI thread.
- app.rs: kept both structs/fields (MentionCompletionPending from
  #3899, MentionCandidateCache from #3757).
- sidebar.rs: kept both tests added at the same location (this
  branch's tree-sort exactly-once test and main's terminal-row
  collapse test).

cargo test -p codewhale-tui --locked after the merge: 5715 passed,
1 failed (sandbox::tests::test_parity_linux_landlock_available —
pre-existing container-environment failure, kernel lacks Landlock).
@Hmbown Hmbown deleted the claude/v0.8.67-constitution-setup-174rj9 branch July 3, 2026 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.