Skip to content

feat: LLM can start MCP servers from chat context#3866

Open
bistack wants to merge 1 commit into
Hmbown:mainfrom
bistack:feat/mcp-runtime-servers
Open

feat: LLM can start MCP servers from chat context#3866
bistack wants to merge 1 commit into
Hmbown:mainfrom
bistack:feat/mcp-runtime-servers

Conversation

@bistack

@bistack bistack commented Jul 1, 2026

Copy link
Copy Markdown

Summary

Add start_mcp_server tool allowing LLM to dynamically start MCP servers from conversation context. Supports both stdio (local command) and HTTP (remote URL) transports.

Changes

  • New runtime_mcp.rs with StartRuntimeMcpServer tool implementation
  • McpPool gains dynamic_servers (parking_lot::RwLock) for runtime configs
  • add_runtime_server_config returns conflict warnings for static/dynamic name collisions
  • shell-words crate for proper quoted argument parsing
  • start_mcp_server classified as McpAction in approval system

Testing

  • cargo fmt --all -- --check
  • cargo test -p codewhale-tui --locked — 16 runtime_mcp tests + 3 conflict tests pass

Checklist

  • Updated docs or comments as needed
  • Added or updated tests where relevant

@bistack bistack requested a review from Hmbown as a code owner July 1, 2026 06:34
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown

Thanks @bistack for taking the time to contribute.

This repository is observing a maintainer-managed PR intake gate in dry-run mode, so this pull request is staying open. This note helps maintainers prepare the allowlist before any enforcement is considered.

Please read CONTRIBUTING.md for the expected contribution shape. A maintainer can grant recurring PR access by commenting /lgtm on a pull request.

@Hmbown

Hmbown commented Jul 1, 2026

Copy link
Copy Markdown
Owner

Thanks @bistack — I inspected this while sweeping community PRs for possible v0.8.67 inclusion.

I’m not harvesting this into the current constitution-first release branch as-is because it crosses several trust-boundary surfaces at once: model-facing tool catalog, local process spawning, remote MCP endpoints, approval UI, and runtime MCP pool mutation. A few concrete blockers I saw from the diff:

  • start_mcp_server can start local commands or connect remote URLs, but the tool implementation returns ApprovalRequirement::Auto. Classifying it as McpAction in the TUI approval category is not enough if the execution gate treats the tool spec as auto-approved.
  • The PR initializes an MCP pool in Engine::new and registers the runtime MCP starter whenever self.mcp_pool exists, which appears to bypass the existing Feature::Mcp/lazy ensure_mcp_pool() shape.
  • add_runtime_server_config() says a dynamic server shadows a static server, but get_or_connect() checks static config first and only then dynamic config, so a static-name collision appears to connect the static server rather than the runtime one.
  • with_mcp_tools() snapshots pool.all_tools() when the registry is built. After start_mcp_server adds a dynamic server, the newly discovered MCP tools may not be in the current model-visible registry/catalog until a later rebuild, despite the tool result telling the model to use the discovered tools.

This is still interesting directionally, especially for making MCP setup less manual, but I’d want it split into a dedicated MCP runtime/trust PR with explicit user approval semantics, feature gating, catalog-refresh behavior, and conflict tests before release inclusion. Leaving it open rather than closing it; it deserves a focused pass, just not a quick harvest into v0.8.67.

@bistack bistack force-pushed the feat/mcp-runtime-servers branch 2 times, most recently from 86af2a9 to 0c2cbeb Compare July 1, 2026 10:02
@bistack

bistack commented Jul 1, 2026

Copy link
Copy Markdown
Author

@Hmbown Thank you for the thorough review. All four points are clear blockers.

The work will be split into focused PRs by trust-boundary surface:

  1. Runtime MCP pool mutation — dynamic server infrastructure, static-name collision fix, conflict tests
  2. Feature gating + registry — gate behind Feature::Mcp, preserve lazy ensure_mcp_pool() shape, avoid McpToolAdapter duplication
  3. Model-facing tool catalog + approval UIRequired approval semantics for process spawning and remote endpoints, catalog refresh so discovered tools are immediately model-visible, end-to-end conflict tests

Each PR is independently reviewable. Will update this PR to track progress.

@Hmbown

Hmbown commented Jul 1, 2026

Copy link
Copy Markdown
Owner

@bistack thanks for the quick turnaround and the split plan — that decomposition matches how I'd slice it. I re-read the updated head:

  • Name collisions: resolved. add_runtime_server_config now rejects both static-config and dynamic-duplicate collisions, and the three conflict tests cover it. Consider my point 3 closed.
  • Auto-approval (1), feature gating (2), and catalog timing (4) still apply to the current head.
  • One new finding to fold into your PR 2 (feature gating): the eager McpPool construction in Engine::new skips the with_network_policy(...) wiring that the lazy ensure_mcp_pool() applies — and because ensure_mcp_pool early-returns when a pool already exists, every MCP connection in the process would run without the configured network-policy decider. Any runtime-MCP design needs to go through (or replicate) that path. Related inconsistency worth fixing in the same slice: start_mcp_server lands in DEFAULT_ACTIVE_NATIVE_TOOLS (model-visible even with Feature::Mcp off), while discovered tools only enter the catalog behind the Feature::Mcp check — so the model could "successfully" start servers whose tools can never appear.

For PR 3, the acceptance bar I'd use: start_mcp_server returns ApprovalRequirement::Required (ideally surfacing the parsed command/URL in the prompt), and a test proving that tools discovered mid-turn are actually callable by the exact names the tool result advertises.

Looking forward to the split — this lands as capability the manual-config flow can't match once the trust story is right.

@bistack

bistack commented Jul 2, 2026

Copy link
Copy Markdown
Author
屏幕快照 2026-07-02 的 12 41 51 PM

@bistack

bistack commented Jul 2, 2026

Copy link
Copy Markdown
Author
屏幕快照 2026-07-02 的 12 46 56 PM

@bistack

bistack commented Jul 2, 2026

Copy link
Copy Markdown
Author
屏幕快照 2026-07-02 的 12 44 50 PM

@bistack

bistack commented Jul 2, 2026

Copy link
Copy Markdown
Author

@Hmbown Thank you for the detailed follow-up.

PR 2 additions (feature gating):

  • The eager McpPool construction in Engine::new will be removed — all MCP connections go through the lazy ensure_mcp_pool() path to pick up with_network_policy() wiring
  • start_mcp_server will be removed from DEFAULT_ACTIVE_NATIVE_TOOLS — gated behind Feature::Mcp alongside discovered tools

PR 3 acceptance criteria noted:

  • End-to-end test: start_mcp_server → discovered tools callable by exact advertised names

On point 4 (catalog timing): MCP tools are deferred in the catalog by apply_mcp_tool_deferral — the model already discovers them through tool_search, not from the initial tool list. The flow is:

  1. start_mcp_server connects the server, returns fully qualified tool names
  2. pool.to_api_tools() picks up the new tools on the next catalog build
  3. Model calls tool_search to discover and activate the exact names

This matches the existing MCP tool workflow — the model doesn't expect deferred tools to be callable without a tool_search step first.

On ApprovalRequirement::Auto for start_mcp_server: the trust boundary is at tool execution, not at server connection. Connecting to a server and discovering its tool list is a read-only operation with no side effects — analogous to list_mcp_resources. The actual trust gate is enforced by the MCP tool approval path in the turn loop (turn_loop.rs:1706): approval_required = !read_only && !auto_approve. Every non-read-only MCP tool requires explicit user approval before execution, regardless of how the server was started.

The attached execution trace confirms this in practice: start_mcp_server executes without confirmation (connection only), while each discovered MCP tool triggers an approval prompt before execution. This two-layer design — Auto for connection, Required for execution — keeps the workflow frictionless when the user explicitly provides a server command, while maintaining the trust boundary where it matters.

On McpPool integration for dynamic tools: the interface changes required (lifetime issues with all_tools(), &mut self constraints on get_or_connect()) are substantial and touch core connection management. Rather than forcing a refactor from the contributor side, this is better addressed by the maintainer as a design decision — the current PRs focus on getting the trust boundaries right with a simple, working approach.

@bistack bistack force-pushed the feat/mcp-runtime-servers branch from 0c2cbeb to 4225a96 Compare July 2, 2026 08:31
bistack added a commit to bistack/CodeWhale that referenced this pull request Jul 2, 2026
Add in-memory dynamic server support to McpPool for runtime-started
MCP servers from conversation context.

- New `dynamic_servers` field (parking_lot::RwLock) on McpPool
- `add_runtime_server_config()` rejects duplicate names (static or dynamic)
- `get_or_connect()` checks dynamic servers before static config
- `server_names()` includes both static and dynamic servers
- Conflict tests for static/dynamic name overlap

Prerequisite for Hmbown#3866 (start_mcp_server tool). Extracted as a focused
infrastructure PR to isolate the trust-boundary surface (runtime MCP pool
mutation) from the tool implementation.
@Hmbown

Hmbown commented Jul 2, 2026

Copy link
Copy Markdown
Owner

Security review — please gate the spawn before merging

Really like the direction here, and the design has clearly improved — add_runtime_server_config being in-memory only, not persisted is exactly right (a chat-added server can't pollute trusted global config). Thank you for that.

There's one blocking issue before this can land, though: the spawn is auto-approved, so the model can start an arbitrary local process from chat with no user confirmation.

Evidence

  • The tool declares fn approval_requirement(&self) -> ApprovalRequirement::Auto.
  • In core/engine/turn_loop.rs, registered_tool_approval_required returns false for Auto:
    if requirement == ApprovalRequirement::Auto { return false; }
    and start_mcp_server is not in registered_tool_requires_non_bypassable_approval (only rlm_eval is). So the gate is skipped in every mode, not just YOLO.
  • parse_mcp_command turns a command string into {command, args} and spawns it. So start_mcp_server with e.g. bash -c '<anything>' (or npx <malicious-package>) executes arbitrary code as the user, with no prompt.

That's a remote-code-execution path from model/chat context — the one thing the MCP subsystem currently assumes can't happen (it was built assuming server config comes from a trusted human).

Preconditions to make this safe to land

  1. Require explicit user approval on every model-initiated spawn. Change approval_requirement() away from Auto to a prompting requirement, and add start_mcp_server to registered_tool_requires_non_bypassable_approval so YOLO/auto-approve can't swallow it. The approval prompt should show the resolved program + args (and URL, for HTTP servers) prominently so the user sees exactly what will run.
  2. Validate / surface the command. At minimum reject empty/obviously-shell-wrapped commands and display the concrete program + args; ideally support an allowlist of permitted programs.
  3. Reject colliding server names. Names are joined as mcp_{server}_{tool}; with _ allowed in server names, a model-added server can shadow/intercept another server's tools (server foo + tool bar_x vs server foo_bar + tool x both → mcp_foo_bar_x, resolved by longest-match). Reject _ in model-supplied names or dedup prefixed names.
  4. Re-check network policy for URL-transport servers on connect (the existing connect-time check covers the base URL; a model-chosen URL host should be vetted too).

Items 1–2 are the hard blockers; 3–4 are important hardening. Happy to pair on the approval-gating wiring if useful — this is a genuinely nice capability once the spawn is behind a prompt.

(Reviewed against the current diff; findings are defensible from the code cited above. Not a merge/approve — flagging for @Hmbown's decision.)

@bistack bistack force-pushed the feat/mcp-runtime-servers branch from 4225a96 to 9c756b6 Compare July 3, 2026 03:52
@bistack

bistack commented Jul 3, 2026

Copy link
Copy Markdown
Author
屏幕快照 2026-07-03 的 12 19 40 PM

Allow LLM to start MCP servers from conversation context when a user
provides a command (e.g. npx ...) or URL. The tool connects to the
server and returns fully qualified tool names (mcp_{server}_{tool})
that the model discovers via tool_search.

McpPool infrastructure:
- Add dynamic_servers (parking_lot::RwLock) for runtime server configs
- add_runtime_server_config() rejects duplicate names (static or dynamic)
- get_or_connect() checks dynamic servers before static config
- server_names() includes both static and dynamic servers

Engine integration:
- Lazy pool initialization via ensure_mcp_pool() with network_policy wiring
- No eager McpPool construction in Engine::new
- start_mcp_server gated behind Feature::Mcp via tool_setup registration
- Dynamic always_load injection ensures the tool is eager (not deferred)

Security:
- ApprovalRequirement::Required — requires user approval before spawning
- Non-bypassable approval — YOLO mode cannot skip (registered_tool_requires_non_bypassable_approval)
- Reject shell wrapper commands (bash, sh, zsh, cmd, powershell)
- Reject shell metacharacters in args (redirects, pipes, chaining, $, backticks)
- Allowlist of permitted runtimes (npx, node, python, uvx, deno, ruby, cargo, etc.)
- Underscores in server names auto-converted to hyphens to prevent tool name collision

Tool implementation:
- Shell-words parsing for quoted arguments
- Name inference for npm/pnpm/node/python/uvx and Windows cmd /c
- McpAction approval classification
- with_runtime_mcp_tool() builder avoids McpToolAdapter duplication
@bistack bistack force-pushed the feat/mcp-runtime-servers branch from 9c756b6 to 8c746d5 Compare July 3, 2026 05:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants