Skip to content

feat: plumb task-configured MCP servers through ACP session/new#589

Open
bingran-you wants to merge 1 commit into
mainfrom
bry/nostalgic-tereshkova-fffb80
Open

feat: plumb task-configured MCP servers through ACP session/new#589
bingran-you wants to merge 1 commit into
mainfrom
bry/nostalgic-tereshkova-fffb80

Conversation

@bingran-you
Copy link
Copy Markdown
Collaborator

@bingran-you bingran-you commented May 31, 2026

Summary

task.toml [[environment.mcp_servers]] was parsed into SandboxConfig.mcp_servers but never reached the agent: both ACP session entry points hardcoded an empty MCP list (mcp_servers=[]), and the vendored McpServerSpec was never constructed anywhere. So a task could not attach any MCP server (e.g. a Playwright MCP) to the agent.

This PR wires the full path task config → ACP session/new so task-declared MCP servers are attached to the agent's session.

task.toml [[environment.mcp_servers]]
  → SandboxConfig.mcp_servers (MCPServerConfig)
  → rollout._task_mcp_specs()            # MCPServerConfig → McpServerSpec
  → connect_acp(mcp_servers=...)
  → ACPClient.session_new / session_load # → session/new "mcpServers" on the wire

Changes

  • acp/types.pyMcpServerSpec becomes wire-correct: adds name (required — every ACP variant carries it) and headers, plus to_new_session_param() which projects the flat spec onto the SDK's discriminated per-transport union (stdio omits type and carries command/args/env; sse/http carry url/headers and keep type).
  • rollout.py_task_mcp_specs() owns the MCPServerConfig → McpServerSpec translation (including the streamable-http → http naming bridge) at the composition root, so acp/ keeps zero dependency on the task-config layer. Threaded into both connect_acp call sites (connect() and connect_as()).
  • acp/client.py / acp/runtime.pysession_new, session_load, and connect_acp accept the spec list. None → empty list, preserving the historical benchmark default.
  • tests/test_acp_mcp_servers.py — covers the per-transport wire projection, the _task_mcp_specs mapping (all three transports + absent config), the session_new/session_load wire output, and the rollout threading into connect_acp.

Design notes

  • Why McpServerSpec (not raw dicts or MCPServerConfig passed down): it's the ACP-layer DTO that decouples the protocol layer from both the task-config shape and the SDK's split McpServerStdio/SseMcpServer/HttpMcpServer union. Passing typed specs through connect_acp/session_new keeps those signatures honest; doing the translation in the rollout keeps acp/ free of any benchflow.task import.
  • session_new uses NewSessionParams.model_validate(...) so the SDK's discriminated union validates each per-transport dict (the constructor's static type rejects a list[dict]).

Scope / limitations

  • Framework-only change. No task bundles touched.
  • stdio servers get env: [] (MCPServerConfig has no env field) — fine for Playwright; a stdio MCP needing secrets would require adding an env field to MCPServerConfig (follow-up).
  • Unblocks Playwright-MCP ("V3") variants of research tasks. Note: whether a given agent runtime honors session/new mcpServers vs only its own config still needs a smoke test before relying on it end-to-end.

Testing

  • uv run --extra dev python -m pytest tests/ — full suite green
  • uv run ty check src/ — clean
  • uv run ruff check . — clean

Open in Devin Review

task.toml [[environment.mcp_servers]] was parsed into
SandboxConfig.mcp_servers but never reached the agent: both ACP session
entry points hardcoded an empty MCP list and the vendored McpServerSpec
was never constructed. This wires the full path so a task can attach MCP
servers (e.g. a Playwright MCP) to the agent's ACP session.

- McpServerSpec gains the wire-correct fields (name, headers) and a
  to_new_session_param() projection onto the SDK's discriminated
  per-transport union (stdio omits type; sse/http carry url+headers).
- rollout._task_mcp_specs() owns the MCPServerConfig -> McpServerSpec
  translation (incl. streamable-http -> http naming) at the composition
  root, keeping acp/ free of any task-config dependency.
- ACPClient.session_new/session_load and connect_acp accept the spec
  list; None preserves the historical empty-list default.
- tests/test_acp_mcp_servers.py covers config->spec mapping, the
  per-transport wire shape, and the rollout threading.

Framework-only change; unblocks Playwright-MCP variants of research tasks.
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.

Open in Devin Review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant