Python: Add CuaAgentMiddleware for Computer-Use tool #1338
@microsoft-github-policy-service agree company="Cua AI, Inc."
I've also been thinking about how to support .NET with this integration. Since Agent Framework already has built-in MCP support (see samples), we could create a Python MCP server that wraps Cua's
The flow would be:
Usage from C#:
// Connect to Cua MCP server
await using var mcpClient = await McpClient.CreateAsync(new StdioClientTransport(new()
{
Command = "python",
Arguments = ["-m", "cua.mcp.server"],
}));
var agent = chatClient.CreateAIAgent(
instructions: "You are a desktop automation assistant.",
tools: [.. (await mcpClient.ListToolsAsync()).Cast<AITool>()]
);
await agent.RunAsync("Open Firefox and search for 'Python tutorials'");
This approach would:
We have a pending PR for MCP server support on the Cua side (trycua/cua#427). Once that's merged, I can add C# samples and documentation in a follow-up PR or update this one. Thoughts?
Hey @ekzhu - I've addressed your feedback:
Happy to chat this week if helpful!
Hey @ekzhu - I've made some improvements to the API design:
Eliminated Dummy Variables - Created
Before:
# Had to use dummy client
dummy_client = OpenAIChatClient(model_id="gpt-4o-mini", api_key="dummy-not-used")
middleware = CuaAgentMiddleware(
computer=computer,
model="anthropic/claude-sonnet-4-5-20250929",
instructions="You are an assistant.",
)
agent = ChatAgent(chat_client=dummy_client, middleware=[middleware])
After:
# Clean API with CuaChatClient
chat_client = CuaChatClient(
model="anthropic/claude-sonnet-4-5-20250929",
instructions="You are an assistant.",
)
middleware = CuaAgentMiddleware(computer=computer)
agent = ChatAgent(chat_client=chat_client, middleware=[middleware])
Standardized Examples - All samples now default to Linux on Docker (cross-platform), with macOS and Windows options shown as alternatives in comments.
Let me know if there's anything else you'd like me to address!
ekzhu
left a comment
I really like the new interface! It looks very good and polished. I had some issues running it locally with Docker though -- see my comments.
I am a bit concerned about the package name "cua" being overly broad here. I think it may make sense to rename it to "trycua" or something more specific. Right now, it feels like this is the official computer-use feature of the framework.
Another alternative is to move this package to a module inside agent-framework-lab; we can use the extra cua there.
# Create Cua chat client with model and instructions
chat_client = CuaChatClient(
    model="anthropic/claude-sonnet-4-5-20250929",
    instructions="You are a desktop automation assistant. Be precise and careful.",
For the examples, let's set the instructions through ChatAgent instead. This is to keep it consistent with the rest of the samples in the repo.
Thanks for flagging this, @ekzhu! CuaAgentMiddleware intercepts the call and drives the run loop, so the chat client never gets a chance to apply its own system message—anything we put there gets ignored. If you need custom guidance, the simplest path is to include it in the prompt you send to agent.run(...); that text is preserved and reaches CUA exactly as written.
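To illustrate the point above, here is a minimal, self-contained sketch of a middleware that terminates the pipeline before the chat client runs. It does not use the real Agent Framework or Cua APIs: `Context`, `InstructionsChatClient`, `TerminatingMiddleware`, and `run` are all hypothetical stand-ins, and the only detail taken from this PR is the idea of setting a `terminate` flag on the context.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class Context:
    """Hypothetical per-request state passed through the middleware."""
    prompt: str
    result: Optional[str] = None
    terminate: bool = False


class InstructionsChatClient:
    """Hypothetical chat client that would prepend its own system message."""

    def __init__(self, instructions: str) -> None:
        self.instructions = instructions
        self.called = False

    async def get_response(self, prompt: str) -> str:
        self.called = True
        return f"[system: {self.instructions}] {prompt}"


class TerminatingMiddleware:
    """Stand-in for a middleware that owns the whole run loop."""

    async def process(
        self, context: Context, next_handler: Callable[[Context], Awaitable[None]]
    ) -> None:
        # Produce the response directly and stop the pipeline.
        context.result = f"cua handled: {context.prompt}"
        context.terminate = True
        # next_handler is deliberately never awaited, so the chat client
        # (and its instructions) never takes part in the request.


async def run(prompt: str, client: InstructionsChatClient, middleware: TerminatingMiddleware) -> Optional[str]:
    context = Context(prompt=prompt)

    async def call_client(ctx: Context) -> None:
        ctx.result = await client.get_response(ctx.prompt)

    await middleware.process(context, call_client)
    return context.result


if __name__ == "__main__":
    client = InstructionsChatClient("Be precise and careful.")
    # Guidance placed in the prompt itself survives; the client's instructions do not.
    print(asyncio.run(run("Be precise and careful. Open Firefox.", client, TerminatingMiddleware())))
```

In this sketch the client's `instructions` are never read, which mirrors why guidance has to travel in the prompt text when the middleware short-circuits the call.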
async def main():
    """Run a basic computer use example with Claude."""
    # Initialize Cua computer (Linux Docker container)
    async with Computer(os_type="linux", provider_type="docker") as computer:
I had trouble running this example and it failed right here. I am on WSL Ubuntu and running Docker Desktop.
Traceback (most recent call last):
File "***/agent-framework/python/.venv/lib/python3.13/site-packages/computer/computer.py", line 493, in run
await self._interface.wait_for_ready(timeout=30)
File "***/agent-framework/python/.venv/lib/python3.13/site-packages/computer/interface/generic.py", line 817, in wait_for_ready
raise e
File "***/agent-framework/python/.venv/lib/python3.13/site-packages/computer/interface/generic.py", line 813, in wait_for_ready
await self._wait_for_ready_ws(timeout, interval)
File "***/agent-framework/python/.venv/lib/python3.13/site-packages/computer/interface/generic.py", line 938, in _wait_for_ready_ws
raise TimeoutError(error_msg)
TimeoutError: Could not connect to localhost after 30 seconds
...
TimeoutError: Could not connect to WebSocket interface at localhost:8000/ws: Could not connect to localhost after 30 seconds
I have already pulled the image, and I tried this even after I manually started the container from Docker Desktop.
Hi @ekzhu, I'm Adam from the Cua team. I reproduced the sample failure. The culprit is the Docker image name: the code uses trycua/cua-ubuntu:latest, but there’s no Linux/AMD64 manifest for that tag, so the container never starts and the WebSocket wait times out. Pulling trycua/cua-xfce:latest (which is published for AMD64) and tagging it locally as trycua/cua-ubuntu:latest fixes the run.
- Pull the AMD64 image Cua documents for Docker:
docker pull --platform=linux/amd64 trycua/cua-xfce:latest
- Create a local tag so the provider can find it:
docker tag trycua/cua-xfce:latest trycua/cua-ubuntu:latest
We’ll update the sample to point at the XFCE image so others don’t hit this.
On the Agent Framework side we’ll also land a tiny fix so CuaChatClient imports and applies @use_chat_middleware; that keeps the middleware hook active even when Cua handles the run loop.
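For anyone who wants to script the workaround, the two Docker commands above can be wrapped in a small Python helper, together with a quick check that something is listening on the port from the traceback (localhost:8000). This is only a sketch: the function names are hypothetical and not part of Cua or Agent Framework, and the probe confirms a TCP connection only, not a successful WebSocket handshake.

```python
import socket
import subprocess

SOURCE_IMAGE = "trycua/cua-xfce:latest"    # image published for AMD64
TARGET_TAG = "trycua/cua-ubuntu:latest"    # tag the sample currently expects


def retag_commands(source=SOURCE_IMAGE, target=TARGET_TAG, platform="linux/amd64"):
    """Return the docker argv lists for the pull-and-tag workaround."""
    return [
        ["docker", "pull", f"--platform={platform}", source],
        ["docker", "tag", source, target],
    ]


def apply_workaround():
    """Run the commands in order; raises CalledProcessError if docker fails."""
    for argv in retag_commands():
        subprocess.run(argv, check=True)


def port_is_open(host="localhost", port=8000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    apply_workaround()
    print("port 8000 reachable:", port_is_open())
```

If `port_is_open()` still returns False after the container is started, the problem is likely at the container or port-mapping level rather than in the sample code.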
There are also some merge conflicts. It looks like uv.lock needs to be regenerated and the pyproject.toml file needs to be updated -- just accept both changes.
Thanks so much for the feedback and the notes @ekzhu - super helpful. On the naming:
That said, we definitely don’t want it to appear like an “official” Microsoft Agent SDK package. If avoiding confusion is the main concern, a clean alternative for us could be cua-ai (or cua_ai), which still preserves the project identity while making the separation explicit. Happy to make that change if it aligns better with the project’s conventions. Let me know which direction you’d prefer - we’re flexible as long as the identity remains clear. |
Hi @markwallace-microsoft, those .NET, workflows, and lab labels were added while I briefly pulled in the wrong files. The PR is back to Python-only now, so could you remove those tags when you get a chance? Thanks! |
# Create middleware
middleware = CuaAgentMiddleware(computer=computer)

# Create agent - no dummy variables needed!
Question: what does it mean by "no dummy variables needed"?
Hi @TaoChenOSU, thanks for spotting that. The “no dummy variables needed” comment is leftover from an earlier draft and will be removed.
Pin all CUA Docker samples to trycua/cua-xfce:latest for Windows/x64 support
Drop Anthropic instructions field so Claude requests keep working
add Windows, macOS, and Linux quickstarts under samples/getting_started/cua/setup/
refresh the CUA README to link to the new guides and modernize prerequisites
Hi @TaoChenOSU @ekzhu - what are the pending items on this PR?
Motivation and Context
This PR implements the integration between Microsoft Agent Framework and Cua as discussed in issue #1095.
Why is this needed?
Implementation approach:
Following @eavanvalkenburg's guidance in #1095, this uses the ChatMiddleware pattern rather than implementing Cua as a Tool. This delegates the entire agent loop to Cua while maintaining Agent Framework's orchestration and human-in-the-loop capabilities.
Why wrap ComputerAgent instead of just Computer?
- ComputerAgent provides the complete agent loop (model inference → parsing → computer actions → multi-step execution) with support for 100+ model configurations
- Computer is just the low-level tool for executing actions (click, type, screenshot, etc.)
- By wrapping ComputerAgent, we get all of Cua's model support for free without reimplementing provider-agnostic parsers for OpenCUA, InternVL, UI-Tars, GLM, etc.
Related issue: #1095
Description
This PR adds agent-framework-cua, a new integration package that provides CuaAgentMiddleware.
Key components:
CuaAgentMiddleware - Middleware that intercepts chat requests and delegates to Cua's ComputerAgent
- context.terminate = True
- ComputerAgent (supports 100+ models)
- (require_approval, approval_interval)
- ChatResponse format
Type definitions - CuaModelId, CuaProviderType, CuaOSType, etc. for type safety
Examples:
- basic_example.py - Claude Sonnet 4.5 with Linux Docker
- composite_agent_example.py - UI-Tars + GPT-4o composite agent
Package structure - Follows existing integration patterns (agent-framework-redis, agent-framework-mem0)
Architecture:
The chat client becomes a no-op since CuaAgentMiddleware terminates middleware execution and returns the response directly from Cua.
Technical notes:
- (cua-agent dependency)
- chat_client since middleware terminates execution before reaching it
- ChatMessage.content → ChatMessage.text/contents attribute usage in middleware
Contribution Checklist