@f-trycua f-trycua commented Oct 9, 2025

Motivation and Context

This PR implements the integration between Microsoft Agent Framework and Cua as discussed in issue #1095.

Why is this needed?

  • Provides Agent Framework with 100+ model configurations (OpenAI, Anthropic, OpenCUA, InternVL, UI-Tars, GLM, etc.) without duplicating model-specific parsers
  • Enables desktop automation capabilities across Windows, macOS, and Linux through Cua's virtualization infrastructure
  • Supports composite agents (e.g., "UI-Tars+GPT-4o") combining grounding and planning models
  • Leverages Cua's existing computer-use infrastructure instead of reimplementing it
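
To illustrate the composite naming in the third bullet, here is a minimal sketch that splits an identifier such as "UI-Tars+GPT-4o" into its two halves. The `CompositeModel` type and the `parse_model_id` helper are hypothetical, and the assumption that the grounding model comes before the `+` is inferred from the example string only; Cua's actual parsing may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CompositeModel:
    """Hypothetical container for illustration; not a Cua type."""
    grounding: Optional[str]  # model that locates UI elements, if any
    planning: str             # model that decides what to do next


def parse_model_id(model_id: str) -> CompositeModel:
    """Split "grounding+planning" on the first "+"; plain ids have no grounding part."""
    if "+" not in model_id:
        return CompositeModel(grounding=None, planning=model_id.strip())
    grounding, planning = (part.strip() for part in model_id.split("+", 1))
    if not grounding or not planning:
        raise ValueError(f"malformed composite model id: {model_id!r}")
    return CompositeModel(grounding=grounding, planning=planning)
```

A plain identifier such as "anthropic/claude-sonnet-4-5-20250929" comes back with `grounding=None`, so callers can treat single-model and composite configurations uniformly.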

Implementation approach:
Following @eavanvalkenburg's guidance in #1095, this uses the ChatMiddleware pattern rather than implementing Cua as a Tool. This delegates the entire agent loop to Cua while maintaining Agent Framework's orchestration and human-in-the-loop capabilities.

Why wrap ComputerAgent instead of just Computer?

  • ComputerAgent provides the complete agent loop (model inference → parsing → computer actions → multi-step execution) with support for 100+ model configurations
  • Computer is just the low-level tool for executing actions (click, type, screenshot, etc.)
  • By wrapping ComputerAgent, we get all of Cua's model support for free without reimplementing provider-agnostic parsers for OpenCUA, InternVL, UI-Tars, GLM, etc.
  • This architectural choice means Agent Framework benefits from Cua's ongoing model additions automatically

Related issue: #1095

Description

This PR adds agent-framework-cua, a new integration package that provides CuaAgentMiddleware.

Key components:

  1. CuaAgentMiddleware - Middleware that intercepts chat requests and delegates to Cua's ComputerAgent

    • Completely bypasses the Agent Framework chat client by setting context.terminate = True
    • All model inference is handled by Cua's ComputerAgent (supports 100+ models)
    • Handles message format conversion between Agent Framework and Cua
    • Supports human-in-the-loop approval workflows (require_approval, approval_interval)
    • Transforms Cua results back to Agent Framework ChatResponse format
  2. Type definitions - CuaModelId, CuaProviderType, CuaOSType, etc. for type safety

  3. Examples:

    • basic_example.py - Claude Sonnet 4.5 with Linux Docker
    • composite_agent_example.py - UI-Tars + GPT-4o composite agent
  4. Package structure - Follows existing integration patterns (agent-framework-redis, agent-framework-mem0)

Architecture:

Agent Framework → CuaAgentMiddleware → Cua ComputerAgent
                      ↓                      ↓
                 terminate=True    Model + Computer Loop
                                           ↓
                                       Results
                                           ↓
Agent Framework ← CuaAgentMiddleware ← Cua ComputerAgent

The chat client becomes a no-op since CuaAgentMiddleware terminates middleware execution and returns the response directly from Cua.
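
The terminate-and-return flow can be sketched without either library. Everything below (`ChatContext`, `FakeComputerAgent`, `DelegatingMiddleware`, `run_pipeline`) is a hypothetical stand-in written for illustration, not the Agent Framework or Cua API; only the idea of setting `terminate = True` so the chat client is skipped comes from this PR.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class ChatContext:
    """Hypothetical stand-in for the framework's middleware context."""
    messages: List[str]
    result: Optional[str] = None
    terminate: bool = False


class FakeComputerAgent:
    """Stand-in for Cua's ComputerAgent; a real one runs the model + computer loop."""

    async def run(self, messages: List[str]) -> str:
        return f"done: {messages[-1]}"


class DelegatingMiddleware:
    """Hands the whole request to the wrapped agent and stops the pipeline."""

    def __init__(self, agent: FakeComputerAgent) -> None:
        self.agent = agent

    async def process(
        self,
        context: ChatContext,
        next_: Callable[[ChatContext], Awaitable[None]],
    ) -> None:
        context.result = await self.agent.run(context.messages)
        context.terminate = True  # signal that the chat client must not run
        # next_ is deliberately never awaited


calls: List[str] = []  # records whether the chat client was ever reached


async def chat_client(context: ChatContext) -> None:
    calls.append("chat_client")
    context.result = "from chat client"


async def run_pipeline(prompt: str) -> ChatContext:
    context = ChatContext(messages=[prompt])
    await DelegatingMiddleware(FakeComputerAgent()).process(context, chat_client)
    if not context.terminate:  # toy pipeline rule: only fall through when not terminated
        await chat_client(context)
    return context


context = asyncio.run(run_pipeline("open the browser"))
```

After the run, `calls` is still empty and `context.result` holds the agent's output, which is the sense in which the chat client is a no-op here.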

Technical notes:

  • Requires Python ≥3.12 (due to cua-agent dependency)
  • Uses dummy chat_client since middleware terminates execution before reaching it
  • Fixed ChatMessage.content → ChatMessage.text/contents attribute usage in middleware

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? No

@f-trycua
Author

f-trycua commented Oct 9, 2025

@microsoft-github-policy-service agree company="Cua AI, Inc."

@f-trycua f-trycua marked this pull request as ready for review October 9, 2025 21:14
@f-trycua
Author

I've also been thinking about how to support .NET with this integration. Since Agent Framework already has built-in MCP support (see samples), we could create a Python MCP server that wraps Cua's ComputerAgent.

The flow would be:

.NET Agent → MCP Client → stdio → Python MCP Server → Cua ComputerAgent (100+ models)

Usage from C#:

// Connect to Cua MCP server
await using var mcpClient = await McpClient.CreateAsync(new StdioClientTransport(new()
{
    Command = "python",
    Arguments = ["-m", "cua.mcp.server"],
}));

var agent = chatClient.CreateAIAgent(
    instructions: "You are a desktop automation assistant.",
    tools: [.. (await mcpClient.ListToolsAsync()).Cast<AITool>()]
);

await agent.RunAsync("Open Firefox and search for 'Python tutorials'");

This approach would:

  • ✅ Reuse existing MCP infrastructure (no new .NET bindings needed)
  • ✅ Give .NET agents access to all 100+ Cua models
  • ✅ Work cross-language via the MCP protocol

We have a pending PR for MCP server support on the Cua side (trycua/cua#427). Once that's merged, I can add C# samples and documentation in a follow-up PR or update this one. Thoughts?

@f-trycua
Author

Hey @ekzhu - I've addressed your feedback:

  • API Key Configuration - Added section explaining setup via environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY, etc.)

  • Simplified Exports - Good catch! Removed unused types from __all__, now only exports CuaAgentMiddleware

  • Restructured Samples - Moved to samples/getting_started/cua/ and added workflow_orchestration showing Agent Framework orchestration + Cua execution synergy

  • Other - Added experimental disclaimer (DevUI pattern), Cua docs links, updated to cua-xfce image, clarified unused parameters
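
As a rough illustration of the environment-variable setup in the first bullet, a sample could check for the key before starting a run. The `missing_api_key` helper and the prefix-to-variable mapping below are assumptions made for this sketch (only the `anthropic/` prefix and the two variable names appear in this PR); Cua itself may resolve keys differently.

```python
import os
from typing import Mapping, Optional

# Assumed mapping from model-id prefix to the variable that provider's key is read from.
KEY_FOR_PREFIX = {
    "anthropic/": "ANTHROPIC_API_KEY",
    "openai/": "OPENAI_API_KEY",
}


def missing_api_key(model: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the name of the unset variable for this model, or None if nothing is missing."""
    env = os.environ if env is None else env
    for prefix, variable in KEY_FOR_PREFIX.items():
        if model.startswith(prefix) and not env.get(variable):
            return variable
    return None
```

Calling `missing_api_key("anthropic/claude-sonnet-4-5-20250929")` before creating the agent lets a sample fail with a clear message instead of an authentication error partway through the run.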

Happy to chat this week if helpful!

@f-trycua
Author

Hey @ekzhu - I've made some improvements to the API design:

Eliminated Dummy Variables - Created CuaChatClient that properly stores model and instructions configuration. No more need for dummy OpenAIChatClient(model_id="gpt-4o-mini", api_key="dummy-not-used") workarounds.

Before:

# Had to use dummy client
dummy_client = OpenAIChatClient(model_id="gpt-4o-mini", api_key="dummy-not-used")
middleware = CuaAgentMiddleware(
    computer=computer,
    model="anthropic/claude-sonnet-4-5-20250929",
    instructions="You are an assistant.",
)
agent = ChatAgent(chat_client=dummy_client, middleware=[middleware])

After:

# Clean API with CuaChatClient
chat_client = CuaChatClient(
    model="anthropic/claude-sonnet-4-5-20250929",
    instructions="You are an assistant.",
)
middleware = CuaAgentMiddleware(computer=computer)
agent = ChatAgent(chat_client=chat_client, middleware=[middleware])

Standardized Examples - All samples now default to Linux on Docker (cross-platform), with macOS and Windows options shown as alternatives in comments.

Let me know if there's anything else you'd like me to address!

Contributor

@ekzhu ekzhu left a comment

I really like the new interface! It looks very good and polished. I had some issues running it locally with Docker though -- see my comments.

I am a bit concerned about the package name "cua" being overly broad here. I think it may make sense to rename it to "trycua" or something more specific. Right now, it feels like this is the official computer-use feature of the framework.

Another alternative is to move this package into a module inside agent-framework-lab; we could use the extra cua there.

# Create Cua chat client with model and instructions
chat_client = CuaChatClient(
    model="anthropic/claude-sonnet-4-5-20250929",
    instructions="You are a desktop automation assistant. Be precise and careful.",
Contributor

For the examples, let's set the instructions through ChatAgent instead. This is to keep it consistent with the rest of the samples in the repo.

Thanks for flagging this, @ekzhu! CuaAgentMiddleware intercepts the call and drives the run loop, so the chat client never gets a chance to apply its own system message; anything we put there is ignored. If you need custom guidance, the simplest path is to include it in the prompt you send to agent.run(...); that text is preserved and reaches Cua exactly as written.
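
A minimal sketch of that suggestion, assuming the guidance is simply prepended to the task text: `with_guidance` is a hypothetical helper, not part of either library, and the "Task:" separator is an arbitrary choice.

```python
def with_guidance(task: str, guidance: str = "") -> str:
    """Prepend optional guidance to the task text that will be passed to agent.run(...)."""
    guidance = guidance.strip()
    task = task.strip()
    if not guidance:
        return task
    return f"{guidance}\n\nTask: {task}"


prompt = with_guidance(
    "Open Firefox and search for 'Python tutorials'",
    guidance="You are a desktop automation assistant. Be precise and careful.",
)
# The combined text would then be sent as the user prompt: await agent.run(prompt)
```

Keeping the guidance in the prompt means it travels with the messages that the middleware forwards, rather than depending on a system message that is never applied.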

async def main():
    """Run a basic computer use example with Claude."""
    # Initialize Cua computer (Linux Docker container)
    async with Computer(os_type="linux", provider_type="docker") as computer:
Contributor

I had trouble running this example and it failed right here. I am on WSL Ubuntu and running Docker Desktop.

Traceback (most recent call last):
  File "***/agent-framework/python/.venv/lib/python3.13/site-packages/computer/computer.py", line 493, in run
    await self._interface.wait_for_ready(timeout=30)
  File "***/agent-framework/python/.venv/lib/python3.13/site-packages/computer/interface/generic.py", line 817, in wait_for_ready
    raise e
  File "***/agent-framework/python/.venv/lib/python3.13/site-packages/computer/interface/generic.py", line 813, in wait_for_ready
    await self._wait_for_ready_ws(timeout, interval)
  File "***/agent-framework/python/.venv/lib/python3.13/site-packages/computer/interface/generic.py", line 938, in _wait_for_ready_ws
    raise TimeoutError(error_msg)
TimeoutError: Could not connect to localhost after 30 seconds

...

TimeoutError: Could not connect to WebSocket interface at localhost:8000/ws: Could not connect to localhost after 30 seconds

I have already pulled the image, and I tried this even after I manually started the container from Docker Desktop.
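
One way to narrow down a timeout like this is to check whether anything is listening on the port at all before the client starts waiting. The `port_is_open` helper below is a hypothetical diagnostic using only the standard library; port 8000 is taken from the error message above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

If `port_is_open("localhost", 8000)` returns False while the container is shown as running, the problem is likely in the container or its port mapping rather than in the WebSocket client.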

Hi @ekzhu, I'm Adam from the Cua team. I reproduced the sample failure. The culprit is the Docker image name: the code uses trycua/cua-ubuntu:latest, but there’s no Linux/AMD64 manifest for that tag, so the container never starts and the WebSocket wait times out. Pulling trycua/cua-xfce:latest (which is published for AMD64) and tagging it locally as trycua/cua-ubuntu:latest fixes the run.

  • Pull the AMD64 image Cua documents for Docker:
    docker pull --platform=linux/amd64 trycua/cua-xfce:latest
  • Create a local tag so the provider can find it:
    docker tag trycua/cua-xfce:latest trycua/cua-ubuntu:latest

We’ll update the sample to point at the XFCE image so others don’t hit this.
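
For anyone scripting the workaround in the meantime, the two commands can be built as argument lists. This is only a sketch: `workaround_commands` is a hypothetical helper, and it prints the commands rather than running them.

```python
import shlex
from typing import List

SOURCE_IMAGE = "trycua/cua-xfce:latest"    # image published for AMD64, per the steps above
EXPECTED_TAG = "trycua/cua-ubuntu:latest"  # tag the sample currently looks for


def workaround_commands(platform: str = "linux/amd64") -> List[List[str]]:
    """Return the pull and tag commands as argument lists, without running them."""
    return [
        ["docker", "pull", f"--platform={platform}", SOURCE_IMAGE],
        ["docker", "tag", SOURCE_IMAGE, EXPECTED_TAG],
    ]


for command in workaround_commands():
    # To execute instead of print: subprocess.run(command, check=True)
    print(shlex.join(command))
```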

On the Agent Framework side we’ll also land a tiny fix so CuaChatClient imports and applies @use_chat_middleware; that keeps the middleware hook active even when Cua handles the run loop.

@ekzhu
Contributor

ekzhu commented Oct 24, 2025

Also, there are some merge conflicts. It looks like uv.lock needs to be regenerated and the pyproject.toml file needs to be updated -- just accept both changes.

@f-trycua
Author

I am a bit concerned about the package name "cua" being overly broad here. I think it may make sense to rename it to "trycua" or something more specific. Right now, it feels like this is the official computer-use feature of the framework.

Thanks so much for the feedback and the notes @ekzhu - super helpful.

On the naming:

  • We own both cua.ai and trycua.com, and Cua is the name of the open-source framework as well as our company, so using the cua package name is intentional and consistent with our ecosystem (CLI, SDKs, cloud API, etc.).

That said, we definitely don’t want it to appear like an “official” Microsoft Agent SDK package. If avoiding confusion is the main concern, a clean alternative for us could be cua-ai (or cua_ai), which still preserves the project identity while making the separation explicit. Happy to make that change if it aligns better with the project’s conventions.

Let me know which direction you’d prefer - we’re flexible as long as the identity remains clear.

@YeIIcw YeIIcw force-pushed the feature/cua-integration branch from 9c14b49 to 80bd9cd Compare November 15, 2025 05:10
@YeIIcw YeIIcw requested a review from a team as a code owner November 15, 2025 05:10
@markwallace-microsoft markwallace-microsoft added the .NET, workflows (Related to Workflows in agent-framework), and lab (Agent Framework Lab) labels Nov 15, 2025
@github-actions github-actions bot changed the title Python: Add CuaAgentMiddleware for Computer-Use tool .NET: Python: Add CuaAgentMiddleware for Computer-Use tool Nov 15, 2025
@YeIIcw YeIIcw force-pushed the feature/cua-integration branch from 80bd9cd to 7c2bcee Compare November 15, 2025 05:13
@YeIIcw

YeIIcw commented Nov 15, 2025

Hi @markwallace-microsoft, those .NET, workflows, and lab labels were added while I briefly pulled in the wrong files. The PR is back to Python-only now, so could you remove those tags when you get a chance? Thanks!

@crickman crickman requested review from TaoChenOSU, ekzhu and peibekwe and removed request for ekzhu November 17, 2025 21:15
@crickman crickman removed the .NET label Nov 17, 2025
# Create middleware
middleware = CuaAgentMiddleware(computer=computer)

# Create agent - no dummy variables needed!
Contributor

Question: what is meant by "no dummy variables needed"?

Hi @TaoChenOSU, thanks for spotting that. The “no dummy variables needed” comment is leftover from an earlier draft and will be removed.

@TaoChenOSU TaoChenOSU changed the title .NET: Python: Add CuaAgentMiddleware for Computer-Use tool Python: Add CuaAgentMiddleware for Computer-Use tool Nov 18, 2025
@github-actions github-actions bot changed the title Python: Add CuaAgentMiddleware for Computer-Use tool .NET: Python: Add CuaAgentMiddleware for Computer-Use tool Nov 18, 2025
@YeIIcw YeIIcw force-pushed the feature/cua-integration branch from 2c5566a to 5c43237 Compare November 18, 2025 23:17
@f-trycua
Author

Hi @TaoChenOSU @ekzhu - what are the pending items on this PR?

@markwallace-microsoft markwallace-microsoft changed the title .NET: Python: Add CuaAgentMiddleware for Computer-Use tool Python: Add CuaAgentMiddleware for Computer-Use tool Dec 10, 2025
Labels

documentation (Improvements or additions to documentation), lab (Agent Framework Lab), python, workflows (Related to Workflows in agent-framework)

6 participants