Outbound A2A: support long-running tasks and multi-turn interaction by rockfordlhotka · Pull Request #267 · MarimerLLC/rockbot

rockfordlhotka · 2026-04-11T21:09:40Z

Summary

Closes #265

Long-running task polling: When HTTP-transport agents return Working/Submitted, poll GetTask with exponential backoff (2s → 30s cap) until terminal, forwarding intermediate status updates to the user
InputRequired multi-turn follow-up: Trust-gated follow-up loop for both HTTP and queue transports — Act-level trusted agents get autonomous LLM responses; others are surfaced through the user conversation
Inbound contextId continuation: RockBotTaskHandler uses contextId to maintain conversation history across multi-turn exchanges, enabling two RockBot instances to collaborate (e.g., negotiating a meeting time)
Loop protection: Hard max of 20 rounds + consecutive identical Q/A repetition detection (threshold 3)
OTel instrumentation: New metrics (polling_attempts, input_required_rounds, input_required_breaks), activity spans, and cross-container correlation tags (task_id, context_id, correlation_id, session_id)
Documentation: Updated docs/a2a.md with new section covering polling, InputRequired, trust model, loop protection, and observability
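
The polling behaviour described in the summary can be sketched roughly as follows. This is a minimal Python illustration, not the PR's code: the doubling factor, the set of terminal state names, the `max_attempts` limit, and the injected `get_task`, `on_status` and `sleep` callables are all assumptions. Only the 2s start and the 30s cap come from the description above.

```python
from itertools import islice

# Assumed terminal state names; the PR text only names Working/Submitted as non-terminal.
TERMINAL_STATES = {"completed", "failed", "canceled", "rejected"}


def backoff_delays(initial=2.0, cap=30.0, factor=2.0):
    """Yield an endless series of delays that grows by `factor` up to `cap`."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, cap)


def poll_until_terminal(get_task, on_status, sleep, max_attempts=50):
    """Call get_task() until it returns a terminal state.

    get_task returns the current state name, on_status relays an intermediate
    state to the user, and sleep waits for the given number of seconds.
    Returns (final_state, attempts_used).
    """
    last_state = None
    for attempt, delay in enumerate(backoff_delays(), start=1):
        state = get_task()
        if state in TERMINAL_STATES:
            return state, attempt
        if state != last_state:
            # Assumption: only relay a status when it actually changes.
            on_status(state)
            last_state = state
        if attempt >= max_attempts:
            raise TimeoutError(f"task still {state!r} after {attempt} polls")
        sleep(delay)


# First six delays of the schedule.
first_delays = list(islice(backoff_delays(), 6))
```

Injecting `sleep` keeps the loop testable without real waiting; a caller would normally pass `time.sleep`.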

Key new files

InputRequiredHandler — shared service used by both HTTP dispatch and queue result handler
InputRequiredRepetitionDetector — modeled on RepetitiveToolCallDetector
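
For readers unfamiliar with the loop protection, here is a rough Python sketch of how a round limit and a repetition check might combine. The class name, the normalization (trim and lowercase), and the exact meaning of "threshold 3" (break on the third consecutive identical Q/A pair) are assumptions. The PR only states the limits of 20 rounds and threshold 3.

```python
class InputRequiredLoopGuard:
    """Hypothetical guard: stops a follow-up loop that runs too long or repeats itself."""

    def __init__(self, max_rounds=20, repeat_threshold=3):
        self.max_rounds = max_rounds
        self.repeat_threshold = repeat_threshold
        self.rounds = 0
        self._last_pair = None
        self._streak = 0

    def record(self, question, answer):
        """Record one Q/A round. Return None to continue, or a reason to break."""
        self.rounds += 1
        pair = (question.strip().lower(), answer.strip().lower())
        if pair == self._last_pair:
            self._streak += 1
        else:
            self._last_pair = pair
            self._streak = 1
        if self._streak >= self.repeat_threshold:
            return "repetition"
        if self.rounds >= self.max_rounds:
            return "max_rounds"
        return None
```

A break reason like this could also feed the input_required_breaks metric mentioned in the summary, though how the PR wires that up is not shown here.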

Deferred to future issues

Streaming consumption via SendStreamingMessageAsync
SubscribeToTask as alternative to polling

Test plan

All 1112 existing tests pass (zero regressions)
17 new tests: repetition detector (9), V1/V0.3 response mapping with contextId (8), RockBotTaskHandler continuation (2), PendingA2ATask mutable state (2)
Manual: two RockBot instances on same RabbitMQ bus, invoke multi-turn skill with InputRequired
Manual: HTTP agent returning Working state, verify polling + status relay

🤖 Generated with Claude Code

…rn follow-up (#265)

Enable two RockBot instances to collaborate on behalf of their users (e.g., negotiating a meeting time) by supporting the full A2A task lifecycle instead of treating the first response as final.

- Poll GetTask with exponential backoff when HTTP agents return Working/Submitted
- Handle InputRequired via trust-gated follow-up loop (both HTTP and queue transports)
- Use existing inbound trust levels to decide autonomous vs user-surfaced responses
- Add contextId-based conversation continuation on the inbound RockBotTaskHandler
- Loop protection: max 20 rounds + repeated Q/A detection (threshold 3)
- OTel metrics (polling_attempts, input_required_rounds, input_required_breaks) and spans with cross-container correlation tags (task_id, context_id, session_id)
- 17 new tests covering repetition detector, response mapping, and continuation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Two isolated RockBot instances (Alice and Bob), each with their own RabbitMQ, communicating via HTTP A2A gateways. Enables integration testing of the multi-turn InputRequired flow from #265.

Setup:
- rabbitmq-alice/bob: separate message bus instances
- agent-alice/bob: RockBot agents with per-instance seed data
- gateway-alice/bob: HTTP A2A endpoints (ports 5201/5202)
- blazor-alice/bob: Blazor UIs (ports 8081/8082)

Each agent's well-known-agents.json points to the other's gateway with pre-configured API key auth. Trust stores pre-seed Act-level trust so agents can collaborate autonomously.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Without LLM credentials agents use EchoChatClient and can't reason about tools — they just echo input back. Updated docker-compose header with prerequisites and --env-file usage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New Act-level inbound skill that exercises the full InputRequired flow: returns InputRequired on first call (proposing meeting times), Completed on follow-up (confirming the selected time). Uses contextId and conversation memory for multi-turn state tracking.

- Register skill in agent card (Program.cs), gateway appsettings
- Update peer seed data: well-known-agents + trust stores include negotiate-meeting with Act-level approval
- Expand .env.example with Azure OpenAI option

Test from Alice's Blazor UI (http://localhost:8081): "Negotiate a meeting with Bob for tomorrow"

Expected flow:
Alice → invoke_agent(Bob, negotiate-meeting, ...)
Bob returns InputRequired: "Available at 10am, 2pm, or 4pm"
Alice's LLM picks a time → sends follow-up with contextId
Bob returns Completed: "Meeting confirmed"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
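
The two-turn shape of this skill can be illustrated with a small Python sketch. Everything here is hypothetical: the class and field names, the state strings, the in-memory history dict standing in for conversation memory, and the use of a random UUID for contextId are assumptions made for illustration, not the handler's real code.

```python
import uuid
from dataclasses import dataclass


@dataclass
class TaskResult:
    state: str        # "input-required" or "completed" (assumed labels)
    context_id: str
    text: str


class NegotiateMeetingSkill:
    """Hypothetical stand-in for the negotiate-meeting handler."""

    def __init__(self):
        # context_id -> list of caller messages; stands in for conversation memory.
        self._history = {}

    def handle(self, text, context_id=None):
        if context_id is None or context_id not in self._history:
            # First call: open a conversation and propose times.
            context_id = context_id or str(uuid.uuid4())
            self._history[context_id] = [text]
            return TaskResult("input-required", context_id,
                              "Available at 10am, 2pm, or 4pm")
        # Follow-up: the caller echoed the contextId, so confirm the choice.
        self._history[context_id].append(text)
        return TaskResult("completed", context_id, f"Meeting confirmed: {text}")
```

The caller has to send the returned `context_id` back on the follow-up; without it the skill would treat the second message as a new conversation.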

Use the existing agent naming mechanism (agent-name.md on data volume, hot-reloaded by AgentProfileLoader) to give each peer instance a distinct identity. The LLM was calling invoke_agent(agent_name=RockBot) (itself) instead of "Bob" because both agents shared the default name. With distinct display names, the Blazor UI also reflects the correct agent identity per instance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

AgentDirectory.StartAsync returned early when known-agents.json didn't exist, skipping the well-known agent seeding loop below. On a fresh data volume (no prior directory file), well-known agents from config were never added to the directory — so list_known_agents returned only the agent's own self-announcement, not the configured peers. Move the early return into a scoped block so well-known seeding always runs regardless of whether the persisted file exists. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
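
The shape of this fix, sketched in Python under assumptions: the function name, the JSON-object file layout, and the rule that a persisted entry wins over a configured one are all hypothetical. The point is only that loading the file is conditional while seeding is not.

```python
import json
import os


def load_directory(path, well_known_agents):
    """Build the agent directory from an optional persisted file plus configured peers."""
    directory = {}

    # Scoped block instead of an early return: a missing file just means
    # there is nothing persisted to load.
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            directory.update(json.load(f))

    # Seeding always runs, even on a fresh data volume.
    for name, card in well_known_agents.items():
        directory.setdefault(name, card)  # assumption: persisted entries take precedence
    return directory
```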

…pic)

The gateway was hardcoding "agent.task.RockBot" for RabbitMQ topic routing. Now uses GatewayOptions.RoutingName (from InternalAgentName config, falls back to AgentName). This separates the external agent card identity (Alice/Bob) from the internal routing name (RockBot) that must match the agent's WithIdentity() subscription topic.

In the peer docker-compose:
Gateway__AgentName: Alice (external, what callers see)
Gateway__InternalAgentName: RockBot (internal, matches agent sub)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
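
The fallback rule can be shown in a few lines of Python. The dataclass below only mirrors the option names mentioned in the commit message; its shape and the `task_topic` helper are assumptions, not the real GatewayOptions type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GatewayOptions:
    agent_name: str                             # external identity shown on the agent card
    internal_agent_name: Optional[str] = None   # routing identity, if different

    @property
    def routing_name(self):
        # Fall back to the external name when no internal name is configured.
        return self.internal_agent_name or self.agent_name


def task_topic(options):
    """Topic the gateway publishes task requests to (hypothetical helper)."""
    return f"agent.task.{options.routing_name}"
```

With the peer docker-compose values above, `task_topic(GatewayOptions("Alice", "RockBot"))` gives `agent.task.RockBot`, while a gateway with no internal name keeps routing by its own name.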

Each gateway must accept the key that the OTHER agent sends:
- gateway-alice accepts bob-calls-alice (sent by Bob)
- gateway-bob accepts alice-calls-bob (sent by Alice)

The keys were backwards, causing 401 Unauthorized on every cross-agent HTTP A2A call.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The preview bubble published under the target agent's name used IsFinal=false, but no corresponding IsFinal=true from that agent name ever followed — the final synthesis comes under the primary agent's name. The Blazor UI tracks spinners per agent name, so the target agent's spinner never stopped. Mark the preview bubble IsFinal=true since it IS the target agent's final output for this task. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

All completed inbound A2A tasks now store a searchable outcome in working memory under a2a-outcomes/{skill}/{contextId} with an 8-hour TTL and category "a2a-outcome". This lets the agent recall recent inter-agent interactions when asked.

- negotiate-meeting: stores full exchange transcript + confirmation
- notify-user: stores notification text + sender
- Observe-level tasks: stores request + LLM summary

Tagged with caller name and skill for SearchWorkingMemory discovery.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New section in docs/a2a.md covering:
- Skill registration (agent card, gateway, handler dispatch)
- Outcome persistence requirements (working memory, key pattern, category, TTL, tags)
- Multi-turn InputRequired pattern (contextId, conversation memory, turn storage, outcome on completion only)
- Trust and approval model

References HandleNegotiateMeetingAsync as the canonical example.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Two issues found during live testing:

1. A2A outcomes stored under global namespace (a2a-outcomes/) were invisible to SearchWorkingMemory which defaults to session scope. Move outcomes under session/{WellKnownSessions.Primary}/a2a-outcomes/ so the user's LLM finds them naturally.
2. The LLM sometimes calls invoke_agent with its own identity name ("RockBot") instead of the target agent ("Bob"), creating a self-invocation loop. Add a guard that rejects self-invocation with a helpful error pointing to list_known_agents.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
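
Both fixes are small enough to sketch in Python. The value of `PRIMARY_SESSION`, the function names, the case-insensitive comparison, and the wording of the error are assumptions. Only the key layout and the pointer to list_known_agents come from the commit message.

```python
# Stand-in for WellKnownSessions.Primary; the real value is not given in the PR text.
PRIMARY_SESSION = "primary"


def outcome_key(skill, context_id, session=PRIMARY_SESSION):
    """Session-scoped working-memory key for a completed A2A task."""
    return f"session/{session}/a2a-outcomes/{skill}/{context_id}"


def check_invoke_target(own_names, target):
    """Return an error string if the agent is trying to invoke itself, else None."""
    own = {name.strip().lower() for name in own_names}
    if target.strip().lower() in own:
        return ("Error: you cannot invoke yourself. "
                "Call list_known_agents to find the agent you meant to reach.")
    return None
```

Checking against every name the agent answers to (identity and display name) is a design guess here; it covers the case where the LLM uses either one.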

The negotiate-meeting skill now asks the caller (Alice) for purpose, duration, and time preference in a single InputRequired round. The notification to the receiving user (Bob) is purely informational — no questions, just confirmed details. Previously the skill only asked for a time, and Bob's Observe-level notification would generate LLM questions about purpose/duration that should have been directed at Alice during the negotiation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

LLMs commonly paraphrase skill names: Alice used "schedule-meeting" instead of "negotiate-meeting", causing the request to fall to Observe level (read-only summary with questions) instead of the Act-level multi-turn handler.

- Add "schedule-meeting" as dispatch alias in RockBotTaskHandler
- Add to approved skills in both trust stores
- Note the alias in well-known-agents descriptions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The Observe path returned a polite "your request has been received" which the caller's LLM interpreted as success — hallucinating that a meeting was confirmed when it was only queued for human review. Replace with an unambiguous "IMPORTANT: This request was NOT completed" message that explicitly states nothing was scheduled, confirmed, or executed. The caller's LLM should relay that the request is pending the other party's manual review. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The LLM was calling query-availability before negotiate-meeting (unprompted by the user) and paraphrasing skill IDs. Add explicit guidance:

- Call the skill that matches the request directly
- Don't call query-availability as a prerequisite
- Use exact skill IDs from list_known_agents, don't paraphrase
- One invoke_agent per user request unless asked otherwise

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Callers may paraphrase skill IDs (e.g. "schedule-meeting" instead of "negotiate-meeting"). Instead of hardcoded aliases, use BM25 ranking against skill IDs, names, descriptions, and known aliases.

- InboundSkillMatcher: exact ID → exact alias → BM25 fuzzy match
- RockBotTaskHandler: match requested skill before dispatch
- Logs matched skill for debugging
- Remove hardcoded schedule-meeting alias and trust store entries
- 15 new tests covering exact, alias, fuzzy, and no-match cases

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
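
The three-stage match (exact ID, exact alias, BM25 fuzzy) could look something like the Python sketch below. This is an illustration only: the tokenizer, the BM25 parameters (k1=1.2, b=0.75), the `min_score` cutoff, and the skill dictionary layout are assumptions, and the real InboundSkillMatcher may differ in all of them.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SkillMatcher:
    """Hypothetical matcher: exact ID, then exact alias, then BM25 over skill text."""

    def __init__(self, skills, k1=1.2, b=0.75, min_score=0.1):
        # skills: {skill_id: {"aliases": [...], "description": "..."}}
        self.skills = skills
        self.k1, self.b, self.min_score = k1, b, min_score
        self.docs = {
            sid: tokenize(" ".join([sid, *s.get("aliases", []), s.get("description", "")]))
            for sid, s in skills.items()
        }
        self.avg_len = sum(len(d) for d in self.docs.values()) / len(self.docs)
        # Document frequency: in how many skill documents each term appears.
        self.df = Counter(t for d in self.docs.values() for t in set(d))

    def _bm25(self, query, doc):
        tf = Counter(doc)
        n = len(self.docs)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - self.df[term] + 0.5) / (self.df[term] + 0.5))
            norm = tf[term] + self.k1 * (1 - self.b + self.b * len(doc) / self.avg_len)
            score += idf * tf[term] * (self.k1 + 1) / norm
        return score

    def match(self, requested):
        """Return (skill_id or None, how) where how is exact, alias, fuzzy or none."""
        wanted = requested.strip().lower()
        for sid in self.skills:
            if sid.lower() == wanted:
                return sid, "exact"
        for sid, s in self.skills.items():
            if wanted in (a.lower() for a in s.get("aliases", [])):
                return sid, "alias"
        query = tokenize(requested)
        best = max(self.docs, key=lambda sid: self._bm25(query, self.docs[sid]))
        if self._bm25(query, self.docs[best]) >= self.min_score:
            return best, "fuzzy"
        return None, "none"
```

A score cutoff matters for the no-match case: without it, the fuzzy stage would always return some skill, even for a request that shares no terms with any of them.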

All four A2A handlers (result, error, status, InputRequired) used agent.Name (identity = "RockBot") for AgentReply.AgentName, causing chat bubbles to show "RockBot" instead of the display name ("Alice"). Add AgentNameHolder to each handler and use DisplayName for all user-facing fields (AgentReply.AgentName, conversation turns, progress context). Envelope source field stays as agent.Name for message routing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The gateway always returned a Message response via EnqueueMessageAsync, losing the task state. When Bob's agent returned InputRequired, the caller saw Completed (because the SDK maps Message as Completed). Now returns a Task response via EnqueueTaskAsync for non-terminal states (InputRequired, Working, Submitted) so the caller's SDK preserves the state and the InputRequired multi-turn loop fires. Terminal states still return Message for backward compatibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
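
The state-dependent choice between a Task and a Message response can be summarized in a short Python sketch. The dictionary shapes and state strings are assumptions made for illustration; they are not the SDK's wire format.

```python
# States for which the caller must see the task state to keep going.
NON_TERMINAL_STATES = {"input-required", "working", "submitted"}


def build_gateway_response(state, text, task_id, context_id):
    """Return a task-shaped response for non-terminal states, else a plain message."""
    if state in NON_TERMINAL_STATES:
        return {
            "kind": "task",
            "id": task_id,
            "contextId": context_id,
            "status": {"state": state, "message": text},
        }
    # Terminal states keep the message shape for backward compatibility.
    return {"kind": "message", "contextId": context_id, "text": text}
```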

Two issues causing the InputRequired loop to never complete:

1. Gateway didn't pass contextId from SDK request to the RabbitMQ AgentTaskRequest, so Bob's handler never saw a contextId and treated every follow-up as a fresh conversation (completedRounds=0).
2. Gateway used the caller's contextId for the response instead of the agent's contextId. Bob generates a contextId from the taskId on the first call; the gateway now forwards that back so the caller uses it for follow-ups.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The result handler runs the LLM to synthesize a response after an A2A task completes. With invoke_agent in the tool set, the LLM would call it again — creating an infinite loop where each Completed result triggered a new agent call. Filter out A2A caller tools (invoke_agent, register/unregister_agent, list_known_agents, get_agent_details) from the result handler's ChatOptions. The result synthesis should only present the outcome to the user, not initiate new agent interactions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
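
The filter amounts to removing a fixed set of tool names before the synthesis call. A Python sketch, assuming tools are identified by name and that "register/unregister_agent" expands to `register_agent` and `unregister_agent`:

```python
# Tools that can start or manage agent interactions; excluded from result synthesis.
A2A_CALLER_TOOLS = {
    "invoke_agent",
    "register_agent",
    "unregister_agent",
    "list_known_agents",
    "get_agent_details",
}


def tools_for_result_synthesis(tool_names):
    """Keep every tool except the A2A caller tools, preserving order."""
    return [name for name in tool_names if name not in A2A_CALLER_TOOLS]
```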

The invoke_agent tool response was too mild — the LLM kept iterating and called negotiate-meeting a second time before the first result arrived. Strengthen the response to explicitly say STOP and present a status to the user while waiting. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The LLM ignored the "STOP" instruction and called invoke_agent twice in the same loop iteration, creating duplicate negotiate-meeting round-trips. Now checks if the session already has a pending A2A task in the tracker and returns an error telling the LLM to wait. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
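
A guard of this kind can be sketched in Python as a per-session check against a tracker. The class, the method names, and the returned strings are hypothetical; the real tracker and PendingA2ATask type are not shown in the PR text.

```python
class PendingTaskTracker:
    """Hypothetical tracker holding at most one pending A2A task per session."""

    def __init__(self):
        self._pending = {}  # session_id -> task_id

    def try_begin(self, session_id, task_id):
        """Register a task; return False if the session already has one pending."""
        if session_id in self._pending:
            return False
        self._pending[session_id] = task_id
        return True

    def complete(self, session_id):
        """Clear the pending task once its result (or error) has arrived."""
        self._pending.pop(session_id, None)


def invoke_agent(tracker, session_id, task_id):
    """Tool entry point: refuse a second call while the first is still pending."""
    if not tracker.try_begin(session_id, task_id):
        return "Error: an agent task is already pending for this session. Wait for its result."
    return f"Task {task_id} submitted. Tell the user you are waiting for the reply."
```

Enforcing this in code rather than in the prompt is what makes it reliable: the duplicate call fails no matter how the LLM interprets the instructions.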

Two sources of noise in the Blazor UI during A2A interactions:

1. The result handler published a preview bubble AND ran an LLM synthesis. Both showed as separate messages, creating a double-confirmation. Remove the preview; the synthesis is the single user-facing message.
2. The InputRequired handler published a question bubble for each round. For autonomous follow-ups (Act-level trust), this is noise, since the user only needs the final outcome. Remove the intermediate bubble.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

rockfordlhotka and others added 24 commits April 11, 2026 16:09

rockfordlhotka merged commit 914efef into main Apr 12, 2026
2 checks passed

rockfordlhotka deleted the rockfordlhotka/265-outbound-a2a-long-running branch April 12, 2026 03:35
