Outbound A2A: support long-running tasks and multi-turn interaction#267

Merged
rockfordlhotka merged 24 commits into main from rockfordlhotka/265-outbound-a2a-long-running on Apr 12, 2026

@rockfordlhotka (Member)

Summary

Closes #265

  • Long-running task polling: When HTTP-transport agents return Working/Submitted, poll GetTask with exponential backoff (2s → 30s cap) until terminal, forwarding intermediate status updates to the user
  • InputRequired multi-turn follow-up: Trust-gated follow-up loop for both HTTP and queue transports — Act-level trusted agents get autonomous LLM responses; others are surfaced through the user conversation
  • Inbound contextId continuation: RockBotTaskHandler uses contextId to maintain conversation history across multi-turn exchanges, enabling two RockBot instances to collaborate (e.g., negotiating a meeting time)
  • Loop protection: Hard maximum of 20 rounds, plus detection of consecutive identical Q/A pairs (threshold 3)
  • OTel instrumentation: New metrics (polling_attempts, input_required_rounds, input_required_breaks), activity spans, and cross-container correlation tags (task_id, context_id, correlation_id, session_id)
  • Documentation: Updated docs/a2a.md with a new section covering polling, InputRequired, the trust model, loop protection, and observability
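A minimal Python sketch of the polling bullet above (the project itself is C#). `get_task`, `on_status`, the lowercase state strings, and the `max_attempts` safety cap are assumptions for illustration, not the A2A SDK's API; the sketch also assumes the delay doubles on each attempt until it reaches the 30 s cap.

```python
import time
from typing import Callable, Dict, Iterator

# Hypothetical state strings; only Submitted/Working are polled, per the PR.
POLLABLE_STATES = {"submitted", "working"}


def backoff_delays(initial: float = 2.0, cap: float = 30.0) -> Iterator[float]:
    """Yield an exponential schedule: 2, 4, 8, 16, 30, 30, ... seconds."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * 2, cap)


def poll_until_settled(
    get_task: Callable[[str], Dict],
    task_id: str,
    on_status: Callable[[Dict], None],
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: int = 100,
) -> Dict:
    """Poll until the task leaves Submitted/Working, relaying each update."""
    delays = backoff_delays()
    for _ in range(max_attempts):
        task = get_task(task_id)
        if task["state"] not in POLLABLE_STATES:
            # Completed, Failed, Canceled, or InputRequired (handed to the follow-up loop).
            return task
        on_status(task)  # forward the intermediate status to the user
        sleep(next(delays))
    raise TimeoutError(f"task {task_id} still running after {max_attempts} polls")
```

Injecting `sleep` keeps the loop testable without real waiting.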

Key new files

  • InputRequiredHandler — shared service used by both HTTP dispatch and queue result handler
  • InputRequiredRepetitionDetector — modeled on RepetitiveToolCallDetector
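A Python sketch of what a detector like InputRequiredRepetitionDetector might look like, combining both loop-protection rules from the summary (20 rounds, threshold 3). The class name, the whitespace normalization, and the string break reasons are hypothetical; the real detector is modeled on RepetitiveToolCallDetector, whose internals are not shown in this PR.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MAX_ROUNDS = 20           # hard cap from the PR summary
REPETITION_THRESHOLD = 3  # consecutive identical Q/A pairs that force a break


@dataclass
class RepetitionDetector:
    """Hypothetical analogue of InputRequiredRepetitionDetector."""

    threshold: int = REPETITION_THRESHOLD
    max_rounds: int = MAX_ROUNDS
    rounds: int = 0
    last_pair: Optional[Tuple[str, str]] = None
    streak: int = 0

    def record(self, question: str, answer: str) -> Optional[str]:
        """Record one round; return a break reason, or None to continue."""
        self.rounds += 1
        pair = (question.strip(), answer.strip())
        # Extend the streak only when this round repeats the previous one exactly.
        self.streak = self.streak + 1 if pair == self.last_pair else 1
        self.last_pair = pair
        if self.streak >= self.threshold:
            return "repetition"
        if self.rounds >= self.max_rounds:
            return "max_rounds"
        return None
```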

Deferred to future issues

  • Streaming consumption via SendStreamingMessageAsync
  • SubscribeToTask as alternative to polling

Test plan

  • All 1112 existing tests pass (zero regressions)
  • 17 new tests: repetition detector (9), V1/V0.3 response mapping with contextId (8), RockBotTaskHandler continuation (2), PendingA2ATask mutable state (2)
  • Manual: two RockBot instances on same RabbitMQ bus, invoke multi-turn skill with InputRequired
  • Manual: HTTP agent returning Working state, verify polling + status relay

🤖 Generated with Claude Code

rockfordlhotka and others added 24 commits April 11, 2026 16:09
…rn follow-up (#265)

Enable two RockBot instances to collaborate on behalf of their users (e.g.,
negotiating a meeting time) by supporting the full A2A task lifecycle instead
of treating the first response as final.

- Poll GetTask with exponential backoff when HTTP agents return Working/Submitted
- Handle InputRequired via trust-gated follow-up loop (both HTTP and queue transports)
- Use existing inbound trust levels to decide autonomous vs user-surfaced responses
- Add contextId-based conversation continuation on the inbound RockBotTaskHandler
- Loop protection: max 20 rounds + repeated Q/A detection (threshold 3)
- OTel metrics (polling_attempts, input_required_rounds, input_required_breaks)
  and spans with cross-container correlation tags (task_id, context_id, session_id)
- 17 new tests covering repetition detector, response mapping, and continuation
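The trust gate in the follow-up loop reduces to one decision per InputRequired round. A Python sketch, assuming a two-value `TrustLevel` enum (the PR names only Observe and Act) and hypothetical `llm_answer`/`ask_user` callbacks standing in for the agent's LLM call and the user conversation:

```python
from enum import Enum
from typing import Callable, Optional


class TrustLevel(Enum):
    # Only the two levels named in this PR; a real trust store may define more.
    OBSERVE = "Observe"
    ACT = "Act"


def handle_input_required(
    trust: TrustLevel,
    question: str,
    llm_answer: Callable[[str], str],
    ask_user: Callable[[str], None],
) -> Optional[str]:
    """Answer autonomously for Act-level peers; otherwise surface to the user."""
    if trust is TrustLevel.ACT:
        # The reply is sent back as a follow-up message on the same contextId.
        return llm_answer(question)
    # Non-Act peers: the user's eventual answer drives the next round.
    ask_user(question)
    return None
```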

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two isolated RockBot instances (Alice and Bob), each with their own
RabbitMQ, communicating via HTTP A2A gateways. Enables integration
testing of the multi-turn InputRequired flow from #265.

Setup:
- rabbitmq-alice/bob: separate message bus instances
- agent-alice/bob: RockBot agents with per-instance seed data
- gateway-alice/bob: HTTP A2A endpoints (ports 5201/5202)
- blazor-alice/bob: Blazor UIs (ports 8081/8082)

Each agent's well-known-agents.json points to the other's gateway
with pre-configured API key auth. Trust stores pre-seed Act-level
trust so agents can collaborate autonomously.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without LLM credentials, agents use EchoChatClient and can't reason
about tools; they just echo input back. Updated the docker-compose
header with prerequisites and --env-file usage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New Act-level inbound skill that exercises the full InputRequired
flow: returns InputRequired on first call (proposing meeting times),
Completed on follow-up (confirming the selected time). Uses contextId
and conversation memory for multi-turn state tracking.

- Register skill in agent card (Program.cs), gateway appsettings
- Update peer seed data: well-known-agents + trust stores include
  negotiate-meeting with Act-level approval
- Expand .env.example with Azure OpenAI option

Test from Alice's Blazor UI (http://localhost:8081):
  "Negotiate a meeting with Bob for tomorrow"

Expected flow:
  Alice → invoke_agent(Bob, negotiate-meeting, ...)
  Bob returns InputRequired: "Available at 10am, 2pm, or 4pm"
  Alice's LLM picks a time → sends follow-up with contextId
  Bob returns Completed: "Meeting confirmed"
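The skill's two-turn behavior can be sketched in Python as a small state machine keyed by contextId. This is an illustration under assumptions, not HandleNegotiateMeetingAsync: the in-memory `_conversations` dict stands in for conversation memory, the slot list mirrors the example flow, and the contextId is a random UUID here (a later commit notes that Bob derives it from the taskId).

```python
import re
import uuid
from typing import Dict, List, Optional, Tuple

PROPOSED_SLOTS = ["10am", "2pm", "4pm"]
_conversations: Dict[str, List[str]] = {}  # hypothetical stand-in for conversation memory


def negotiate_meeting(message: str, context_id: Optional[str] = None) -> Tuple[str, str, str]:
    """Return (state, context_id, reply): propose on turn one, confirm on follow-up."""
    history = _conversations.get(context_id) if context_id else None
    if history is None:
        # First turn: start a conversation and ask the caller to choose.
        context_id = context_id or str(uuid.uuid4())
        _conversations[context_id] = [message]
        return "input-required", context_id, "Available at 10am, 2pm, or 4pm"
    history.append(message)
    # \b keeps "12pm" from matching "2pm".
    chosen = next((s for s in PROPOSED_SLOTS if re.search(rf"\b{s}\b", message)), None)
    if chosen is None:
        return "input-required", context_id, "Please pick 10am, 2pm, or 4pm"
    return "completed", context_id, f"Meeting confirmed for {chosen}"
```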

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use the existing agent naming mechanism (agent-name.md on data volume,
hot-reloaded by AgentProfileLoader) to give each peer instance a
distinct identity. The LLM was calling invoke_agent(agent_name=RockBot)
(itself) instead of "Bob" because both agents shared the default name.

With distinct display names, the Blazor UI also reflects the correct
agent identity per instance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AgentDirectory.StartAsync returned early when known-agents.json didn't
exist, skipping the well-known agent seeding loop below. On a fresh
data volume (no prior directory file), well-known agents from config
were never added to the directory — so list_known_agents returned
only the agent's own self-announcement, not the configured peers.

Move the early return into a scoped block so well-known seeding
always runs regardless of whether the persisted file exists.
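The control-flow fix is easiest to see next to the seeding loop. A Python sketch of the corrected shape; the JSON layout (a list of objects with a `name` field) and the rule that persisted entries win over configured ones are assumptions, not details from the commit:

```python
import json
import os
from typing import Dict, List


def start_directory(path: str, well_known: List[dict]) -> Dict[str, dict]:
    """Load persisted agents if the file exists, then always seed well-known peers."""
    agents: Dict[str, dict] = {}
    if os.path.exists(path):
        # Scoped block: a missing file skips only the load, not the seeding below.
        with open(path, encoding="utf-8") as f:
            for entry in json.load(f):
                agents[entry["name"]] = entry
    for entry in well_known:
        # Runs on a fresh data volume too; assumed not to overwrite persisted entries.
        agents.setdefault(entry["name"], entry)
    return agents
```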

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…pic)

The gateway was hardcoding "agent.task.RockBot" for RabbitMQ topic
routing. Now uses GatewayOptions.RoutingName (from InternalAgentName
config, falls back to AgentName). This separates the external agent
card identity (Alice/Bob) from the internal routing name (RockBot)
that must match the agent's WithIdentity() subscription topic.

In the peer docker-compose:
  Gateway__AgentName: Alice          (external — what callers see)
  Gateway__InternalAgentName: RockBot (internal — matches agent sub)
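The fallback rule reduces to a one-liner. A Python sketch, assuming an empty or missing InternalAgentName both fall back to AgentName; the `agent.task.` prefix comes from the previously hardcoded topic quoted above:

```python
from typing import Optional


def routing_topic(agent_name: str, internal_agent_name: Optional[str] = None) -> str:
    """RabbitMQ topic for inbound tasks: InternalAgentName if set, else AgentName."""
    routing_name = internal_agent_name or agent_name  # empty string also falls back
    return f"agent.task.{routing_name}"


# Peer compose: external card name "Alice", internal routing name "RockBot".
print(routing_topic("Alice", "RockBot"))  # agent.task.RockBot
```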

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each gateway must accept the key that the OTHER agent sends:
- gateway-alice accepts bob-calls-alice (sent by Bob)
- gateway-bob accepts alice-calls-bob (sent by Alice)

The keys were backwards, causing 401 Unauthorized on every
cross-agent HTTP A2A call.
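A Python sketch of a wiring check that would have caught the swapped keys. The key strings come from the commit message; the two dictionaries and `check_wiring` are hypothetical and do not mirror the actual gateway configuration format:

```python
from typing import Dict, Set, Tuple

# Keys each gateway accepts, i.e. the key the other agent sends.
ACCEPTED_KEYS: Dict[str, Set[str]] = {
    "gateway-alice": {"bob-calls-alice"},
    "gateway-bob": {"alice-calls-bob"},
}

# Target gateway and key each agent sends on outbound calls.
OUTBOUND: Dict[str, Tuple[str, str]] = {
    "alice": ("gateway-bob", "alice-calls-bob"),
    "bob": ("gateway-alice", "bob-calls-alice"),
}


def is_authorized(gateway: str, api_key: str) -> bool:
    return api_key in ACCEPTED_KEYS.get(gateway, set())


def check_wiring() -> bool:
    """True when every outbound key is accepted by the gateway it targets."""
    return all(is_authorized(gw, key) for gw, key in OUTBOUND.values())
```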

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The preview bubble published under the target agent's name used
IsFinal=false, but no corresponding IsFinal=true from that agent
name ever followed — the final synthesis comes under the primary
agent's name. The Blazor UI tracks spinners per agent name, so
the target agent's spinner never stopped.

Mark the preview bubble IsFinal=true since it IS the target agent's
final output for this task.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All completed inbound A2A tasks now store a searchable outcome in
working memory under a2a-outcomes/{skill}/{contextId} with an 8-hour
TTL and category "a2a-outcome". This lets the agent recall recent
inter-agent interactions when asked.

- negotiate-meeting: stores full exchange transcript + confirmation
- notify-user: stores notification text + sender
- Observe-level tasks: stores request + LLM summary

Tagged with caller name and skill for SearchWorkingMemory discovery.
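A Python sketch of the outcome record described above: key pattern, category, 8-hour TTL, and caller/skill tags. `OutcomeEntry` and `build_outcome` are hypothetical; the optional `session` prefix reflects the later move of outcomes under session/{WellKnownSessions.Primary}/, whose concrete value the PR does not show.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional

OUTCOME_TTL = timedelta(hours=8)
OUTCOME_CATEGORY = "a2a-outcome"


@dataclass
class OutcomeEntry:  # hypothetical working-memory record
    key: str
    content: str
    category: str
    expires_at: datetime
    tags: List[str] = field(default_factory=list)


def build_outcome(
    skill: str,
    context_id: str,
    caller: str,
    content: str,
    now: Optional[datetime] = None,
    session: Optional[str] = None,
) -> OutcomeEntry:
    """Build the searchable record stored when an inbound A2A task completes."""
    now = now or datetime.now(timezone.utc)
    key = f"a2a-outcomes/{skill}/{context_id}"
    if session:
        # Session-scoped variant so session-default searches can find it.
        key = f"session/{session}/{key}"
    return OutcomeEntry(key, content, OUTCOME_CATEGORY, now + OUTCOME_TTL, [caller, skill])
```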

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New section in docs/a2a.md covering:
- Skill registration (agent card, gateway, handler dispatch)
- Outcome persistence requirements (working memory, key pattern,
  category, TTL, tags)
- Multi-turn InputRequired pattern (contextId, conversation memory,
  turn storage, outcome on completion only)
- Trust and approval model

References HandleNegotiateMeetingAsync as the canonical example.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two issues found during live testing:

1. A2A outcomes stored under global namespace (a2a-outcomes/) were
   invisible to SearchWorkingMemory which defaults to session scope.
   Move outcomes under session/{WellKnownSessions.Primary}/a2a-outcomes/
   so the user's LLM finds them naturally.

2. The LLM sometimes calls invoke_agent with its own identity name
   ("RockBot") instead of the target agent ("Bob"), creating a
   self-invocation loop. Add a guard that rejects self-invocation
   with a helpful error pointing to list_known_agents.
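The self-invocation guard from item 2 might look like the following Python sketch. It assumes the check compares case-insensitively against both the identity name and the display name; the error wording is illustrative:

```python
from typing import Iterable, Optional


def check_invoke_target(target: str, own_names: Iterable[str]) -> Optional[str]:
    """Return an error for self-invocation, or None if the target is a peer."""
    own = {name.casefold() for name in own_names}
    if target.strip().casefold() in own:
        return (
            f"Error: '{target}' is this agent, not a peer. "
            "Call list_known_agents to find the target agent's name."
        )
    return None
```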

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The negotiate-meeting skill now asks the caller (Alice) for purpose,
duration, and time preference in a single InputRequired round. The
notification to the receiving user (Bob) is purely informational —
no questions, just confirmed details.

Previously the skill only asked for a time, and Bob's Observe-level
notification would generate LLM questions about purpose/duration that
should have been directed at Alice during the negotiation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LLMs commonly paraphrase skill names — Alice used "schedule-meeting"
instead of "negotiate-meeting", causing the request to fall to Observe
level (read-only summary with questions) instead of the Act-level
multi-turn handler.

- Add "schedule-meeting" as dispatch alias in RockBotTaskHandler
- Add to approved skills in both trust stores
- Note the alias in well-known-agents descriptions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Observe path returned a polite "your request has been received"
which the caller's LLM interpreted as success — hallucinating that a
meeting was confirmed when it was only queued for human review.

Replace with an unambiguous "IMPORTANT: This request was NOT completed"
message that explicitly states nothing was scheduled, confirmed, or
executed. The caller's LLM should relay that the request is pending
the other party's manual review.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The LLM was calling query-availability before negotiate-meeting
(unprompted by the user) and paraphrasing skill IDs. Add explicit
guidance:
- Call the skill that matches the request directly
- Don't call query-availability as a prerequisite
- Use exact skill IDs from list_known_agents, don't paraphrase
- One invoke_agent per user request unless asked otherwise

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Callers may paraphrase skill IDs (e.g. "schedule-meeting" instead of
"negotiate-meeting"). Instead of hardcoded aliases, use BM25 ranking
against skill IDs, names, descriptions, and known aliases.

- InboundSkillMatcher: exact ID → exact alias → BM25 fuzzy match
- RockBotTaskHandler: match requested skill before dispatch
- Logs matched skill for debugging
- Remove hardcoded schedule-meeting alias and trust store entries
- 15 new tests covering exact, alias, fuzzy, and no-match cases
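A Python sketch of the three-stage match order listed above. The skill dictionaries, `k1 = 1.2`/`b = 0.75`, and the Lucene-style `log(1 + ...)` IDF (which keeps scores non-negative) are assumptions; InboundSkillMatcher's actual tokenization and thresholds are not described in the PR.

```python
import math
import re
from collections import Counter
from typing import List, Optional


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: List[str], docs: List[List[str]], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each tokenized doc against the query tokens."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n or 1.0
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


def match_skill(requested: str, skills: List[dict]) -> Optional[str]:
    """Exact ID, then exact alias, then best BM25 hit over id/name/description/aliases."""
    req = requested.strip().lower()
    for s in skills:
        if s["id"].lower() == req:
            return s["id"]
    for s in skills:
        if req in (a.lower() for a in s.get("aliases", [])):
            return s["id"]
    if not skills:
        return None
    docs = [
        tokenize(" ".join([s["id"], s.get("name", ""), s.get("description", ""), *s.get("aliases", [])]))
        for s in skills
    ]
    scores = bm25_scores(tokenize(requested), docs)
    best = max(range(len(skills)), key=lambda i: scores[i])
    return skills[best]["id"] if scores[best] > 0 else None
```

With this ordering, an exact ID or alias never pays the cost of scoring, and a request that shares no terms with any skill returns no match rather than the least-bad candidate.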

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All four A2A handlers (result, error, status, InputRequired) used
agent.Name (identity = "RockBot") for AgentReply.AgentName, causing
chat bubbles to show "RockBot" instead of the display name ("Alice").

Add AgentNameHolder to each handler and use DisplayName for all
user-facing fields (AgentReply.AgentName, conversation turns,
progress context). Envelope source field stays as agent.Name for
message routing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The gateway always returned a Message response via EnqueueMessageAsync,
losing the task state. When Bob's agent returned InputRequired, the
caller saw Completed (because the SDK maps Message as Completed).

Now returns a Task response via EnqueueTaskAsync for non-terminal
states (InputRequired, Working, Submitted) so the caller's SDK
preserves the state and the InputRequired multi-turn loop fires.
Terminal states still return Message for backward compatibility.
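A Python sketch of the gateway's new branching. The dictionaries are simplified stand-ins loosely modeled on A2A Task and Message objects, not the SDK's types, and the lowercase state strings are assumed:

```python
from typing import Dict

NON_TERMINAL = {"submitted", "working", "input-required"}


def gateway_response(state: str, text: str, task_id: str, context_id: str) -> Dict:
    """Task for non-terminal states (keeps the state); Message for terminal ones."""
    if state in NON_TERMINAL:
        # A Task response lets the caller's SDK see InputRequired/Working/Submitted.
        return {
            "kind": "task",
            "id": task_id,
            "contextId": context_id,
            "status": {"state": state, "message": text},
        }
    # Terminal states keep the Message response for backward compatibility.
    return {"kind": "message", "contextId": context_id, "text": text}
```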

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two issues causing the InputRequired loop to never complete:

1. Gateway didn't pass contextId from SDK request to the RabbitMQ
   AgentTaskRequest, so Bob's handler never saw a contextId and
   treated every follow-up as a fresh conversation (completedRounds=0).

2. Gateway used the caller's contextId for the response instead of
   the agent's contextId. Bob generates a contextId from the taskId
   on the first call; the gateway now forwards that back so the
   caller uses it for follow-ups.
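Both fixes can be sketched in Python as one gateway round trip. The request/response dictionaries, `agent_context_id`, and the `ctx-{taskId}` format are hypothetical; the PR only says Bob derives a contextId from the taskId on the first call:

```python
from typing import Callable, Dict, Optional


def agent_context_id(request_context_id: Optional[str], task_id: str) -> str:
    """Reuse the caller's contextId, else derive one from the taskId (format assumed)."""
    return request_context_id or f"ctx-{task_id}"


def gateway_roundtrip(sdk_request: Dict, agent_handle: Callable[[Dict], Dict]) -> Dict:
    # Fix 1: pass the caller's contextId through to the queued AgentTaskRequest.
    queued = {
        "taskId": sdk_request["taskId"],
        "contextId": sdk_request.get("contextId"),
        "text": sdk_request["text"],
    }
    result = agent_handle(queued)
    # Fix 2: answer with the agent's contextId, not the caller's.
    return {"state": result["state"], "contextId": result["contextId"]}
```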

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The result handler runs the LLM to synthesize a response after an A2A
task completes. With invoke_agent in the tool set, the LLM would call
it again — creating an infinite loop where each Completed result
triggered a new agent call.

Filter out A2A caller tools (invoke_agent, register/unregister_agent,
list_known_agents, get_agent_details) from the result handler's
ChatOptions. The result synthesis should only present the outcome to
the user, not initiate new agent interactions.
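A Python sketch of the filter, operating on tool names only; the real code filters the tools in ChatOptions, and `search_memory` in the usage line is a hypothetical non-A2A tool:

```python
from typing import Iterable, List

# A2A caller tools named in the commit message.
A2A_CALLER_TOOLS = frozenset({
    "invoke_agent",
    "register_agent",
    "unregister_agent",
    "list_known_agents",
    "get_agent_details",
})


def synthesis_tools(tool_names: Iterable[str]) -> List[str]:
    """Tools the result-synthesis LLM may still use: everything except A2A callers."""
    return [name for name in tool_names if name not in A2A_CALLER_TOOLS]


print(synthesis_tools(["invoke_agent", "search_memory", "list_known_agents"]))  # ['search_memory']
```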

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The invoke_agent tool response was too mild — the LLM kept iterating
and called negotiate-meeting a second time before the first result
arrived. Strengthen the response to explicitly say STOP and present
a status to the user while waiting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The LLM ignored the "STOP" instruction and called invoke_agent twice
in the same loop iteration, creating duplicate negotiate-meeting
round-trips. Now checks if the session already has a pending A2A task
in the tracker and returns an error telling the LLM to wait.
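A Python sketch of the per-session guard. `PendingTaskTracker` and the messages are hypothetical; registering under a lock so that the check and the insert happen atomically is a design choice of this sketch, aimed at two invoke_agent calls arriving in the same loop iteration:

```python
import threading
from typing import Dict


class PendingTaskTracker:
    """Hypothetical tracker allowing at most one pending A2A task per session."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._by_session: Dict[str, str] = {}

    def try_register(self, session_id: str, task_id: str) -> bool:
        with self._lock:  # check and insert atomically
            if session_id in self._by_session:
                return False
            self._by_session[session_id] = task_id
            return True

    def complete(self, session_id: str) -> None:
        with self._lock:
            self._by_session.pop(session_id, None)


def invoke_agent(tracker: PendingTaskTracker, session_id: str, task_id: str) -> str:
    if not tracker.try_register(session_id, task_id):
        return "Error: an A2A task is already pending for this session. Wait for its result."
    return f"Task {task_id} dispatched. STOP and tell the user you are waiting for the result."
```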

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two sources of noise in the Blazor UI during A2A interactions:

1. The result handler published a preview bubble AND ran an LLM
   synthesis — both showed as separate messages, creating a
   double-confirmation. Remove the preview; the synthesis is the
   single user-facing message.

2. The InputRequired handler published a question bubble for each
   round. For autonomous follow-ups (Act-level trust), this is
   noise — the user only needs the final outcome. Remove the
   intermediate bubble.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@rockfordlhotka rockfordlhotka merged commit 914efef into main Apr 12, 2026
2 checks passed
@rockfordlhotka rockfordlhotka deleted the rockfordlhotka/265-outbound-a2a-long-running branch April 12, 2026 03:35

Development

Successfully merging this pull request may close these issues.

Outbound A2A: support long-running tasks, streaming, and multi-turn interaction
