Skip to content

feat: add CancellationToken for graceful agent execution cancellation#1772

Merged
pgrayy merged 6 commits intostrands-agents:mainfrom
jgoyani1:feat/cancellation-token-impl
Mar 9, 2026
Merged

feat: add CancellationToken for graceful agent execution cancellation#1772
pgrayy merged 6 commits intostrands-agents:mainfrom
jgoyani1:feat/cancellation-token-impl

Conversation

@jgoyani1
Copy link
Contributor

@jgoyani1 jgoyani1 commented Feb 26, 2026

Motivation

Agents need a way to be stopped from external contexts — web request handlers, background threads, timeout logic. Currently there is no graceful cancellation mechanism, so callers have no way to interrupt a running agent without killing the process.

Resolves: #81

Public API Changes

New agent.cancel() method for graceful cancellation:

agent = Agent()

# Cancel from any thread or async context
agent.cancel()

# Agent stops at next checkpoint with stop_reason="cancelled"
result = await agent.invoke_async("Hello")
assert result.stop_reason == "cancelled"

Cancellation is checked at two strategic checkpoints:

  • During model response streaming — discards partial output, returns {"text": "Cancelled by user"}
  • Before tool execution — adds error toolResult for each pending toolUse to maintain valid conversation state

The agent is reusable after cancellation — the cancel signal is automatically cleared when the invocation completes.

Use Cases

  • Web servers: Cancel agent on request timeout or client disconnect
  • Background tasks: Stop agent from a monitoring thread when conditions change
  • Interactive UIs: Wire a "Stop" button to agent.cancel()

@codecov
Copy link

codecov bot commented Mar 2, 2026

Codecov Report

❌ Patch coverage is 92.85714% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/strands/event_loop/event_loop.py 88.23% 0 Missing and 2 partials ⚠️

📢 Thoughts on this report? Let us know!

@github-actions
Copy link

github-actions bot commented Mar 3, 2026

Issue: Code coverage gap noted by Codecov (74.4% patch coverage).

The Codecov report indicates 11 missing lines in src/strands/event_loop/event_loop.py. Consider adding unit tests specifically for:

  1. Checkpoint 1 cancellation (cycle start) - lines 152-164
  2. Checkpoint 2 cancellation (before model call) - lines 346-359
  3. Checkpoint 4 cancellation (before tool execution) - lines 522-533

While the integration tests cover many scenarios, targeted unit tests with mock signals would improve coverage and make it easier to verify each checkpoint works correctly in isolation.

@github-actions
Copy link

github-actions bot commented Mar 3, 2026

Review Summary

Assessment: Request Changes

This is a well-designed feature that adds graceful cancellation support to agents. The implementation is thread-safe, has clear checkpoints, and includes comprehensive tests. However, there's a critical issue that needs to be addressed before merging.

Review Categories
  • Critical - Agent Reusability: The StopSignal has no reset mechanism. Once cancel() is called, the agent is permanently cancelled and cannot be reused for subsequent invocations. This needs either a reset() method or automatic reset at invocation start.

  • Important - Naming Consistency: As noted by @pgrayy, the naming should be aligned. Consider renaming StopSignal to CancelSignal and updating docstrings that reference "cancellation token" for consistency.

  • Test Coverage: Codecov reports 74.4% patch coverage with 11 missing lines. Consider adding targeted unit tests for each cancellation checkpoint.

The core implementation is solid - the four checkpoint design is sensible and the thread-safety approach using threading.Lock is appropriate.

@jgoyani1 jgoyani1 force-pushed the feat/cancellation-token-impl branch from ae46812 to c9535ce Compare March 5, 2026 06:17
@github-actions github-actions bot added size/l and removed size/l labels Mar 5, 2026
@jgoyani1 jgoyani1 force-pushed the feat/cancellation-token-impl branch from c9535ce to 05bc0bf Compare March 5, 2026 06:23
@github-actions github-actions bot added size/xl and removed size/l labels Mar 5, 2026
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary (Fourth Pass)

Assessment: Ready for approval pending documentation

The implementation is now complete with excellent test coverage (93.33%, up from 74.4%).

All Technical Issues Resolved
Issue Resolution
Reset mechanism _cancel_signal.clear() in finally block
threading.Event ✅ Used directly (no custom class)
Type hints threading.Event | None
Checkpoints ✅ Reduced to 2 (mid-stream + before tools)
_should_cancel helper ✅ Removed
Test cleanup ✅ Simplified and focused
Agent reuse ✅ Verified in both unit and integration tests

Remaining items:

  1. Documentation PR - Still empty. This new public API needs documentation.
  2. Model ID - Consider updating to avoid potential deprecation (per pgrayy's comment)

The code quality is excellent. Once documentation is addressed, this is ready to merge.

pgrayy
pgrayy previously approved these changes Mar 8, 2026
@agent-of-mkmeral
Copy link

🔬 Adversarial Testing: Cancel + Interrupt Interactions

I ran additional adversarial tests focusing on how cancel() interacts with interrupts, concurrent tool execution, and partial results.

Result: ✅ No critical bugs found — 2 design observations documented below.

📊 Test Summary
Metric Value
Tests written 25
Tests passing 17
Tests failing (design behaviors, not bugs) 8
Critical bugs 0
🔍 Finding 1: Cancel is NOT checked during sequential tool execution

Observation (Design Choice — Not a Bug)

With SequentialToolExecutor, cancellation is checked before starting tool execution, not between individual tools.

Evidence:

# Cancel called after first tool starts → BOTH tools complete fully
execution_log = ['start_0.1', 'end_0.1', 'start_0.5', 'end_0.5']

Where in code: event_loop.py ~line 468-500 — the cancel check happens once before the executor runs.

Impact: For long-running tools, cancel() won't interrupt mid-tool. The current design is safer (avoids inconsistent state) but less responsive.

Is this a bug? No — this appears intentional. Stopping a running tool mid-execution could leave state inconsistent.

🔍 Finding 2: Cancel during interrupt state doesn't clear interrupt

Observation (UX Question — May Need Documentation)

When the agent is in INTERRUPTED state and cancel() is called:

  • ✅ Cancel signal is set
  • ✅ Interrupt state remains activated=True
  • ⚠️ Subsequent agent("normal prompt") throws TypeError because you're still in interrupt state

Reproduction:

result = await agent.invoke_async("start")  # Triggers interrupt
assert result.stop_reason == "interrupt"

agent.cancel()  # User wants to cancel

# This FAILS with TypeError!
await agent.invoke_async("normal prompt")  # Must resume from interrupt

# To recover, must manually:
agent._interrupt_state.deactivate()
agent._cancel_signal.clear()

Is this a bug? Probably not — but it's a UX question:

  • Current: Cancel doesn't abandon interrupt workflow (preserves state for recovery)
  • Alternative: Cancel could also call _interrupt_state.deactivate() for true "cancel and forget"

The current behavior is defensible but should be documented.

✅ Confirmed: No Race Conditions

Thread safety verified with 50+ concurrent threads calling cancel() simultaneously.

threading.Event provides the needed thread safety — no race conditions found.

✅ Confirmed: No Information Loss

Every tool_use gets a corresponding tool_result (even when cancelled).

The implementation correctly adds cancel toolResult for unexecuted tools, maintaining valid conversation state.

🎯 State Machine Analysis
Agent States:
- IDLE: Agent not running, ready for invocation
- STREAMING: Model is generating response
- TOOL_PENDING: Model returned tool_use(s), about to execute
- TOOL_EXECUTING: Currently executing a tool
- INTERRUPTED: Waiting for human input (interruptResponse)
- CANCELLED: Invocation was cancelled

Key Transitions:
- STREAMING --cancel()--> Check at next chunk, return "cancelled"
- TOOL_PENDING --cancel()--> All tools get cancel results
- TOOL_EXECUTING --cancel()--> Current tool COMPLETES, remaining get cancel results
- INTERRUPTED --cancel()--> Signal set, but still in INTERRUPTED state

Conclusion: The implementation is solid. The two findings are design observations, not bugs — both behaviors are defensible and appear intentional. Consider documenting the cancel + interrupt interaction for users.

Move cancel_signal out of invocation_state to avoid leaking internal
implementation details to hooks, tools, and model providers. The signal
is now passed as a dedicated parameter to stream_messages and
process_stream, while event_loop continues accessing it directly via
agent._cancel_signal.
@jgoyani1 jgoyani1 requested a review from pgrayy March 9, 2026 17:31
@pgrayy pgrayy merged commit 73fe9cc into strands-agents:main Mar 9, 2026
20 of 38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Add support of CancellationToken

4 participants