|
| 1 | +# Isolated State |
| 2 | + |
| 3 | +**Status**: Proposed |
| 4 | + |
| 5 | +**Date**: 2026-02-16 |
| 6 | + |
| 7 | +**Issue**: N/A |
| 8 | + |
| 9 | +## Context |
| 10 | + |
| 11 | +Today, the `Agent` class stores all mutable per-invocation state as instance fields. A few examples include: |
| 12 | + |
| 13 | +- `messages` — conversation history |
| 14 | +- `state` (AgentState) — user-facing key-value state |
| 15 | +- `event_loop_metrics` — token usage and performance metrics |
| 16 | +- `trace_span` — the current OpenTelemetry trace span |
| 17 | +- `_interrupt_state` — interrupt tracking |
| 18 | + |
| 19 | +Because this state lives directly on the agent instance, two concurrent invocations would corrupt each other's data. The SDK prevents this with a `threading.Lock` that raises `ConcurrencyException` if a second call arrives while the first is still running: |
| 20 | + |
| 21 | +```python |
| 22 | +# From agent.py stream_async |
| 23 | +acquired = self._invocation_lock.acquire(blocking=False) |
| 24 | +if not acquired: |
| 25 | + raise ConcurrencyException( |
| 26 | + "Agent is already processing a request. Concurrent invocations are not supported." |
| 27 | + ) |
| 28 | +``` |
| 29 | + |
| 30 | +### The problem in practice |
| 31 | + |
| 32 | +A simple concurrent use case fails today: |
| 33 | + |
| 34 | +```python |
| 35 | +import asyncio |
| 36 | +from strands import Agent |
| 37 | + |
| 38 | +agent = Agent(system_prompt="You are a helpful assistant.") |
| 39 | + |
| 40 | +async def main(): |
| 41 | + # This raises ConcurrencyException on the second call |
| 42 | + results = await asyncio.gather( |
| 43 | + agent.invoke_async("Summarize the Python GIL"), |
| 44 | + agent.invoke_async("Summarize the Rust borrow checker"), |
| 45 | + ) |
| 46 | + |
| 47 | +asyncio.run(main()) |
| 48 | +``` |
| 49 | + |
| 50 | +### The workaround is verbose and limiting |
| 51 | + |
| 52 | +To get around this today, users must create separate agent instances: |
| 53 | + |
| 54 | +```python |
| 55 | +import asyncio |
| 56 | +from strands import Agent |
| 57 | + |
| 58 | +def make_agent(): |
| 59 | + return Agent( |
| 60 | + model=my_model, |
| 61 | + tools=[tool_a, tool_b], |
| 62 | + system_prompt="You are a helpful assistant.", |
| 63 | + ) |
| 64 | + |
| 65 | +async def main(): |
| 66 | + results = await asyncio.gather( |
| 67 | + make_agent().invoke_async("Summarize the Python GIL"), |
| 68 | + make_agent().invoke_async("Summarize the Rust borrow checker"), |
| 69 | + ) |
| 70 | + |
| 71 | +asyncio.run(main()) |
| 72 | +``` |
| 73 | + |
| 74 | +This works for simple scripts, but breaks down anywhere a function accepts an agent instance directly. The factory-function pattern can't help when the caller expects a pre-configured agent. `Graph.add_node` is one example — it takes an agent instance, and it validates that each node has a unique instance: |
| 75 | + |
| 76 | +```python |
| 77 | +# From graph.py _validate_node_executor |
| 78 | +if id(executor) in seen_instances: |
| 79 | + raise ValueError("Duplicate node instance detected. Each node must have a unique object instance.") |
| 80 | +``` |
| 81 | + |
| 82 | +If you have a generic agent (e.g., a summarizer) that you want to reuse across multiple graph nodes, you can't. You must create separate instances with identical configuration: |
| 83 | + |
| 84 | +```python |
| 85 | +from strands import Agent |
| 86 | +from strands.multiagent.graph import GraphBuilder |
| 87 | + |
| 88 | +summarizer_config = dict( |
| 89 | + model=my_model, |
| 90 | + tools=[summarize_tool], |
| 91 | + system_prompt="You are a summarizer.", |
| 92 | +) |
| 93 | + |
| 94 | +graph = GraphBuilder() |
| 95 | +# Must create separate instances even though they're identical |
| 96 | +graph.add_node(Agent(**summarizer_config), node_id="summarize_a") |
| 97 | +graph.add_node(Agent(**summarizer_config), node_id="summarize_b") |
| 98 | +``` |
| 99 | + |
| 100 | +This goes against the SDK's goal of building agents in just a few lines of code. |
| 101 | + |
| 102 | +### State reset is fragile |
| 103 | + |
| 104 | +Any code that needs to reset an agent to a clean state must manually reach into its internals and know which fields to clear. This is error-prone — if the agent gains new stateful fields in the future, every reset site must be updated or it silently leaks state between executions. |
| 105 | + |
| 106 | +The graph implementation is a good example of this: |
| 107 | + |
| 108 | +```python |
| 109 | +# From graph.py GraphNode.reset_executor_state |
| 110 | +def reset_executor_state(self) -> None: |
| 111 | + if hasattr(self.executor, "messages"): |
| 112 | + self.executor.messages = copy.deepcopy(self._initial_messages) |
| 113 | + |
| 114 | + if hasattr(self.executor, "state"): |
| 115 | + self.executor.state = AgentState(self._initial_state.get()) |
| 116 | + |
| 117 | + self.execution_status = Status.PENDING |
| 118 | + self.result = None |
| 119 | +``` |
| 120 | + |
| 121 | +It deep-copies initial state at construction time and manually resets specific fields. This pattern would need to be replicated anywhere else that needs to reset agent state. |
| 122 | + |
| 123 | +## Decision |
| 124 | + |
| 125 | +Consider making `Agent` stateless by extracting all per-invocation mutable state into an isolated state object, managed through a session manager and keyed by an invocation key. |
| 126 | + |
| 127 | +### Isolated invocation state |
| 128 | + |
| 129 | +One approach would be to move all mutable state out of the agent instance and into a per-invocation state object: |
| 130 | + |
| 131 | +```python |
| 132 | +class InvocationState: |
| 133 | + """All mutable state for a single agent invocation.""" |
| 134 | + messages: Messages |
| 135 | + agent_state: AgentState |
| 136 | + event_loop_metrics: EventLoopMetrics |
| 137 | + trace_span: trace_api.Span | None |
| 138 | + interrupt_state: _InterruptState |
| 139 | +``` |
| 140 | + |
| 141 | +The agent instance would retain only configuration: model, tools, system prompt, hooks, callback handler, conversation manager, etc. In the future, configuration could also be extracted into its own isolated object to allow per-invocation overrides, but this document focuses on invocation state to highlight the core problem and start the discussion. |
| 142 | + |
| 143 | +### Session manager provides state |
| 144 | + |
| 145 | +At invocation time, the agent could read state from a session manager using an invocation key: |
| 146 | + |
| 147 | +```python |
| 148 | +# Pseudo-code for agent.stream_async |
| 149 | +async def stream_async(self, prompt, *, invocation_key=None, **kwargs): |
| 150 | + # Resolve the invocation key |
| 151 | + key = invocation_key or self._default_invocation_key |
| 152 | + |
| 153 | + # Load isolated state from session manager |
| 154 | + invocation_state = await self.session_manager.load(key) |
| 155 | + |
| 156 | + # Run the event loop against the isolated state (not self) |
| 157 | + async for event in self._run_loop(invocation_state, prompt, **kwargs): |
| 158 | + yield event |
| 159 | + |
| 160 | + # Persist state back |
| 161 | + await self.session_manager.save(key, invocation_state) |
| 162 | +``` |
| 163 | + |
| 164 | +Because each invocation would operate on its own state object, there would be no shared mutable state on the agent. The `threading.Lock` and `ConcurrencyException` would no longer be needed. |
| 165 | + |
| 166 | +### Default behavior could preserve backwards compatibility |
| 167 | + |
| 168 | +One idea is to introduce a default in-memory session manager. Each agent instance would get a default invocation key that is stable across calls: |
| 169 | + |
| 170 | +```python |
| 171 | +class InMemorySessionManager(SessionManager): |
| 172 | + """Stores state in memory, keyed by invocation key.""" |
| 173 | + |
| 174 | + def __init__(self): |
| 175 | + self._store: dict[str, InvocationState] = {} |
| 176 | + |
| 177 | + async def load(self, key: str) -> InvocationState: |
| 178 | + if key not in self._store: |
| 179 | + self._store[key] = InvocationState() |
| 180 | + return self._store[key] |
| 181 | + |
| 182 | + async def save(self, key: str, state: InvocationState) -> None: |
| 183 | + self._store[key] = state |
| 184 | +``` |
| 185 | + |
| 186 | +When no invocation key is supplied, the agent would use a default key tied to the instance. This would mean: |
| 187 | + |
| 188 | +- Sequential calls accumulate conversation history, just like today. |
| 189 | +- A single agent instance with no invocation key behaves identically to the current implementation. |
| 190 | +- No code changes required for existing users. |
| 191 | + |
| 192 | +### Concurrent usage with invocation keys |
| 193 | + |
| 194 | +Users who want concurrency could supply distinct invocation keys: |
| 195 | + |
| 196 | +```python |
| 197 | +import asyncio |
| 198 | +from strands import Agent |
| 199 | + |
| 200 | +agent = Agent(system_prompt="You are a helpful assistant.") |
| 201 | + |
| 202 | +async def main(): |
| 203 | + results = await asyncio.gather( |
| 204 | + agent.invoke_async("Summarize the Python GIL", invocation_key="task-1"), |
| 205 | + agent.invoke_async("Summarize the Rust borrow checker", invocation_key="task-2"), |
| 206 | + ) |
| 207 | + |
| 208 | +asyncio.run(main()) |
| 209 | +``` |
| 210 | + |
| 211 | +Each key would get its own isolated messages, agent state, metrics, and trace span. No lock contention, no `ConcurrencyException`. |
| 212 | + |
| 213 | +### Graph could become simpler |
| 214 | + |
| 215 | +With isolated state, graph nodes could reuse the same agent instance. The graph would pass a unique invocation key per node execution: |
| 216 | + |
| 217 | +```python |
| 218 | +from strands import Agent |
| 219 | +from strands.multiagent.graph import GraphBuilder |
| 220 | + |
| 221 | +summarizer = Agent( |
| 222 | + model=my_model, |
| 223 | + tools=[summarize_tool], |
| 224 | + system_prompt="You are a summarizer.", |
| 225 | +) |
| 226 | + |
| 227 | +graph = GraphBuilder() |
| 228 | +# Same instance, different invocation keys per execution |
| 229 | +graph.add_node(summarizer, node_id="summarize_a") |
| 230 | +graph.add_node(summarizer, node_id="summarize_b") |
| 231 | +``` |
| 232 | + |
| 233 | +The `_validate_node_executor` duplicate-instance check would no longer be needed. `GraphNode.reset_executor_state` could be removed — each execution would start with a fresh invocation state loaded from the session manager. No more deep-copying initial state, no more manually resetting fields, and no risk of missing new stateful fields in the future. |
| 234 | + |
| 235 | +## Developer Experience |
| 236 | + |
| 237 | +### Basic usage (unchanged) |
| 238 | + |
| 239 | +```python |
| 240 | +from strands import Agent |
| 241 | + |
| 242 | +agent = Agent(system_prompt="You are a helpful assistant.") |
| 243 | +result = agent("Hello!") # Uses default invocation key |
| 244 | +result = agent("Follow up") # Same key, conversation continues |
| 245 | +``` |
| 246 | + |
| 247 | +### Concurrent usage |
| 248 | + |
| 249 | +```python |
| 250 | +import asyncio |
| 251 | +from strands import Agent |
| 252 | + |
| 253 | +agent = Agent(system_prompt="You are a helpful assistant.") |
| 254 | + |
| 255 | +async def handle_request(user_id: str, message: str): |
| 256 | + return await agent.invoke_async(message, invocation_key=user_id) |
| 257 | + |
| 258 | +async def main(): |
| 259 | + results = await asyncio.gather( |
| 260 | + handle_request("user-1", "What is Python?"), |
| 261 | + handle_request("user-2", "What is Rust?"), |
| 262 | + ) |
| 263 | +``` |
| 264 | + |
| 265 | +### State reset |
| 266 | + |
| 267 | +Rather than reaching into agent internals: |
| 268 | + |
| 269 | +```python |
| 270 | +# Today: manually reset individual fields |
| 271 | +agent.messages = [] |
| 272 | +agent.state = AgentState() |
| 273 | +``` |
| 274 | + |
| 275 | +State could be cleared through the session manager: |
| 276 | + |
| 277 | +```python |
| 278 | +# Proposed: clear state for a given invocation key |
| 279 | +await agent.session_manager.clear(invocation_key) |
| 280 | +``` |
| 281 | + |
| 282 | +## Consequences |
| 283 | + |
| 284 | +### What could become easier |
| 285 | + |
| 286 | +- Concurrent agent usage with a single instance |
| 287 | +- Resetting or clearing agent state without reaching into internals |
| 288 | +- Adding new stateful fields without updating reset logic in graph or other consumers |
| 289 | +- Serving multiple users/conversations from a single agent instance |
| 290 | + |
| 291 | +### What could become harder or change |
| 292 | + |
| 293 | +- Internal code that currently reads `self.messages` or `self.state` would need to be updated to read from the invocation state object |
| 294 | + - For example, hook callbacks that receive the agent and access `agent.messages` would need to be adapted |
| 295 | +- Session manager becomes a required concept (though a default in-memory implementation could make it invisible for simple use cases) |
| 296 | +- The `threading.Lock` and `CurrencyException` would be removed, which means users who relied on the exception as a signal would need to adapt |
| 297 | + |
| 298 | +### Backwards compatibility is the biggest concern |
| 299 | + |
| 300 | +Today, users directly read and write instance fields like `agent.messages` and `agent.state`. Moving these into an isolated invocation state object would break that public API surface. Community tools, custom hooks, and user code that accesses these fields would all need updating. Providing a smooth migration path — whether through proxy accessors, a compatibility layer, or clear deprecation — is the most significant challenge with this proposal. |
| 301 | + |
| 302 | +Given the scope of this change, it may be worth considering this as part of a v2 of the Python SDK rather than attempting it as a backwards-compatible evolution of v1. |
0 commit comments