Commit 8fa6d5d

design: Add 0002-isolated-state proposal
1 parent c48be6b commit 8fa6d5d

1 file changed

Lines changed: 302 additions & 0 deletions

designs/0002-isolated-state.md

@@ -0,0 +1,302 @@
# Isolated State

**Status**: Proposed

**Date**: 2026-02-16

**Issue**: N/A

## Context

Today, the `Agent` class stores all mutable per-invocation state as instance fields. A few examples include:

- `messages`: conversation history
- `state` (`AgentState`): user-facing key-value state
- `event_loop_metrics`: token usage and performance metrics
- `trace_span`: the current OpenTelemetry trace span
- `_interrupt_state`: interrupt tracking

Because this state lives directly on the agent instance, two concurrent invocations would corrupt each other's data. The SDK prevents this with a `threading.Lock` that raises `ConcurrencyException` if a second call arrives while the first is still running:

```python
# From agent.py stream_async
acquired = self._invocation_lock.acquire(blocking=False)
if not acquired:
    raise ConcurrencyException(
        "Agent is already processing a request. Concurrent invocations are not supported."
    )
```
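
To make the guard's behavior concrete, here is a minimal, self-contained sketch of the same non-blocking lock pattern. `GuardedAgent`, its `reenter` flag, and the local `ConcurrencyException` class are hypothetical stand-ins for illustration only; they are not the SDK's implementation.

```python
import threading


class ConcurrencyException(Exception):
    """Local stand-in for the SDK exception of the same name."""


class GuardedAgent:
    """Toy class (hypothetical); only the locking pattern mirrors the excerpt."""

    def __init__(self) -> None:
        self._invocation_lock = threading.Lock()

    def invoke(self, prompt: str, *, reenter: bool = False) -> str:
        # A non-blocking acquire returns False immediately if the lock is held.
        if not self._invocation_lock.acquire(blocking=False):
            raise ConcurrencyException("Agent is already processing a request.")
        try:
            if reenter:
                # Simulate a second call arriving while the first is running.
                self.invoke("second call")
            return f"handled: {prompt}"
        finally:
            self._invocation_lock.release()


def second_call_is_rejected() -> bool:
    """Return True if the overlapping call raised ConcurrencyException."""
    agent = GuardedAgent()
    try:
        agent.invoke("first call", reenter=True)
    except ConcurrencyException:
        return True
    return False
```

The lock does not make concurrent use safe; it only turns silent state corruption into an immediate, visible error.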

### The problem in practice

A simple concurrent use case fails today:

```python
import asyncio
from strands import Agent

agent = Agent(system_prompt="You are a helpful assistant.")

async def main():
    # This raises ConcurrencyException on the second call
    results = await asyncio.gather(
        agent.invoke_async("Summarize the Python GIL"),
        agent.invoke_async("Summarize the Rust borrow checker"),
    )

asyncio.run(main())
```

### The workaround is verbose and limiting

To get around this today, users must create separate agent instances:

```python
import asyncio
from strands import Agent

def make_agent():
    return Agent(
        model=my_model,
        tools=[tool_a, tool_b],
        system_prompt="You are a helpful assistant.",
    )

async def main():
    results = await asyncio.gather(
        make_agent().invoke_async("Summarize the Python GIL"),
        make_agent().invoke_async("Summarize the Rust borrow checker"),
    )

asyncio.run(main())
```

This works for simple scripts, but breaks down anywhere a function accepts an agent instance directly. The factory-function pattern can't help when the caller expects a pre-configured agent. `GraphBuilder.add_node` is one example: it takes an agent instance and validates that each node has a unique instance:

```python
# From graph.py _validate_node_executor
if id(executor) in seen_instances:
    raise ValueError("Duplicate node instance detected. Each node must have a unique object instance.")
```

If you have a generic agent (e.g., a summarizer) that you want to reuse across multiple graph nodes, you can't. You must create separate instances with identical configuration:

```python
from strands import Agent
from strands.multiagent.graph import GraphBuilder

summarizer_config = dict(
    model=my_model,
    tools=[summarize_tool],
    system_prompt="You are a summarizer.",
)

graph = GraphBuilder()
# Must create separate instances even though they're identical
graph.add_node(Agent(**summarizer_config), node_id="summarize_a")
graph.add_node(Agent(**summarizer_config), node_id="summarize_b")
```

This goes against the SDK's goal of building agents in just a few lines of code.

### State reset is fragile

Any code that needs to reset an agent to a clean state must manually reach into its internals and know which fields to clear. This is error-prone: if the agent gains new stateful fields in the future, every reset site must be updated, or state silently leaks between executions.

The graph implementation is a good example of this:

```python
# From graph.py GraphNode.reset_executor_state
def reset_executor_state(self) -> None:
    if hasattr(self.executor, "messages"):
        self.executor.messages = copy.deepcopy(self._initial_messages)

    if hasattr(self.executor, "state"):
        self.executor.state = AgentState(self._initial_state.get())

    self.execution_status = Status.PENDING
    self.result = None
```

The graph node deep-copies the agent's initial state at construction time and manually resets specific fields. This pattern would need to be replicated anywhere else that needs to reset agent state.

## Decision

Consider making `Agent` stateless by extracting all per-invocation mutable state into an isolated state object, managed through a session manager and keyed by an invocation key.

### Isolated invocation state

One approach would be to move all mutable state out of the agent instance and into a per-invocation state object:

```python
class InvocationState:
    """All mutable state for a single agent invocation."""
    messages: Messages
    agent_state: AgentState
    event_loop_metrics: EventLoopMetrics
    trace_span: trace_api.Span | None
    interrupt_state: _InterruptState
```

The agent instance would retain only configuration: model, tools, system prompt, hooks, callback handler, conversation manager, etc. In the future, configuration could also be extracted into its own isolated object to allow per-invocation overrides, but this document focuses on invocation state to highlight the core problem and start the discussion.

### Session manager provides state

At invocation time, the agent could read state from a session manager using an invocation key:

```python
# Pseudo-code for agent.stream_async
async def stream_async(self, prompt, *, invocation_key=None, **kwargs):
    # Resolve the invocation key
    key = invocation_key or self._default_invocation_key

    # Load isolated state from session manager
    invocation_state = await self.session_manager.load(key)

    # Run the event loop against the isolated state (not self)
    async for event in self._run_loop(invocation_state, prompt, **kwargs):
        yield event

    # Persist state back
    await self.session_manager.save(key, invocation_state)
```

Because each invocation would operate on its own state object, there would be no shared mutable state on the agent. The `threading.Lock` and `ConcurrencyException` would no longer be needed.
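
The isolation property can be checked with a small, dependency-free sketch. `ToyAgent` and `ToyInvocationState` are hypothetical stand-ins (the real agent would call a model and a session manager); the point is only that two interleaved invocations with distinct keys never touch each other's message history.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ToyInvocationState:
    """Hypothetical stand-in for InvocationState; tracks messages only."""

    messages: list[str] = field(default_factory=list)


class ToyAgent:
    """Holds configuration only; mutable state lives in a store, per key."""

    def __init__(self, system_prompt: str) -> None:
        self.system_prompt = system_prompt
        self._store: dict[str, ToyInvocationState] = {}

    async def invoke_async(self, prompt: str, *, invocation_key: str) -> str:
        state = self._store.setdefault(invocation_key, ToyInvocationState())
        state.messages.append(f"user: {prompt}")
        await asyncio.sleep(0)  # yield control so the invocations interleave
        reply = f"echo: {prompt}"  # placeholder for a model call
        state.messages.append(f"assistant: {reply}")
        return reply


async def run_demo() -> dict[str, list[str]]:
    agent = ToyAgent(system_prompt="You are a helpful assistant.")
    await asyncio.gather(
        agent.invoke_async("Summarize the Python GIL", invocation_key="task-1"),
        agent.invoke_async("Summarize the Rust borrow checker", invocation_key="task-2"),
    )
    return {key: state.messages for key, state in agent._store.items()}


histories = asyncio.run(run_demo())
```

One caveat this sketch makes visible: two overlapping invocations that share the same key would still share one state object, so a design along these lines would likely need to decide how same-key concurrency is handled.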

### Default behavior could preserve backwards compatibility

One idea is to introduce a default in-memory session manager. Each agent instance would get a default invocation key that is stable across calls:

```python
class InMemorySessionManager(SessionManager):
    """Stores state in memory, keyed by invocation key."""

    def __init__(self):
        self._store: dict[str, InvocationState] = {}

    async def load(self, key: str) -> InvocationState:
        if key not in self._store:
            self._store[key] = InvocationState()
        return self._store[key]

    async def save(self, key: str, state: InvocationState) -> None:
        self._store[key] = state
```

When no invocation key is supplied, the agent would use a default key tied to the instance. This would mean:

- Sequential calls accumulate conversation history, just like today.
- A single agent instance with no invocation key behaves identically to the current implementation.
- No code changes required for existing users.

### Concurrent usage with invocation keys

Users who want concurrency could supply distinct invocation keys:

```python
import asyncio
from strands import Agent

agent = Agent(system_prompt="You are a helpful assistant.")

async def main():
    results = await asyncio.gather(
        agent.invoke_async("Summarize the Python GIL", invocation_key="task-1"),
        agent.invoke_async("Summarize the Rust borrow checker", invocation_key="task-2"),
    )

asyncio.run(main())
```

Each key would get its own isolated messages, agent state, metrics, and trace span. No lock contention, no `ConcurrencyException`.

### Graph could become simpler

With isolated state, graph nodes could reuse the same agent instance. The graph would pass a unique invocation key per node execution:

```python
from strands import Agent
from strands.multiagent.graph import GraphBuilder

summarizer = Agent(
    model=my_model,
    tools=[summarize_tool],
    system_prompt="You are a summarizer.",
)

graph = GraphBuilder()
# Same instance, different invocation keys per execution
graph.add_node(summarizer, node_id="summarize_a")
graph.add_node(summarizer, node_id="summarize_b")
```

The `_validate_node_executor` duplicate-instance check would no longer be needed. `GraphNode.reset_executor_state` could be removed, because each execution would start with a fresh invocation state loaded from the session manager. No more deep-copying initial state, no more manually resetting fields, and no risk of missing new stateful fields in the future.

## Developer Experience

### Basic usage (unchanged)

```python
from strands import Agent

agent = Agent(system_prompt="You are a helpful assistant.")
result = agent("Hello!")  # Uses default invocation key
result = agent("Follow up")  # Same key, conversation continues
```

### Concurrent usage

```python
import asyncio
from strands import Agent

agent = Agent(system_prompt="You are a helpful assistant.")

async def handle_request(user_id: str, message: str):
    return await agent.invoke_async(message, invocation_key=user_id)

async def main():
    results = await asyncio.gather(
        handle_request("user-1", "What is Python?"),
        handle_request("user-2", "What is Rust?"),
    )

asyncio.run(main())
```

### State reset

Rather than reaching into agent internals:

```python
# Today: manually reset individual fields
agent.messages = []
agent.state = AgentState()
```

State could be cleared through the session manager:

```python
# Proposed: clear state for a given invocation key
await agent.session_manager.clear(invocation_key)
```
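
The `InMemorySessionManager` sketch earlier in this document defines only `load` and `save`, so `clear` would be a new method. A minimal illustration of what it might look like for an in-memory store follows; `ClearableStore` and `ToyState` are hypothetical names, and dropping the entry (rather than resetting it in place) is an assumption.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ToyState:
    """Hypothetical stand-in for InvocationState; tracks messages only."""

    messages: list[str] = field(default_factory=list)


class ClearableStore:
    """In-memory store with a hypothetical clear() operation."""

    def __init__(self) -> None:
        self._store: dict[str, ToyState] = {}

    async def load(self, key: str) -> ToyState:
        # Create fresh state on first access, as in the proposal's sketch.
        return self._store.setdefault(key, ToyState())

    async def clear(self, key: str) -> None:
        # pop() with a default makes clearing an unknown key a no-op.
        self._store.pop(key, None)


async def demo() -> tuple[list[str], list[str]]:
    store = ClearableStore()
    state = await store.load("user-1")
    state.messages.append("Hello!")
    before = list(state.messages)
    await store.clear("user-1")
    after = list((await store.load("user-1")).messages)
    return before, after


before, after = asyncio.run(demo())
```

Because the next `load` builds a brand-new state object, any field added to the state later would be reset automatically, which is the property the graph's manual reset lacks.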

## Consequences

### What could become easier

- Concurrent agent usage with a single instance
- Resetting or clearing agent state without reaching into internals
- Adding new stateful fields without updating reset logic in graph or other consumers
- Serving multiple users/conversations from a single agent instance

### What could become harder or change

- Internal code that currently reads `self.messages` or `self.state` would need to be updated to read from the invocation state object
  - For example, hook callbacks that receive the agent and access `agent.messages` would need to be adapted
- Session manager becomes a required concept (though a default in-memory implementation could make it invisible for simple use cases)
- The `threading.Lock` and `ConcurrencyException` would be removed, which means users who relied on the exception as a signal would need to adapt

### Backwards compatibility is the biggest concern

Today, users directly read and write instance fields like `agent.messages` and `agent.state`. Moving these into an isolated invocation state object would break that public API surface. Community tools, custom hooks, and user code that accesses these fields would all need updating. Providing a smooth migration path, whether through proxy accessors, a compatibility layer, or clear deprecation, is the most significant challenge with this proposal.
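
As one illustration of the proxy-accessor option, the sketch below keeps `agent.messages` readable and writable by forwarding it to the state stored under the default invocation key, while emitting a `DeprecationWarning`. `CompatAgent`, `ToyState`, and the warning text are hypothetical; this is a sketch of the idea under those assumptions, not a proposed final API.

```python
import warnings
from dataclasses import dataclass, field


@dataclass
class ToyState:
    """Hypothetical stand-in for InvocationState; tracks messages only."""

    messages: list[str] = field(default_factory=list)


class CompatAgent:
    """Toy agent (hypothetical) whose legacy field proxies to keyed state."""

    def __init__(self) -> None:
        self._default_invocation_key = "default"
        self._store: dict[str, ToyState] = {}

    def _default_state(self) -> ToyState:
        return self._store.setdefault(self._default_invocation_key, ToyState())

    @property
    def messages(self) -> list[str]:
        warnings.warn(
            "agent.messages is deprecated; read invocation state instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._default_state().messages

    @messages.setter
    def messages(self, value: list[str]) -> None:
        warnings.warn(
            "agent.messages is deprecated; write invocation state instead",
            DeprecationWarning,
            stacklevel=2,
        )
        self._default_state().messages = value
```

A proxy like this only covers the default key, so code running under an explicit invocation key would still see the default conversation through `agent.messages`.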

Given the scope of this change, it may be worth considering this as part of a v2 of the Python SDK rather than attempting it as a backwards-compatible evolution of v1.
