
feat(agents): claim-scoped write tokens #4287

Merged: KyleAMathews merged 8 commits into main from feat/claim-scoped-write-tokens on May 7, 2026


@KyleAMathews commented May 6, 2026

Summary

Replace static entity write tokens with ephemeral, claim-scoped tokens. Under the old model, the entity's permanent write_token was returned via the x-write-token header on spawn and embedded in webhook notification payloads, so any consumer that ever saw it retained permanent write access. Now, a write token is issued when a consumer claims a wake and revoked when that consumer sends done.

Approach

Token lifecycle:

  1. Consumer claims a wake via callback-forward → server generates a fresh randomUUID() token, stores it in an in-memory activeClaimWriteTokens map
  2. Token is returned in the claim response body → runtime uses it for all writes during the wake
  3. Consumer sends done → token is revoked from the map
  4. New claim for the same stream supersedes the old token (handles consumer rotation)
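
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code in server.ts: the map name activeClaimWriteTokens and the issuedAt field come from this description, while the ActiveClaim shape and the function names issueClaimToken, isValidWriteToken, and revokeOnDone are hypothetical.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical shape of an active claim; issuedAt mirrors the field the
// description says is stored for future TTL use.
interface ActiveClaim {
  token: string;
  consumerId: string;
  issuedAt: number;
}

// Keyed on stream path, so a stream holds at most one active write token.
const activeClaimWriteTokens = new Map<string, ActiveClaim>();

// Steps 1 and 4: issue a fresh token; a new claim replaces any older one.
function issueClaimToken(streamPath: string, consumerId: string): string {
  const token = randomUUID();
  activeClaimWriteTokens.set(streamPath, { token, consumerId, issuedAt: Date.now() });
  return token;
}

// Step 2: a write is accepted only with the stream's current token.
function isValidWriteToken(streamPath: string, token: string): boolean {
  return activeClaimWriteTokens.get(streamPath)?.token === token;
}

// Step 3: done revokes the token, but only if the caller still holds the claim.
function revokeOnDone(streamPath: string, consumerId: string): boolean {
  const claim = activeClaimWriteTokens.get(streamPath);
  if (!claim || claim.consumerId !== consumerId) return false;
  activeClaimWriteTokens.delete(streamPath);
  return true;
}
```

In this sketch a done from a superseded consumer returns false and leaves the newer claim untouched, which is the behavior the invariants below ask for.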

Key implementation detail: the callback-forward endpoint acts as a proxy between the consumer and the durable-streams server's /callback/{consumerId} endpoint. Claim requests must include the durable-streams claim token (notification.token) as a Bearer auth header. When the durable-streams server responds {ok: true}, the agents-server enriches the response with a claim-scoped write token.
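
As a rough sketch of that proxy step, the function below forwards a claim upstream and only enriches a successful response. The upstream call is injected as a plain function so the example stays self-contained; UpstreamCallback, forwardClaim, the writeToken field name, and the 401 returned for a missing header are all assumptions made for illustration, not the actual endpoint code.

```typescript
import { randomUUID } from "node:crypto";

interface ProxyResponse {
  status: number;
  body: { ok: boolean; [key: string]: unknown };
}

// Hypothetical stand-in for the HTTP call to /callback/{consumerId}.
type UpstreamCallback = (consumerId: string, bearerToken: string) => Promise<ProxyResponse>;

// Stream path -> claim-scoped write token.
const claimTokens = new Map<string, string>();

async function forwardClaim(
  upstream: UpstreamCallback,
  consumerId: string,
  streamPath: string,
  authorization: string | undefined,
): Promise<ProxyResponse> {
  // The durable-streams claim token must arrive as a Bearer auth header.
  const bearer = /^Bearer (.+)$/.exec(authorization ?? "")?.[1];
  if (!bearer) return { status: 401, body: { ok: false } };

  const result = await upstream(consumerId, bearer);
  // Rejections from upstream are passed through without a write token.
  if (!result.body.ok) return result;

  // Enrich the successful claim response with a fresh write token.
  const writeToken = randomUUID();
  claimTokens.set(streamPath, writeToken);
  return { status: result.status, body: { ...result.body, writeToken } };
}
```

Issuing the token only after the upstream server confirms the claim keeps the write token tied to a claim that durable-streams actually accepted.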

Key invariants:

  • At most one active write token per stream (enforced by map keyed on stream path)
  • At most one active stream per consumer (bidirectional map cleanup)
  • Only the current claim holder can write; static entity tokens no longer authorize writes
  • autoClaim: true on IdempotentProducer ensures the producer participates in the claim protocol
  • Done from a stale consumer does not clobber a newer claim's entity status
  • Entity kill clears active claim state for the killed entity's stream
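
The first two invariants rely on keeping two maps in sync. A minimal sketch of that bookkeeping is shown below, assuming that a consumer claiming a second stream drops its earlier claim. The names byStream, byConsumer, setClaim, and clearClaimForStream are hypothetical stand-ins for activeClaimWriteTokens, activeClaimWriteTokensByConsumer, and their helpers.

```typescript
interface Claim {
  token: string;
  consumerId: string;
}

const byStream = new Map<string, Claim>(); // stream path -> current claim
const byConsumer = new Map<string, string>(); // consumerId -> stream path

// Used for done, supersession, and the kill path: removes both directions.
function clearClaimForStream(streamPath: string): void {
  const claim = byStream.get(streamPath);
  if (!claim) return;
  byStream.delete(streamPath);
  // Drop the reverse entry only if it still points at this stream.
  if (byConsumer.get(claim.consumerId) === streamPath) {
    byConsumer.delete(claim.consumerId);
  }
}

function setClaim(streamPath: string, consumerId: string, token: string): void {
  // At most one active stream per consumer: release the consumer's old stream.
  const previousStream = byConsumer.get(consumerId);
  if (previousStream !== undefined && previousStream !== streamPath) {
    clearClaimForStream(previousStream);
  }
  // At most one active token per stream: release the previous holder.
  clearClaimForStream(streamPath);
  byStream.set(streamPath, { token, consumerId });
  byConsumer.set(consumerId, streamPath);
}
```

Clearing through a single helper means neither map can keep an entry that the other has already dropped.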

Non-goals:

  • TTL/expiry for orphaned claims (issuedAt is stored for future use but no sweep is implemented yet)
  • Re-enabling skipped conformance tests for the old write/set_tag/writeStateProtocol DSL actions (they used static tokens; adapting them to the claim flow is a follow-up)

Trade-offs:

  • In-memory map vs. database for active claims: chose in-memory for simplicity and performance, since claims are short-lived (they last for the duration of a wake). The cost is that claims don't survive server restarts, but neither do active wake sessions.
  • Removed writeToken from webhook notification payloads entirely rather than deprecating: clean break since the runtime already uses the claim flow.

Verification

# New claim lifecycle tests
pnpm vitest run packages/agents-server/test/server-claim-write-token.test.ts

# Updated tests
pnpm vitest run packages/agents-server/test/electric-agents-manager-write-validation.test.ts
pnpm vitest run packages/agents-runtime/test/process-wake.test.ts
pnpm vitest run packages/agents-server/test/wake-registry.test.ts
pnpm vitest run packages/agents-server/test/scheduler-integration.test.ts

Files changed

| File | Change |
| --- | --- |
| agents-server/src/server.ts | Claim token lifecycle: issue on claim, revoke on done, validate on write. Bidirectional map (activeClaimWriteTokens + activeClaimWriteTokensByConsumer) for one-to-one invariant. Done-clobber fix. |
| agents-server/src/electric-agents-manager.ts | Pluggable writeTokenValidator via setWriteTokenValidator() dependency injection |
| agents-server/src/electric-agents-routes.ts | Remove x-write-token from spawn response, clear claims on kill path |
| agents-runtime/src/process-wake.ts | Add autoClaim: true to IdempotentProducer, simplify writeToken fallback |
| agents-runtime/src/types.ts | Remove writeToken from WebhookNotification interface |
| agents-server/test/server-claim-write-token.test.ts | New: 8 integration tests covering claim lifecycle, token revocation, done handling, kill cleanup, tag writes |
| agents-server/test/wake-registry.test.ts | Refactor to use appendInternalEvent helper (bypass HTTP write auth) |
| agents-server/test/scheduler-integration.test.ts | Use schedule API instead of direct stream writes |
| conformance-tests/src/electric-agents-tests.ts | Adapt tag tests to claim flow: sequential tag updates uses callback-forward with auth, tag update on stopped entity expects 401, new spawn-token-absence assertions. Skip old write/set_tag tests. Remove dead code from property tests. |
| conformance-tests/src/electric-agents-dsl.ts | Remove write, set_tag, writeStateProtocol from DSL action types |
| website/docs/.../programmatic-runtime-client.md | Update setTag/removeTag docs to reference claim-scoped write tokens |

🤖 Generated with Claude Code

Write tokens are now issued when a consumer claims a wake and revoked
on done. This prevents leaked credentials from granting permanent write
access. Removes writeToken from webhook notifications and spawn response
headers. Adds autoClaim to IdempotentProducer instances.

Includes fixes for done-clobbers-newer-claim race and kill-path cleanup
of stale claim state.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

codecov Bot commented May 6, 2026

Codecov Report

❌ Patch coverage is 95.83333% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.48%. Comparing base (6399147) to head (168b824).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| packages/agents-server/src/server.ts | 93.93% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4287      +/-   ##
==========================================
+ Coverage   54.87%   55.48%   +0.60%     
==========================================
  Files         193      193              
  Lines       19567    19599      +32     
  Branches     5062     5065       +3     
==========================================
+ Hits        10737    10874     +137     
+ Misses       8828     8721     -107     
- Partials        2        4       +2     
| Flag | Coverage Δ |
| --- | --- |
| packages/agents | 58.94% <ø> (ø) |
| packages/agents-runtime | 80.22% <100.00%> (-0.01%) ⬇️ |
| packages/agents-server | 69.16% <95.55%> (+3.10%) ⬆️ |
| packages/agents-server-ui | 6.04% <ø> (ø) |
| packages/electric-ax | 38.59% <ø> (ø) |
| typescript | 55.48% <95.83%> (+0.60%) ⬆️ |
| unit-tests | 55.48% <95.83%> (+0.60%) ⬆️ |



netlify Bot commented May 6, 2026

Deploy Preview for electric-next ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 952b9e7 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/electric-next/deploys/69fbcbc27215a0000828d580 |
| 😎 Deploy Preview | https://deploy-preview-4287--electric-next.netlify.app |

KyleAMathews and others added 5 commits May 6, 2026 17:23
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These tests use ctx.currentWriteToken (now null) for tag operations.
They need to be adapted to the claim-scoped token flow in a follow-up.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- `tag update on stopped entity`: kill clears claims, so the correct
  response is 401 (no valid claim), not 409. Updated assertion.
- `sequential tag updates accumulate`: uses the claim flow (send message
  → expectWebhook → claim via callback-forward → get write token) to
  obtain a claim-scoped token before writing tags.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The durable-streams callback endpoint requires a Bearer token for
authentication. Pass notification.parsed.token as the Authorization
header when claiming via callback-forward.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@balegas balegas left a comment


Invariant clarification

The "only the current claim holder can write" invariant does not strictly hold: handleStreamAppend checks the token synchronously and then forwards the write asynchronously, so an in-flight write from a superseded claim can still land. Runtime writes are epoch-fenced via IdempotentProducer, but direct HTTP writers (external clients, tag writes) only have the bearer check. This is probably fine in practice (short window, rare supersession); just noting it.


TTL for orphaned claims

issuedAt is stored but unread. A crashed consumer wedges the stream until another claim, kill, or restart. Suggest a lazy check in isValidEntityWriteToken (or a small sweep) that evicts entries older than ~3× heartbeat. Cheap, and closes the orphan case.
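
A lazy check along those lines might look like the sketch below. The function name isValidEntityWriteToken and the issuedAt field come from this thread; the heartbeat interval, the ActiveClaim shape, and the explicit now parameter (added so the behavior is easy to exercise) are assumptions.

```typescript
interface ActiveClaim {
  token: string;
  issuedAt: number; // epoch milliseconds
}

// Hypothetical heartbeat interval; the real value is not stated in this PR.
const HEARTBEAT_MS = 30_000;
const CLAIM_TTL_MS = 3 * HEARTBEAT_MS;

const activeClaimWriteTokens = new Map<string, ActiveClaim>();

function isValidEntityWriteToken(
  streamPath: string,
  token: string,
  now: number = Date.now(),
): boolean {
  const claim = activeClaimWriteTokens.get(streamPath);
  if (!claim) return false;
  // Lazy eviction: an orphaned claim is dropped the next time it is checked.
  if (now - claim.issuedAt > CLAIM_TTL_MS) {
    activeClaimWriteTokens.delete(streamPath);
    return false;
  }
  return claim.token === token;
}
```

Because the age is measured from issuedAt, a real version would presumably need to refresh that timestamp on each heartbeat so that a healthy long-running wake is not evicted.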


Bug: done clears the token before updating status

if (stillOwnsClaim) clearActiveClaimForStream(...)
...
if (entity && stillOwnsClaim) await registry.updateStatus(entity.url, `idle`)

If updateStatus throws, the token is gone but the entity stays running — a retried done sees stillOwnsClaim === false and never sets idle. Swap the order: update status first, clear the token after.
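
The suggested ordering can be sketched as follows, assuming a simplified registry and a map from stream path to the consumer holding the claim. Registry, activeClaims, and handleDone are hypothetical names; only updateStatus, stillOwnsClaim, and the idle status come from the snippet above.

```typescript
interface Registry {
  updateStatus(entityUrl: string, status: "running" | "idle"): Promise<void>;
}

// Stream path -> consumerId currently holding the claim.
const activeClaims = new Map<string, string>();

async function handleDone(
  registry: Registry,
  streamPath: string,
  entityUrl: string,
  consumerId: string,
): Promise<void> {
  const stillOwnsClaim = activeClaims.get(streamPath) === consumerId;
  if (!stillOwnsClaim) return; // stale done: leave the newer claim alone

  // Update status first. If this throws, the claim is still in place,
  // so a retried done sees stillOwnsClaim === true and tries again.
  await registry.updateStatus(entityUrl, "idle");

  // Clear only after success, and only if no newer claim arrived meanwhile.
  if (activeClaims.get(streamPath) === consumerId) {
    activeClaims.delete(streamPath);
  }
}
```

With this order a failed status update leaves the token valid a little longer, which seems a safer failure mode than an entity stuck in running.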

KyleAMathews and others added 2 commits May 7, 2026 08:44
If updateStatus throws after the token is already cleared, a retried
done sees stillOwnsClaim === false and never transitions to idle. Fix by
updating status first, clearing the token only on success.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@KyleAMathews KyleAMathews merged commit 744c47f into main May 7, 2026
27 checks passed
@KyleAMathews KyleAMathews deleted the feat/claim-scoped-write-tokens branch May 7, 2026 15:33