Skip to content

Serve graph reads from side-indexes instead of full enumeration#893

Open
rohitg00 wants to merge 1 commit into
mainfrom
fix/graph-read-path-indexes
Open

Serve graph reads from side-indexes instead of full enumeration#893
rohitg00 wants to merge 1 commit into
mainfrom
fix/graph-read-path-indexes

Conversation

@rohitg00

@rohitg00 rohitg00 commented Jun 10, 2026

Copy link
Copy Markdown
Owner

Closes #828.

state::list over the graph scopes blocks the worker event loop past ~25K nodes, so every read path that enumerated mem:graph:nodes and mem:graph:edges (searchByEntities, expandFromChunks, temporalQuery, graph-query query/startNodeId) was a worker-killer at scale. The earlier side-index work fixed the write paths and stats; this finishes the read paths.

New side-indexes in src/state/graph-indexes.ts: a 64-shard name catalog read as bounded gets with exact substring semantics, a per-node adjacency index bounding traversal by degree x depth (BFS capped at 5000 visited nodes with an explicit truncated warning), an obsId to nodeId index for chunk expansion, and a readiness marker. Marker absent means readers fall back to the previous enumeration, never silently empty. Writes are hints only; readers verify every hit against the live record (stale flag plus snapshot resetAt), so cascades and wipes need no index cleanup. The marker is set by boot backfill (gated on snapshot totalNodes <= 25K), graph-snapshot-rebuild, and graph-reset, which also clears the name shards so post-reset retrieval stops surfacing pre-reset rows.

Parity tests compare index and enumeration paths on identical graphs for all four read paths, plus fallback, post-rebuild maintenance, and post-reset cases.

Summary by CodeRabbit

  • New Features

    • Added graph read-side indexes to accelerate searches and traversals.
    • Graph queries now preferentially use indexes when available, with automatic fallback to enumeration.
  • Improvements

    • Graph operations (import, export, extraction, sync, snapshot restore) automatically maintain index consistency.
    • Entity searches and temporal queries now leverage indexed lookups for better performance.
    • Traversal operations are bounded to prevent excessive resource consumption.
  • Tests

    • Added parity tests validating indexed and enumeration-based graph operations produce equivalent results.

state::list over the graph scopes blocks the worker event loop past
~25K nodes, so every read path that enumerated mem:graph:nodes and
mem:graph:edges (searchByEntities, expandFromChunks, temporalQuery,
graph-query query/startNodeId) was a worker-killer at scale.

New side-indexes under src/state/graph-indexes.ts:
- mem:graph:name-shards: hash(nodeId) % 64 -> {id, name}[], the name
  catalog as 64 bounded gets with exact substring-match semantics
- mem:graph:adjacency: nodeId -> incident edgeId[], bounding traversal
  by degree x depth (graph-query BFS capped at 5000 visited nodes)
- mem:graph:obs-nodes: obsId -> nodeId[] for chunk expansion
- mem:graph:index-meta: readiness marker; absent means readers fall
  back to the previous enumeration, never silently empty

Writes are hints only; readers verify every hit against the live
record (stale flag plus snapshot resetAt), so cascades and wipes need
no index cleanup. Hints are written by graph-extract, temporal
extract, mesh LWW merges, import, and snapshot restore. The marker is
set by boot backfill (gated on snapshot totalNodes <= 25K), by
graph-snapshot-rebuild, and by graph-reset, which also clears the name
shards so post-reset retrieval stops surfacing pre-reset rows.

Property-value matching in graph-query is served from the snapshot
topNodes with an explicit coverage warning when the snapshot does not
cover the whole corpus. Start-node iteration in retrieval scoring is
now deterministic (sorted by node id) so index and enumeration paths
score identically.

Parity tests compare both paths on identical graphs for all four read
paths, plus fallback, post-rebuild maintenance, and post-reset cases.
@vercel

vercel Bot commented Jun 10, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agentmemory Ready Ready Preview, Comment Jun 10, 2026 10:49pm

Request Review

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

This PR introduces a per-node edge index and sharded name catalog to eliminate full-corpus enumeration on graph query and traversal reads. Graph queries on corpora exceeding ~25K nodes previously crashed due to heartbeat timeouts during kv.list enumeration; the new indexes enable bounded, targeted lookups while maintaining index consistency across all write paths and backfilling indexes on startup for legacy data.

Changes

Graph Read-Side Index Infrastructure

Layer / File(s) Summary
KV Schema for Graph Indexes
src/state/schema.ts
Four new KV entries added: graphNameShards (shard-based name lookup), graphAdjacency (per-node edge lists), graphObsNodes (observation-to-node mapping), and graphIndexMeta (index readiness marker).
Graph Index Module
src/state/graph-indexes.ts
Core 292-line module implementing sharded name catalog with nameShardKey hashing, readiness markers (graphIndexesReady, markGraphIndexesReady), node write-path indexing (indexGraphNode with per-shard locking), observation linking (linkObservationsToNode), edge adjacency indexing (indexGraphEdge), and read helpers (loadNameCatalog, loadAdjacentEdgeIds, loadNodeIdsForObservations). Adds GraphIndexReader class with cached per-node/multi-node traversal (getNeighbors, getIncidentEdges), reset-aware record filtering (isLiveGraphRecord), and bulk backfill (backfillGraphIndexes with parallel batch writes).
Graph Retrieval Pluggable Neighbor Provider
src/functions/graph-retrieval.ts
Introduces NeighborProvider function type for async neighbor fetching; adds neighborsFromArrays to build local adjacency from node/edge arrays. Refactors searchByEntities, expandFromChunks, and temporalQuery to prefer index paths when available (via GraphIndexReader), deterministically order results by node id, and fall back to enumeration-based neighbor provider. Converts dijkstraTraversal from synchronous adjacency building to async per-node neighbor fetching via provider.
Query Routing & Extract/Reset/Rebuild Indexing
src/functions/graph.ts
Adds queryViaIndexes and traverseViaIndexes helpers for index-backed name-catalog queries and bounded BFS with visit cap. Routes mem::graph-query to these when graphIndexesReady, else falls back to enumeration-with-timeout. Updates mem::graph-extract to call linkObservationsToNode, indexGraphNode, indexGraphEdge instead of inline index writes. Replaces mem::graph-snapshot-rebuild backfill with backfillGraphIndexes call. Updates mem::graph-reset to clear name shards and mark indexes ready.
Write Path Index Maintenance
src/functions/export-import.ts, src/functions/mesh.ts, src/functions/snapshot.ts, src/functions/temporal-graph.ts
Consistent index maintenance across all persistence operations: export-import calls indexGraphNode/indexGraphEdge after node/edge writes; mesh.ts extends lwwMergeList with optional onWrite callback and applies it to call indexGraphNode/indexGraphEdge on writes; snapshot.ts calls indexGraphNode after restoring nodes; temporal-graph.ts calls linkObservationsToNode, indexGraphNode, indexGraphEdge during extraction.
Startup Index Backfill
src/index.ts
Adds guarded startup procedure: checks graphIndexesReady, reads GraphSnapshot.stats.totalNodes, enforces GRAPH_INDEX_NODE_CEILING, enumerates non-stale nodes/edges, and bulk-builds all indexes via backfillGraphIndexes; wrapped in try/catch that logs warning but does not abort startup.
Index Parity and Behavior Validation
test/graph-index-parity.test.ts
371-line Vitest suite validating parity between index-backed and enumeration paths for searchByEntities, expandFromChunks, temporalQuery, and graph-query traversal; verifies readiness-marker fallback behavior (enumeration when missing, no enumeration when present), side-index consistency after rebuild/extract, and that mem::graph-reset correctly filters pre-reset records via isLiveGraphRecord. Includes in-memory KV/SDK mocks and deterministic result sorting.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • rohitg00/agentmemory#849: Modifies the same graph orchestration functions (mem::graph-snapshot-rebuild, mem::graph-reset) in src/functions/graph.ts to address unsafe kv.list patterns and reset behavior on large corpora.
  • rohitg00/agentmemory#816: Adds top-degree snapshot caching to the hot path (mem::graph-query/mem::graph-stats); this PR switches the same query/traversal paths to index-backed helpers as a complementary scaling mechanism.
  • rohitg00/agentmemory#698: Adds api::graph-build endpoint that triggers mem::graph-extract; the extraction logic updated here now maintains the new side indexes via the helper functions.

A rabbit hops through sharded catalogs, indexing each edge with care—
No more listing all the nodes, just the neighbors living there;
Twenty-five thousand nodes? Pfft, we bound our traversal with a cap,
Graph queries zoom now, no heartbeat collapse—index fixes the graph gap! 🐰🌳

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Serve graph reads from side-indexes instead of full enumeration' accurately summarizes the main objective of the PR, which is to replace full KV enumeration with indexed lookups for graph operations.
Linked Issues check ✅ Passed The PR fully implements the objectives from #828: it adds a per-node adjacency index (mem:graph:adjacency), extends name/token indexing via sharded name catalog (mem:graph:name-shards), maintains indexes incrementally during extract/reset/rebuild, provides graceful fallback to enumeration when indexes aren't ready, bounds traversal at 5000 visited nodes, and includes comprehensive parity tests across all read paths.
Out of Scope Changes check ✅ Passed All changes align with the linked issue #828 objectives. The PR introduces graph-indexes module, updates read paths (graph-retrieval, graph, temporal-graph), maintains indexes during writes (export-import, mesh, snapshot), adds boot backfill logic, and includes comprehensive parity tests—all directly scoped to eliminating full enumeration and implementing index-backed reads.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/graph-read-path-indexes

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (3)
src/state/graph-indexes.ts (1)

206-224: ⚡ Quick win

Fan out traversal reads in parallel.

getIncidentEdges() and getNeighbors() currently serialize one KV read per edge/node. On high-degree traversals that turns the indexed path into RTT-per-hop latency even though these reads are independent.

As per coding guidelines, Use parallel operations with Promise.all for independent KV writes/reads.

💡 Proposed refactor
   async getIncidentEdges(nodeId: string): Promise<GraphEdge[]> {
     const edgeIds = await loadAdjacentEdgeIds(this.kv, nodeId);
-    const edges: GraphEdge[] = [];
-    for (const edgeId of edgeIds) {
-      const edge = await this.getEdge(edgeId);
-      if (edge) edges.push(edge);
-    }
-    return edges;
+    const edges = await Promise.all(
+      edgeIds.map((edgeId) => this.getEdge(edgeId)),
+    );
+    return edges.filter((edge): edge is GraphEdge => edge !== null);
   }

   async getNeighbors(
     nodeId: string,
   ): Promise<Array<{ node: GraphNode; edge: GraphEdge }>> {
-    const neighbors: Array<{ node: GraphNode; edge: GraphEdge }> = [];
-    for (const edge of await this.getIncidentEdges(nodeId)) {
-      const neighborId =
-        edge.sourceNodeId === nodeId ? edge.targetNodeId : edge.sourceNodeId;
-      const node = await this.getNode(neighborId);
-      if (node) neighbors.push({ node, edge });
-    }
-    return neighbors;
+    const edges = await this.getIncidentEdges(nodeId);
+    const neighbors = await Promise.all(
+      edges.map(async (edge) => {
+        const neighborId =
+          edge.sourceNodeId === nodeId ? edge.targetNodeId : edge.sourceNodeId;
+        const node = await this.getNode(neighborId);
+        return node ? { node, edge } : null;
+      }),
+    );
+    return neighbors.filter(
+      (pair): pair is { node: GraphNode; edge: GraphEdge } => pair !== null,
+    );
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/state/graph-indexes.ts` around lines 206 - 224, getIncidentEdges and
getNeighbors currently perform sequential KV reads; change them to parallelize
independent fetches using Promise.all: in getIncidentEdges, map edgeIds to
this.getEdge(edgeId) and await Promise.all, then filter out falsy results before
returning; in getNeighbors, after obtaining incident edges, build an array of
neighborId values, map them to this.getNode(neighborId) (or map edges to
promises that fetch the node and keep the edge), await Promise.all, then pair
each successful node with its corresponding edge (filtering out missing nodes)
so you avoid serial awaits in getIncidentEdges, getEdge, and getNode.

Source: Coding guidelines

src/functions/graph-retrieval.ts (1)

137-139: ⚡ Quick win

Consider using code-point comparison for cross-environment determinism.

While localeCompare is deterministic within a single locale, code-point comparison is more deterministic across different deployment environments.

🔄 Deterministic comparison alternative
-    const orderedMatches = [...matchingNodes].sort((a, b) =>
-      a.id.localeCompare(b.id),
-    );
+    const orderedMatches = [...matchingNodes].sort((a, b) =>
+      a.id < b.id ? -1 : a.id > b.id ? 1 : 0,
+    );

Apply the same change at line 244 in scoreExpansion.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/functions/graph-retrieval.ts` around lines 137 - 139, Replace
locale-aware sorting using a.id.localeCompare(b.id) with a deterministic
code-point comparison: compare a.id and b.id using their Unicode code points
(e.g., a simple string comparison like (a.id > b.id) - (a.id < b.id) or iterate
code points) so sorting is consistent across environments; update the same
change in the scoreExpansion logic where sorting is done (the comparable sort of
matchingNodes/orderedMatches) to use the code-point comparison instead of
localeCompare.
src/index.ts (1)

518-523: ⚡ Quick win

Replace the “what”-style block comment with intention-revealing naming/logging.

As per coding guidelines, for src/**/*.ts: “Do not use code comments explaining WHAT — use clear naming instead.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/index.ts` around lines 518 - 523, The explanatory block comment about
backfilling graph read side-indexes should be replaced by clear intent expressed
in code: extract the logic into a well-named function (e.g.,
backfillGraphReadSideIndexes or ensureGraphReadSideIndexesBackfilled) and add a
concise runtime log message (via your existing logger) that states the action
and gating condition (e.g., "backfilling graph read-side indexes; gated by
snapshot node count until readiness marker present"). Remove the WHAT-style
comment, call the new function from the current location, and ensure any
variables like snapshotNodeCount, readinessMarker, and the fall-back behavior
for graph retrieval are named to convey their purpose.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/index.ts`:
- Around line 531-540: The current guard uses graphSnap.stats.totalNodes but
still calls kv.list(...) which enumerates the full namespaces (including stale
records) and can exceed the intended startup ceiling; change this to either (a)
gate on metadata that guarantees the number of KV entries to be scanned (e.g., a
namespace key-count or snapshot-scannedCount) before calling kv.list, or (b)
avoid full kv.list and pass a bounded/streaming source into backfillGraphIndexes
(replace kv.list<...> with a paginated/limited iterator or kv.listKeys with a
limit and resume token, and/or change backfillGraphIndexes to accept an async
iterator) so we never perform an unbounded enumeration when
graphSnap.stats.totalNodes > GRAPH_INDEX_NODE_CEILING; update call sites and
signatures for backfillGraphIndexes, and ensure filters for stale are applied
during the bounded scan rather than after full enumeration.

In `@src/state/graph-indexes.ts`:
- Around line 68-74: The observation-to-node side-index currently only adds
links and never removes stale entries, so update the write and read paths: in
the incremental update that uses withKeyedLock and writes into KV.graphObsNodes
ensure you compute the diff between the node's current sourceObservationIds and
the previous value and remove this nodeId from any obsId buckets that were
dropped (and add for new ones) before persisting with kv.set; during any bulk
rebuild ensure KV.graphObsNodes is cleared and rebuilt from authoritative
node.sourceObservationIds; and modify loadNodeIdsForObservations to filter
returned nodeIds by loading each node and verifying
node.sourceObservationIds.includes(obsId) before returning it so stale obs->node
links are ignored.
- Around line 52-58: indexGraphNode currently only inserts into the shard
catalog and never updates an existing NameCatalogEntry, so a node rename leaves
the old name in KV.graphNameShards; inside the withKeyedLock block where you
load entries (NameCatalogEntry[]), change the logic so you check for an existing
entry by id and if found but entry.name !== node.name, update that entry.name
and persist by calling kv.set(KV.graphNameShards, shard, entries); if not found,
push the new {id: node.id, name: node.name} and persist as before — ensure this
change is made in the indexGraphNode flow so updated names overwrite the old
catalog entry.

---

Nitpick comments:
In `@src/functions/graph-retrieval.ts`:
- Around line 137-139: Replace locale-aware sorting using
a.id.localeCompare(b.id) with a deterministic code-point comparison: compare
a.id and b.id using their Unicode code points (e.g., a simple string comparison
like (a.id > b.id) - (a.id < b.id) or iterate code points) so sorting is
consistent across environments; update the same change in the scoreExpansion
logic where sorting is done (the comparable sort of
matchingNodes/orderedMatches) to use the code-point comparison instead of
localeCompare.

In `@src/index.ts`:
- Around line 518-523: The explanatory block comment about backfilling graph
read side-indexes should be replaced by clear intent expressed in code: extract
the logic into a well-named function (e.g., backfillGraphReadSideIndexes or
ensureGraphReadSideIndexesBackfilled) and add a concise runtime log message (via
your existing logger) that states the action and gating condition (e.g.,
"backfilling graph read-side indexes; gated by snapshot node count until
readiness marker present"). Remove the WHAT-style comment, call the new function
from the current location, and ensure any variables like snapshotNodeCount,
readinessMarker, and the fall-back behavior for graph retrieval are named to
convey their purpose.

In `@src/state/graph-indexes.ts`:
- Around line 206-224: getIncidentEdges and getNeighbors currently perform
sequential KV reads; change them to parallelize independent fetches using
Promise.all: in getIncidentEdges, map edgeIds to this.getEdge(edgeId) and await
Promise.all, then filter out falsy results before returning; in getNeighbors,
after obtaining incident edges, build an array of neighborId values, map them to
this.getNode(neighborId) (or map edges to promises that fetch the node and keep
the edge), await Promise.all, then pair each successful node with its
corresponding edge (filtering out missing nodes) so you avoid serial awaits in
getIncidentEdges, getEdge, and getNode.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c5b58ca9-000f-41d6-b059-08e451121790

📥 Commits

Reviewing files that changed from the base of the PR and between 25e7701 and e2608e3.

📒 Files selected for processing (10)
  • src/functions/export-import.ts
  • src/functions/graph-retrieval.ts
  • src/functions/graph.ts
  • src/functions/mesh.ts
  • src/functions/snapshot.ts
  • src/functions/temporal-graph.ts
  • src/index.ts
  • src/state/graph-indexes.ts
  • src/state/schema.ts
  • test/graph-index-parity.test.ts

Comment thread src/index.ts
Comment on lines +531 to +540
if (graphSnap && totalNodes > 0 && totalNodes <= GRAPH_INDEX_NODE_CEILING) {
const [graphNodes, graphEdges] = await Promise.all([
kv.list<import("./types.js").GraphNode>(KV.graphNodes),
kv.list<import("./types.js").GraphEdge>(KV.graphEdges),
]);
await backfillGraphIndexes(
kv,
graphNodes.filter((n) => !n.stale),
graphEdges.filter((e) => !e.stale),
);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Node-count gating does not reliably bound startup enumeration cost.

This guard uses GraphSnapshot.stats.totalNodes, but kv.list(KV.graphNodes) / kv.list(KV.graphEdges) still enumerate full namespaces before stale filtering. If stale graph records have accumulated, this can exceed the intended ceiling and reintroduce startup blocking risk.

Please gate on metadata that bounds actual scanned cardinality before listing, or use a bounded/streamed backfill source instead of full kv.list enumeration. Based on PR objectives and the graph-index backfill contract, this path is intended to prevent full-corpus read behavior during boot.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/index.ts` around lines 531 - 540, The current guard uses
graphSnap.stats.totalNodes but still calls kv.list(...) which enumerates the
full namespaces (including stale records) and can exceed the intended startup
ceiling; change this to either (a) gate on metadata that guarantees the number
of KV entries to be scanned (e.g., a namespace key-count or
snapshot-scannedCount) before calling kv.list, or (b) avoid full kv.list and
pass a bounded/streaming source into backfillGraphIndexes (replace kv.list<...>
with a paginated/limited iterator or kv.listKeys with a limit and resume token,
and/or change backfillGraphIndexes to accept an async iterator) so we never
perform an unbounded enumeration when graphSnap.stats.totalNodes >
GRAPH_INDEX_NODE_CEILING; update call sites and signatures for
backfillGraphIndexes, and ensure filters for stale are applied during the
bounded scan rather than after full enumeration.

Comment on lines +52 to +58
await withKeyedLock(`gidx:shard:${shard}`, async () => {
const entries =
(await kv.get<NameCatalogEntry[]>(KV.graphNameShards, shard)) ?? [];
if (!entries.some((e) => e.id === node.id)) {
entries.push({ id: node.id, name: node.name });
await kv.set(KV.graphNameShards, shard, entries);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Rewrite shard entries when an existing node's name changes.

indexGraphNode() is insert-only here. If a write overwrites an existing GraphNode with a new name, the old catalog value stays searchable and the new name never becomes discoverable until a full rebuild, which breaks the index/enumeration parity this PR is aiming for.

💡 Proposed fix
   await withKeyedLock(`gidx:shard:${shard}`, async () => {
     const entries =
       (await kv.get<NameCatalogEntry[]>(KV.graphNameShards, shard)) ?? [];
-    if (!entries.some((e) => e.id === node.id)) {
+    const existing = entries.find((e) => e.id === node.id);
+    if (!existing) {
       entries.push({ id: node.id, name: node.name });
       await kv.set(KV.graphNameShards, shard, entries);
+    } else if (existing.name !== node.name) {
+      existing.name = node.name;
+      await kv.set(KV.graphNameShards, shard, entries);
     }
   });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
await withKeyedLock(`gidx:shard:${shard}`, async () => {
const entries =
(await kv.get<NameCatalogEntry[]>(KV.graphNameShards, shard)) ?? [];
if (!entries.some((e) => e.id === node.id)) {
entries.push({ id: node.id, name: node.name });
await kv.set(KV.graphNameShards, shard, entries);
}
await withKeyedLock(`gidx:shard:${shard}`, async () => {
const entries =
(await kv.get<NameCatalogEntry[]>(KV.graphNameShards, shard)) ?? [];
const existing = entries.find((e) => e.id === node.id);
if (!existing) {
entries.push({ id: node.id, name: node.name });
await kv.set(KV.graphNameShards, shard, entries);
} else if (existing.name !== node.name) {
existing.name = node.name;
await kv.set(KV.graphNameShards, shard, entries);
}
});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/state/graph-indexes.ts` around lines 52 - 58, indexGraphNode currently
only inserts into the shard catalog and never updates an existing
NameCatalogEntry, so a node rename leaves the old name in KV.graphNameShards;
inside the withKeyedLock block where you load entries (NameCatalogEntry[]),
change the logic so you check for an existing entry by id and if found but
entry.name !== node.name, update that entry.name and persist by calling
kv.set(KV.graphNameShards, shard, entries); if not found, push the new {id:
node.id, name: node.name} and persist as before — ensure this change is made in
the indexGraphNode flow so updated names overwrite the old catalog entry.

Comment on lines +68 to +74
for (const obsId of obsIds ?? []) {
await withKeyedLock(`gidx:obs:${obsId}`, async () => {
const nodeIds = (await kv.get<string[]>(KV.graphObsNodes, obsId)) ?? [];
if (!nodeIds.includes(nodeId)) {
nodeIds.push(nodeId);
await kv.set(KV.graphObsNodes, obsId, nodeIds);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Observation side-index never forgets removed links.

Both the incremental path and the bulk rebuild path only add obsId -> nodeId links. After snapshot restore/rebuild or any overwrite that removes a sourceObservationId, KV.graphObsNodes can still point that old observation at a live node, and loadNodeIdsForObservations() does not re-validate membership before returning it. That can seed chunk expansion from observations that no longer reference the node.

This needs either old/new diffing on incremental writes plus scope cleanup during rebuild, or a read-side membership check against node.sourceObservationIds.

Also applies to: 243-248, 282-288

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/state/graph-indexes.ts` around lines 68 - 74, The observation-to-node
side-index currently only adds links and never removes stale entries, so update
the write and read paths: in the incremental update that uses withKeyedLock and
writes into KV.graphObsNodes ensure you compute the diff between the node's
current sourceObservationIds and the previous value and remove this nodeId from
any obsId buckets that were dropped (and add for new ones) before persisting
with kv.set; during any bulk rebuild ensure KV.graphObsNodes is cleared and
rebuilt from authoritative node.sourceObservationIds; and modify
loadNodeIdsForObservations to filter returned nodeIds by loading each node and
verifying node.sourceObservationIds.includes(obsId) before returning it so stale
obs->node links are ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant