Serve graph reads from side-indexes instead of full enumeration#893
Serve graph reads from side-indexes instead of full enumeration#893rohitg00 wants to merge 1 commit into
Conversation
state::list over the graph scopes blocks the worker event loop past
~25K nodes, so every read path that enumerated mem:graph:nodes and
mem:graph:edges (searchByEntities, expandFromChunks, temporalQuery,
graph-query query/startNodeId) was a worker-killer at scale.
New side-indexes under src/state/graph-indexes.ts:
- mem:graph:name-shards: hash(nodeId) % 64 -> {id, name}[], the name
catalog as 64 bounded gets with exact substring-match semantics
- mem:graph:adjacency: nodeId -> incident edgeId[], bounding traversal
by degree x depth (graph-query BFS capped at 5000 visited nodes)
- mem:graph:obs-nodes: obsId -> nodeId[] for chunk expansion
- mem:graph:index-meta: readiness marker; absent means readers fall
back to the previous enumeration, never silently empty
Writes are hints only; readers verify every hit against the live
record (stale flag plus snapshot resetAt), so cascades and wipes need
no index cleanup. Hints are written by graph-extract, temporal
extract, mesh LWW merges, import, and snapshot restore. The marker is
set by boot backfill (gated on snapshot totalNodes <= 25K), by
graph-snapshot-rebuild, and by graph-reset, which also clears the name
shards so post-reset retrieval stops surfacing pre-reset rows.
Property-value matching in graph-query is served from the snapshot
topNodes with an explicit coverage warning when the snapshot does not
cover the whole corpus. Start-node iteration in retrieval scoring is
now deterministic (sorted by node id) so index and enumeration paths
score identically.
Parity tests compare both paths on identical graphs for all four read
paths, plus fallback, post-rebuild maintenance, and post-reset cases.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughWalkthroughThis PR introduces a per-node edge index and sharded name catalog to eliminate full-corpus enumeration on graph query and traversal reads. Graph queries on corpora exceeding ~25K nodes previously crashed due to heartbeat timeouts during ChangesGraph Read-Side Index Infrastructure
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (3)
src/state/graph-indexes.ts (1)
206-224: ⚡ Quick winFan out traversal reads in parallel.
getIncidentEdges()andgetNeighbors()currently serialize one KV read per edge/node. On high-degree traversals that turns the indexed path into RTT-per-hop latency even though these reads are independent.As per coding guidelines,
Use parallel operations with Promise.all for independent KV writes/reads.💡 Proposed refactor
async getIncidentEdges(nodeId: string): Promise<GraphEdge[]> { const edgeIds = await loadAdjacentEdgeIds(this.kv, nodeId); - const edges: GraphEdge[] = []; - for (const edgeId of edgeIds) { - const edge = await this.getEdge(edgeId); - if (edge) edges.push(edge); - } - return edges; + const edges = await Promise.all( + edgeIds.map((edgeId) => this.getEdge(edgeId)), + ); + return edges.filter((edge): edge is GraphEdge => edge !== null); } async getNeighbors( nodeId: string, ): Promise<Array<{ node: GraphNode; edge: GraphEdge }>> { - const neighbors: Array<{ node: GraphNode; edge: GraphEdge }> = []; - for (const edge of await this.getIncidentEdges(nodeId)) { - const neighborId = - edge.sourceNodeId === nodeId ? edge.targetNodeId : edge.sourceNodeId; - const node = await this.getNode(neighborId); - if (node) neighbors.push({ node, edge }); - } - return neighbors; + const edges = await this.getIncidentEdges(nodeId); + const neighbors = await Promise.all( + edges.map(async (edge) => { + const neighborId = + edge.sourceNodeId === nodeId ? edge.targetNodeId : edge.sourceNodeId; + const node = await this.getNode(neighborId); + return node ? { node, edge } : null; + }), + ); + return neighbors.filter( + (pair): pair is { node: GraphNode; edge: GraphEdge } => pair !== null, + ); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/state/graph-indexes.ts` around lines 206 - 224, getIncidentEdges and getNeighbors currently perform sequential KV reads; change them to parallelize independent fetches using Promise.all: in getIncidentEdges, map edgeIds to this.getEdge(edgeId) and await Promise.all, then filter out falsy results before returning; in getNeighbors, after obtaining incident edges, build an array of neighborId values, map them to this.getNode(neighborId) (or map edges to promises that fetch the node and keep the edge), await Promise.all, then pair each successful node with its corresponding edge (filtering out missing nodes) so you avoid serial awaits in getIncidentEdges, getEdge, and getNode.Source: Coding guidelines
src/functions/graph-retrieval.ts (1)
137-139: ⚡ Quick winConsider using code-point comparison for cross-environment determinism.
While
localeCompareis deterministic within a single locale, code-point comparison is more deterministic across different deployment environments.🔄 Deterministic comparison alternative
- const orderedMatches = [...matchingNodes].sort((a, b) => - a.id.localeCompare(b.id), - ); + const orderedMatches = [...matchingNodes].sort((a, b) => + a.id < b.id ? -1 : a.id > b.id ? 1 : 0, + );Apply the same change at line 244 in
scoreExpansion.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/functions/graph-retrieval.ts` around lines 137 - 139, Replace locale-aware sorting using a.id.localeCompare(b.id) with a deterministic code-point comparison: compare a.id and b.id using their Unicode code points (e.g., a simple string comparison like (a.id > b.id) - (a.id < b.id) or iterate code points) so sorting is consistent across environments; update the same change in the scoreExpansion logic where sorting is done (the comparable sort of matchingNodes/orderedMatches) to use the code-point comparison instead of localeCompare.src/index.ts (1)
518-523: ⚡ Quick winReplace the “what”-style block comment with intention-revealing naming/logging.
As per coding guidelines, for
src/**/*.ts: “Do not use code comments explaining WHAT — use clear naming instead.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/index.ts` around lines 518 - 523, The explanatory block comment about backfilling graph read side-indexes should be replaced by clear intent expressed in code: extract the logic into a well-named function (e.g., backfillGraphReadSideIndexes or ensureGraphReadSideIndexesBackfilled) and add a concise runtime log message (via your existing logger) that states the action and gating condition (e.g., "backfilling graph read-side indexes; gated by snapshot node count until readiness marker present"). Remove the WHAT-style comment, call the new function from the current location, and ensure any variables like snapshotNodeCount, readinessMarker, and the fall-back behavior for graph retrieval are named to convey their purpose.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/index.ts`:
- Around line 531-540: The current guard uses graphSnap.stats.totalNodes but
still calls kv.list(...) which enumerates the full namespaces (including stale
records) and can exceed the intended startup ceiling; change this to either (a)
gate on metadata that guarantees the number of KV entries to be scanned (e.g., a
namespace key-count or snapshot-scannedCount) before calling kv.list, or (b)
avoid full kv.list and pass a bounded/streaming source into backfillGraphIndexes
(replace kv.list<...> with a paginated/limited iterator or kv.listKeys with a
limit and resume token, and/or change backfillGraphIndexes to accept an async
iterator) so we never perform an unbounded enumeration when
graphSnap.stats.totalNodes > GRAPH_INDEX_NODE_CEILING; update call sites and
signatures for backfillGraphIndexes, and ensure filters for stale are applied
during the bounded scan rather than after full enumeration.
In `@src/state/graph-indexes.ts`:
- Around line 68-74: The observation-to-node side-index currently only adds
links and never removes stale entries, so update the write and read paths: in
the incremental update that uses withKeyedLock and writes into KV.graphObsNodes
ensure you compute the diff between the node's current sourceObservationIds and
the previous value and remove this nodeId from any obsId buckets that were
dropped (and add for new ones) before persisting with kv.set; during any bulk
rebuild ensure KV.graphObsNodes is cleared and rebuilt from authoritative
node.sourceObservationIds; and modify loadNodeIdsForObservations to filter
returned nodeIds by loading each node and verifying
node.sourceObservationIds.includes(obsId) before returning it so stale obs->node
links are ignored.
- Around line 52-58: indexGraphNode currently only inserts into the shard
catalog and never updates an existing NameCatalogEntry, so a node rename leaves
the old name in KV.graphNameShards; inside the withKeyedLock block where you
load entries (NameCatalogEntry[]), change the logic so you check for an existing
entry by id and if found but entry.name !== node.name, update that entry.name
and persist by calling kv.set(KV.graphNameShards, shard, entries); if not found,
push the new {id: node.id, name: node.name} and persist as before — ensure this
change is made in the indexGraphNode flow so updated names overwrite the old
catalog entry.
---
Nitpick comments:
In `@src/functions/graph-retrieval.ts`:
- Around line 137-139: Replace locale-aware sorting using
a.id.localeCompare(b.id) with a deterministic code-point comparison: compare
a.id and b.id using their Unicode code points (e.g., a simple string comparison
like (a.id > b.id) - (a.id < b.id) or iterate code points) so sorting is
consistent across environments; update the same change in the scoreExpansion
logic where sorting is done (the comparable sort of
matchingNodes/orderedMatches) to use the code-point comparison instead of
localeCompare.
In `@src/index.ts`:
- Around line 518-523: The explanatory block comment about backfilling graph
read side-indexes should be replaced by clear intent expressed in code: extract
the logic into a well-named function (e.g., backfillGraphReadSideIndexes or
ensureGraphReadSideIndexesBackfilled) and add a concise runtime log message (via
your existing logger) that states the action and gating condition (e.g.,
"backfilling graph read-side indexes; gated by snapshot node count until
readiness marker present"). Remove the WHAT-style comment, call the new function
from the current location, and ensure any variables like snapshotNodeCount,
readinessMarker, and the fall-back behavior for graph retrieval are named to
convey their purpose.
In `@src/state/graph-indexes.ts`:
- Around line 206-224: getIncidentEdges and getNeighbors currently perform
sequential KV reads; change them to parallelize independent fetches using
Promise.all: in getIncidentEdges, map edgeIds to this.getEdge(edgeId) and await
Promise.all, then filter out falsy results before returning; in getNeighbors,
after obtaining incident edges, build an array of neighborId values, map them to
this.getNode(neighborId) (or map edges to promises that fetch the node and keep
the edge), await Promise.all, then pair each successful node with its
corresponding edge (filtering out missing nodes) so you avoid serial awaits in
getIncidentEdges, getEdge, and getNode.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c5b58ca9-000f-41d6-b059-08e451121790
📒 Files selected for processing (10)
src/functions/export-import.tssrc/functions/graph-retrieval.tssrc/functions/graph.tssrc/functions/mesh.tssrc/functions/snapshot.tssrc/functions/temporal-graph.tssrc/index.tssrc/state/graph-indexes.tssrc/state/schema.tstest/graph-index-parity.test.ts
| if (graphSnap && totalNodes > 0 && totalNodes <= GRAPH_INDEX_NODE_CEILING) { | ||
| const [graphNodes, graphEdges] = await Promise.all([ | ||
| kv.list<import("./types.js").GraphNode>(KV.graphNodes), | ||
| kv.list<import("./types.js").GraphEdge>(KV.graphEdges), | ||
| ]); | ||
| await backfillGraphIndexes( | ||
| kv, | ||
| graphNodes.filter((n) => !n.stale), | ||
| graphEdges.filter((e) => !e.stale), | ||
| ); |
There was a problem hiding this comment.
Node-count gating does not reliably bound startup enumeration cost.
This guard uses GraphSnapshot.stats.totalNodes, but kv.list(KV.graphNodes) / kv.list(KV.graphEdges) still enumerate full namespaces before stale filtering. If stale graph records have accumulated, this can exceed the intended ceiling and reintroduce startup blocking risk.
Please gate on metadata that bounds actual scanned cardinality before listing, or use a bounded/streamed backfill source instead of full kv.list enumeration. Based on PR objectives and the graph-index backfill contract, this path is intended to prevent full-corpus read behavior during boot.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/index.ts` around lines 531 - 540, The current guard uses
graphSnap.stats.totalNodes but still calls kv.list(...) which enumerates the
full namespaces (including stale records) and can exceed the intended startup
ceiling; change this to either (a) gate on metadata that guarantees the number
of KV entries to be scanned (e.g., a namespace key-count or
snapshot-scannedCount) before calling kv.list, or (b) avoid full kv.list and
pass a bounded/streaming source into backfillGraphIndexes (replace kv.list<...>
with a paginated/limited iterator or kv.listKeys with a limit and resume token,
and/or change backfillGraphIndexes to accept an async iterator) so we never
perform an unbounded enumeration when graphSnap.stats.totalNodes >
GRAPH_INDEX_NODE_CEILING; update call sites and signatures for
backfillGraphIndexes, and ensure filters for stale are applied during the
bounded scan rather than after full enumeration.
| await withKeyedLock(`gidx:shard:${shard}`, async () => { | ||
| const entries = | ||
| (await kv.get<NameCatalogEntry[]>(KV.graphNameShards, shard)) ?? []; | ||
| if (!entries.some((e) => e.id === node.id)) { | ||
| entries.push({ id: node.id, name: node.name }); | ||
| await kv.set(KV.graphNameShards, shard, entries); | ||
| } |
There was a problem hiding this comment.
Rewrite shard entries when an existing node's name changes.
indexGraphNode() is insert-only here. If a write overwrites an existing GraphNode with a new name, the old catalog value stays searchable and the new name never becomes discoverable until a full rebuild, which breaks the index/enumeration parity this PR is aiming for.
💡 Proposed fix
await withKeyedLock(`gidx:shard:${shard}`, async () => {
const entries =
(await kv.get<NameCatalogEntry[]>(KV.graphNameShards, shard)) ?? [];
- if (!entries.some((e) => e.id === node.id)) {
+ const existing = entries.find((e) => e.id === node.id);
+ if (!existing) {
entries.push({ id: node.id, name: node.name });
await kv.set(KV.graphNameShards, shard, entries);
+ } else if (existing.name !== node.name) {
+ existing.name = node.name;
+ await kv.set(KV.graphNameShards, shard, entries);
}
});📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| await withKeyedLock(`gidx:shard:${shard}`, async () => { | |
| const entries = | |
| (await kv.get<NameCatalogEntry[]>(KV.graphNameShards, shard)) ?? []; | |
| if (!entries.some((e) => e.id === node.id)) { | |
| entries.push({ id: node.id, name: node.name }); | |
| await kv.set(KV.graphNameShards, shard, entries); | |
| } | |
| await withKeyedLock(`gidx:shard:${shard}`, async () => { | |
| const entries = | |
| (await kv.get<NameCatalogEntry[]>(KV.graphNameShards, shard)) ?? []; | |
| const existing = entries.find((e) => e.id === node.id); | |
| if (!existing) { | |
| entries.push({ id: node.id, name: node.name }); | |
| await kv.set(KV.graphNameShards, shard, entries); | |
| } else if (existing.name !== node.name) { | |
| existing.name = node.name; | |
| await kv.set(KV.graphNameShards, shard, entries); | |
| } | |
| }); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/state/graph-indexes.ts` around lines 52 - 58, indexGraphNode currently
only inserts into the shard catalog and never updates an existing
NameCatalogEntry, so a node rename leaves the old name in KV.graphNameShards;
inside the withKeyedLock block where you load entries (NameCatalogEntry[]),
change the logic so you check for an existing entry by id and if found but
entry.name !== node.name, update that entry.name and persist by calling
kv.set(KV.graphNameShards, shard, entries); if not found, push the new {id:
node.id, name: node.name} and persist as before — ensure this change is made in
the indexGraphNode flow so updated names overwrite the old catalog entry.
| for (const obsId of obsIds ?? []) { | ||
| await withKeyedLock(`gidx:obs:${obsId}`, async () => { | ||
| const nodeIds = (await kv.get<string[]>(KV.graphObsNodes, obsId)) ?? []; | ||
| if (!nodeIds.includes(nodeId)) { | ||
| nodeIds.push(nodeId); | ||
| await kv.set(KV.graphObsNodes, obsId, nodeIds); | ||
| } |
There was a problem hiding this comment.
Observation side-index never forgets removed links.
Both the incremental path and the bulk rebuild path only add obsId -> nodeId links. After snapshot restore/rebuild or any overwrite that removes a sourceObservationId, KV.graphObsNodes can still point that old observation at a live node, and loadNodeIdsForObservations() does not re-validate membership before returning it. That can seed chunk expansion from observations that no longer reference the node.
This needs either old/new diffing on incremental writes plus scope cleanup during rebuild, or a read-side membership check against node.sourceObservationIds.
Also applies to: 243-248, 282-288
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/state/graph-indexes.ts` around lines 68 - 74, The observation-to-node
side-index currently only adds links and never removes stale entries, so update
the write and read paths: in the incremental update that uses withKeyedLock and
writes into KV.graphObsNodes ensure you compute the diff between the node's
current sourceObservationIds and the previous value and remove this nodeId from
any obsId buckets that were dropped (and add for new ones) before persisting
with kv.set; during any bulk rebuild ensure KV.graphObsNodes is cleared and
rebuilt from authoritative node.sourceObservationIds; and modify
loadNodeIdsForObservations to filter returned nodeIds by loading each node and
verifying node.sourceObservationIds.includes(obsId) before returning it so stale
obs->node links are ignored.
Closes #828.
state::list over the graph scopes blocks the worker event loop past ~25K nodes, so every read path that enumerated mem:graph:nodes and mem:graph:edges (searchByEntities, expandFromChunks, temporalQuery, graph-query query/startNodeId) was a worker-killer at scale. The earlier side-index work fixed the write paths and stats; this finishes the read paths.
New side-indexes in src/state/graph-indexes.ts: a 64-shard name catalog read as bounded gets with exact substring semantics, a per-node adjacency index bounding traversal by degree x depth (BFS capped at 5000 visited nodes with an explicit truncated warning), an obsId to nodeId index for chunk expansion, and a readiness marker. Marker absent means readers fall back to the previous enumeration, never silently empty. Writes are hints only; readers verify every hit against the live record (stale flag plus snapshot resetAt), so cascades and wipes need no index cleanup. The marker is set by boot backfill (gated on snapshot totalNodes <= 25K), graph-snapshot-rebuild, and graph-reset, which also clears the name shards so post-reset retrieval stops surfacing pre-reset rows.
Parity tests compare index and enumeration paths on identical graphs for all four read paths, plus fallback, post-rebuild maintenance, and post-reset cases.
Summary by CodeRabbit
New Features
Improvements
Tests