Skip to content

feat: implement high-order tier vector search, embedding backfill, an…#784

Open
Tanmay-008 wants to merge 1 commit into
rohitg00:mainfrom
Tanmay-008:feat/high-order-vector-search
Open

feat: implement high-order tier vector search, embedding backfill, an…#784
Tanmay-008 wants to merge 1 commit into
rohitg00:mainfrom
Tanmay-008:feat/high-order-vector-search

Conversation

@Tanmay-008
Copy link
Copy Markdown
Contributor

@Tanmay-008 Tanmay-008 commented Jun 2, 2026

This PR extends mem::smart-search to query the 4 high-order memory tiers (Semantic facts, Procedural skills, Crystals, and Insights) alongside existing observations and lessons. Previously, these tiers—produced by the consolidation pipeline—sat "dark" in the KV store.

Detailed Features & Changes

  1. Per-Tier In-Process BM25 Search: Implemented in-process BM25 scoring in src/functions/high-order-search.ts. High-order tiers are loaded in parallel and scored against search query tokens.
  2. Cross-Agent Leak Protection (Security Gate): Since high-order consolidated memory lacks per-agent boundaries, if filterAgentId is passed, high-order search is skipped entirely. This prevents cross-agent leaks.
  3. Cross-Tier expandIds Dispatch: Updated expandIds in src/functions/smart-search.ts to identify tier prefixes (sem_, proc_/skill_, crys_, ins_) and resolve them from the correct KV scope, unifying the output shape.
  4. Configuration Options: Added configuration checks (AGENTMEMORY_HIGH_ORDER_SEARCH and AGENTMEMORY_SMART_SEARCH_CONFIDENCE_FLOOR).
  5. Schema Updates for Embeddings: Added optional embedding (base64 Float32Array) and embeddingModel (version/model stamp) fields to SemanticMemory, ProceduralMemory, Crystal, and Insight interfaces in src/types.ts.
  6. Inline Embeddings (Write Path): Configured write operations to generate and attach embeddings inline using the active embedding model when new items are saved in the consolidation pipeline, reflection, and crystallization paths.
  7. Background Lazy Backfill Engine: Added mem::backfill-embeddings::high-order (src/functions/high-order-backfill.ts) background task and REST endpoint to scan and backfill missing or outdated embeddings. Wired smart-search to detect missing embeddings and trigger this backfill in the background without blocking the user response.
  8. Vector Scoring & Reciprocal Rank Fusion (RRF): Configured high-order-search to calculate vector cosine similarity using the active embedding model. Fuses BM25 and Vector scores using Reciprocal Rank Fusion (RRF) with a precision of 4 decimal places. Falls back gracefully to BM25-only on model mismatch while scheduling a re-embed task.

Fixes #770

Summary by CodeRabbit

  • New Features

    • High-order search functionality enabling semantic, procedural, crystallized, and insight-based retrieval with vector embedding support.
    • Vector embeddings automatically generated and stored for all memory types.
    • New backfill endpoint to update embeddings across memory tiers.
    • Smart-search API extended to return high-order memory results alongside observations.
  • Documentation

    • Updated to reflect 126 REST API endpoints.
  • Tests

    • Added comprehensive test suites for high-order search and integration scenarios.

@vercel
Copy link
Copy Markdown

vercel Bot commented Jun 2, 2026

@Tanmay-008 is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 2, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR implements searchable high-order memory tiers with vector embeddings, backfill capability, and multi-tier ranked retrieval. It adds embedding generation to consolidation, crystallization, and reflection pipelines; introduces a new searchHighOrderTiers function that ranks candidates via BM25-like term matching and RRF fusion; integrates high-order results into smart-search alongside traditional observations; and provides on-demand backfill via a new REST endpoint.

Changes

High-Order Tier Search Feature

Layer / File(s) Summary
Type definitions and configuration helpers
src/types.ts, src/config.ts, src/state/vector-index.ts
New HighOrderTier union and CompactHighOrderResult interface for high-order items; embedding metadata (embedding, embeddingModel) added to SemanticMemory, ProceduralMemory, Crystal, Insight; vector encoding helpers float32ToBase64/base64ToFloat32 exported.
High-order tier search and ranking
src/functions/high-order-search.ts
searchHighOrderTiers loads semantic/procedural/crystal/insight candidates from KV, applies confidence/project filters, scores via BM25-like term relevance and optional cosine similarity, fuses scores with Reciprocal Rank Fusion, and flags missing/mismatched embeddings for backfill.
Embedding generation in memory pipelines
src/functions/consolidation-pipeline.ts, src/functions/crystallize.ts, src/functions/reflect.ts
Consolidation embeds semantic facts and procedural skills; crystallization embeds narrative+lessons; reflection embeds insight title+content. All embedding failures logged as non-fatal warnings; processing continues when provider unavailable.
High-order embedding backfill registration
src/functions/high-order-backfill.ts
registerHighOrderBackfillFunction wires mem::backfill-embeddings::high-order handler that loads all four collections, filters for missing/mismatched embeddings, batches in groups of 20, embeds via provider, and persists embedding+model back to KV with per-collection result tallies.
Smart-search high-order retrieval and response extension
src/functions/smart-search.ts
expandIds routes semantic/procedural/crystal/insight prefixes to tier-specific KV reads; compact (query-based) mode conditionally calls searchHighOrderTiers and triggers async backfill on missing embeddings; both response paths include optional highOrder results.
API endpoint wiring and configuration boot
src/index.ts, src/triggers/api.ts, src/mcp/server.ts, src/mcp/tools-registry.ts
Whitelisted smart-search payload fields; new POST /agentmemory/backfill/high-order endpoint; MCP server passes includeHighOrder flag; tool schema updated; worker registers backfill function and logs high-order enablement state.
Documentation updates and test coverage
AGENTS.md, README.md, test/high-order-search.test.ts, test/smart-search.test.ts
Endpoint count incremented to 126; comprehensive suite for searchHighOrderTiers covering multi-tier matching, filtering, RRF ranking, and preview truncation; extended smart-search tests with config mocking and high-order integration scenarios.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

  • rohitg00/agentmemory#770: Proposes high-order tier indexing/search and embedding backfill, matching the core implementation in this PR.

Possibly related PRs

  • rohitg00/agentmemory#683: Modifies float32ToBase64/base64ToFloat32 in src/state/vector-index.ts for correct Float32Array base64 round-tripping with explicit byteOffset/byteLength handling.
  • rohitg00/agentmemory#654: Modifies mem::smart-search request/handler in src/functions/smart-search.ts for agent-scoped filtering and result handling.

Suggested reviewers

  • rohitg00

Poem

🐰 A rabbit hops through memory tiers so deep,
Embedding vectors where the insights sleep,
Four halls of knowledge—all searchable by score,
RRF fuses ranks, then backfill asks for more! ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Title check ⚠️ Warning The title is truncated and incomplete ('an…'), making it impossible to fully understand the feature being implemented. Provide the complete title without truncation so reviewers can clearly understand all key aspects of the feature.
Docstring Coverage ⚠️ Warning Docstring coverage is 4.35% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/state/vector-index.ts (1)

14-21: ⚠️ Potential issue | 🟠 Major | 💤 Low value

Validate base64ToFloat32 input to prevent silent embedding truncation

  • In src/state/vector-index.ts (base64ToFloat32, lines 14-21), new Float32Array(..., buf.byteLength / Float32Array.BYTES_PER_ELEMENT) will coerce a non-integer element count and can return a shorter array without throwing when buf.byteLength isn’t a multiple of 4.
  • Call sites in src/functions/high-order-search.ts decode s.embedding/p.embedding/c.embedding/i.embedding via base64ToFloat32(...) (around lines 80, 95, 111, 129) without visible guardrails around the decode.
  • Add explicit length validation (e.g., buf.byteLength % 4 === 0) and have the function/callers reject or skip invalid embeddings instead of proceeding with a truncated vector.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/state/vector-index.ts` around lines 14 - 21, The base64ToFloat32 function
can silently produce truncated Float32Arrays when the buffer length isn't a
multiple of 4; add an explicit validation in base64ToFloat32 (check
buf.byteLength % Float32Array.BYTES_PER_ELEMENT === 0) and have it throw or
return a clear failure (e.g., null or throw Error) instead of constructing a
possibly shortened array, and then update the call sites in high-order-search.ts
where s.embedding, p.embedding, c.embedding, and i.embedding are decoded to
detect that failure and skip or reject those invalid embeddings (handle the
returned null/exception path so invalid base64 embeddings do not proceed into
downstream vector logic).
src/mcp/server.ts (1)

264-273: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Handle boolean false for includeHighOrder.

Line 272 only disables on the literal string "false". If an MCP client sends JSON boolean false, this still evaluates to true, so the new opt-out cannot actually opt out for those callers.

Suggested fix
             const expandIds = parseCsvList(args.expandIds).slice(0, 20);
             const limit = Math.max(1, Math.min(100, asNumber(args.limit, 10) ?? 10));
+            if (
+              args.includeHighOrder !== undefined &&
+              args.includeHighOrder !== true &&
+              args.includeHighOrder !== false &&
+              args.includeHighOrder !== "true" &&
+              args.includeHighOrder !== "false"
+            ) {
+              return {
+                status_code: 400,
+                body: { error: "includeHighOrder must be a boolean or 'true'/'false'" },
+              };
+            }
+            const includeHighOrder =
+              args.includeHighOrder === undefined
+                ? true
+                : args.includeHighOrder === true || args.includeHighOrder === "true";
             const result = await sdk.trigger({
               function_id: "mem::smart-search",
               payload: {
                 query: args.query,
                 expandIds,
                 limit,
-                includeHighOrder: args.includeHighOrder !== "false",
+                includeHighOrder,
               },
             });

As per coding guidelines, src/mcp/server.ts: "MCP tool handler implementation must validate args with typeof checks".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/mcp/server.ts` around lines 264 - 273, The includeHighOrder flag
currently only checks for the string "false" which misses JSON boolean false;
update the evaluation in the sdk.trigger payload (near function_id
"mem::smart-search" in src/mcp/server.ts) to validate the arg with typeof checks
and treat both the boolean false and the string "false" as opt-outs (e.g., set
includeHighOrder to false when args.includeHighOrder === false ||
args.includeHighOrder === "false", otherwise true). Ensure you reference and
adjust the use of args.includeHighOrder alongside parseCsvList and asNumber
handling so the handler follows the MCP args validation guideline.
🧹 Nitpick comments (2)
src/functions/smart-search.ts (1)

153-154: 💤 Low value

Simplify the defensive Array.isArray check.

searchHighOrderTiers always returns { results: CompactHighOrderResult[]; needsBackfill: boolean }, so the Array.isArray check is unnecessary. Consider simplifying:

-const highOrderResults = Array.isArray(highOrderResponse) ? highOrderResponse : highOrderResponse.results;
-const needsBackfill = Array.isArray(highOrderResponse) ? false : highOrderResponse.needsBackfill;
+const { results: highOrderResults, needsBackfill } = highOrderResponse;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/functions/smart-search.ts` around lines 153 - 154, Remove the redundant
Array.isArray defensive checks around highOrderResponse since
searchHighOrderTiers always returns an object; replace the conditional
assignments with direct property access: set highOrderResults =
highOrderResponse.results and needsBackfill = highOrderResponse.needsBackfill.
Update any related assumptions in the function smart-search.ts (references:
highOrderResults, needsBackfill, searchHighOrderTiers, highOrderResponse) and
remove the unnecessary Array.isArray branches.
test/high-order-search.test.ts (1)

95-353: ⚡ Quick win

Consider adding test coverage for the needsBackfill flag.

The PR objectives highlight backfill triggering as a key feature ("smart-search can trigger backfill background tasks when embeddings are missing/outdated"), but the current tests don't verify the needsBackfill flag behavior. Consider adding tests for:

  1. needsBackfill: true when candidates have no embedding field.
  2. needsBackfill: true when candidates have embedding but embeddingModel differs from the active provider's name.
  3. needsBackfill: false when all candidates have embeddings matching the active provider model.

This would strengthen confidence that the backfill-triggering integration works as designed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/high-order-search.test.ts` around lines 95 - 353, Add unit tests
covering the needsBackfill flag for searchHighOrderTiers: create cases using
mockKV and helper factories
(makeSemantic/makeProcedural/makeCrystal/makeInsight) that assert needsBackfill
=== true when stored candidates lack an embedding, assert needsBackfill === true
when candidates have an embedding but embeddingModel !== active provider name,
and assert needsBackfill === false when all candidates include embeddings with
embeddingModel equal to the active provider name; reuse existing patterns in the
test file (kv := mockKV(), await kv.set(...), call searchHighOrderTiers(...))
and assert the returned object has the expected needsBackfill boolean alongside
result checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/functions/high-order-backfill.ts`:
- Around line 11-121: The backfill function registerHighOrderBackfillFunction
updates multiple KV scopes but never creates an audit trail; update the function
to call recordAudit() after each successful scope backfill (or once after all
scopes complete) including scope name, count backfilled, embeddingModel
(ep.name) and timestamp so changes are recorded; specifically add calls to
recordAudit(...) after the semantic, procedural, crystals and insights loops (or
one consolidated recordAudit at the end using the results object) and ensure
errors still return without emitting a success audit.

In `@src/triggers/api.ts`:
- Around line 1110-1122: Validate and whitelist each expected field from
req.body before calling sdk.trigger: ensure query is a string, expandIds is an
array of strings (or accept a comma-separated string and split/coerce), limit is
a number (parse numeric strings to Number and ignore non-numeric), project and
agentId are strings, and includeLessons/includeHighOrder are booleans; only
include each key in the payload if it passes its type check (otherwise omit or
coerce safely) when constructing the object passed to sdk.trigger so you never
forward raw, unvalidated values from req.body.

---

Outside diff comments:
In `@src/mcp/server.ts`:
- Around line 264-273: The includeHighOrder flag currently only checks for the
string "false" which misses JSON boolean false; update the evaluation in the
sdk.trigger payload (near function_id "mem::smart-search" in src/mcp/server.ts)
to validate the arg with typeof checks and treat both the boolean false and the
string "false" as opt-outs (e.g., set includeHighOrder to false when
args.includeHighOrder === false || args.includeHighOrder === "false", otherwise
true). Ensure you reference and adjust the use of args.includeHighOrder
alongside parseCsvList and asNumber handling so the handler follows the MCP args
validation guideline.

In `@src/state/vector-index.ts`:
- Around line 14-21: The base64ToFloat32 function can silently produce truncated
Float32Arrays when the buffer length isn't a multiple of 4; add an explicit
validation in base64ToFloat32 (check buf.byteLength %
Float32Array.BYTES_PER_ELEMENT === 0) and have it throw or return a clear
failure (e.g., null or throw Error) instead of constructing a possibly shortened
array, and then update the call sites in high-order-search.ts where s.embedding,
p.embedding, c.embedding, and i.embedding are decoded to detect that failure and
skip or reject those invalid embeddings (handle the returned null/exception path
so invalid base64 embeddings do not proceed into downstream vector logic).

---

Nitpick comments:
In `@src/functions/smart-search.ts`:
- Around line 153-154: Remove the redundant Array.isArray defensive checks
around highOrderResponse since searchHighOrderTiers always returns an object;
replace the conditional assignments with direct property access: set
highOrderResults = highOrderResponse.results and needsBackfill =
highOrderResponse.needsBackfill. Update any related assumptions in the function
smart-search.ts (references: highOrderResults, needsBackfill,
searchHighOrderTiers, highOrderResponse) and remove the unnecessary
Array.isArray branches.

In `@test/high-order-search.test.ts`:
- Around line 95-353: Add unit tests covering the needsBackfill flag for
searchHighOrderTiers: create cases using mockKV and helper factories
(makeSemantic/makeProcedural/makeCrystal/makeInsight) that assert needsBackfill
=== true when stored candidates lack an embedding, assert needsBackfill === true
when candidates have an embedding but embeddingModel !== active provider name,
and assert needsBackfill === false when all candidates include embeddings with
embeddingModel equal to the active provider name; reuse existing patterns in the
test file (kv := mockKV(), await kv.set(...), call searchHighOrderTiers(...))
and assert the returned object has the expected needsBackfill boolean alongside
result checks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a97f9561-fe90-423c-a829-1f09353c58b7

📥 Commits

Reviewing files that changed from the base of the PR and between de95403 and 363ffc3.

📒 Files selected for processing (17)
  • AGENTS.md
  • README.md
  • src/config.ts
  • src/functions/consolidation-pipeline.ts
  • src/functions/crystallize.ts
  • src/functions/high-order-backfill.ts
  • src/functions/high-order-search.ts
  • src/functions/reflect.ts
  • src/functions/smart-search.ts
  • src/index.ts
  • src/mcp/server.ts
  • src/mcp/tools-registry.ts
  • src/state/vector-index.ts
  • src/triggers/api.ts
  • src/types.ts
  • test/high-order-search.test.ts
  • test/smart-search.test.ts

Comment on lines +11 to +121
export function registerHighOrderBackfillFunction(sdk: ISdk, kv: StateKV): void {
sdk.registerFunction("mem::backfill-embeddings::high-order", async () => {
const ep = getEmbeddingProvider();
if (!ep) {
return { success: false, error: "No embedding provider available" };
}

const results = {
semantic: 0,
procedural: 0,
crystals: 0,
insights: 0,
};

try {
// 1. Semantic Facts
const semantics = await kv.list<SemanticMemory>(KV.semantic);
const semToUpdate = semantics.filter(
(s) => !s.embedding || s.embeddingModel !== ep.name
);
for (let i = 0; i < semToUpdate.length; i += BACKFILL_BATCH_SIZE) {
const batch = semToUpdate.slice(i, i + BACKFILL_BATCH_SIZE);
const texts = batch.map((s) => s.fact);
try {
const vectors = await ep.embedBatch(texts);
for (let j = 0; j < batch.length; j++) {
batch[j].embedding = float32ToBase64(vectors[j]);
batch[j].embeddingModel = ep.name;
await kv.set(KV.semantic, batch[j].id, batch[j]);
}
results.semantic += batch.length;
} catch (e) {
logger.warn("Semantic backfill batch failed", { error: String(e) });
}
}

// 2. Procedural Skills
const procedurals = await kv.list<ProceduralMemory>(KV.procedural);
const procToUpdate = procedurals.filter(
(p) => !p.embedding || p.embeddingModel !== ep.name
);
for (let i = 0; i < procToUpdate.length; i += BACKFILL_BATCH_SIZE) {
const batch = procToUpdate.slice(i, i + BACKFILL_BATCH_SIZE);
const texts = batch.map((p) => `${p.name} ${p.triggerCondition} ${p.steps.join(" ")}`);
try {
const vectors = await ep.embedBatch(texts);
for (let j = 0; j < batch.length; j++) {
batch[j].embedding = float32ToBase64(vectors[j]);
batch[j].embeddingModel = ep.name;
await kv.set(KV.procedural, batch[j].id, batch[j]);
}
results.procedural += batch.length;
} catch (e) {
logger.warn("Procedural backfill batch failed", { error: String(e) });
}
}

// 3. Crystals
const crystals = await kv.list<Crystal>(KV.crystals);
const crysToUpdate = crystals.filter(
(c) => !c.embedding || c.embeddingModel !== ep.name
);
for (let i = 0; i < crysToUpdate.length; i += BACKFILL_BATCH_SIZE) {
const batch = crysToUpdate.slice(i, i + BACKFILL_BATCH_SIZE);
const texts = batch.map((c) => `${c.narrative} ${c.lessons.join(" ")}`);
try {
const vectors = await ep.embedBatch(texts);
for (let j = 0; j < batch.length; j++) {
batch[j].embedding = float32ToBase64(vectors[j]);
batch[j].embeddingModel = ep.name;
await kv.set(KV.crystals, batch[j].id, batch[j]);
}
results.crystals += batch.length;
} catch (e) {
logger.warn("Crystal backfill batch failed", { error: String(e) });
}
}

// 4. Insights
const insights = await kv.list<Insight>(KV.insights);
const insToUpdate = insights.filter(
(ins) => !ins.deleted && (!ins.embedding || ins.embeddingModel !== ep.name)
);
for (let i = 0; i < insToUpdate.length; i += BACKFILL_BATCH_SIZE) {
const batch = insToUpdate.slice(i, i + BACKFILL_BATCH_SIZE);
const texts = batch.map((ins) => `${ins.title} ${ins.content}`);
try {
const vectors = await ep.embedBatch(texts);
for (let j = 0; j < batch.length; j++) {
batch[j].embedding = float32ToBase64(vectors[j]);
batch[j].embeddingModel = ep.name;
await kv.set(KV.insights, batch[j].id, batch[j]);
}
results.insights += batch.length;
} catch (e) {
logger.warn("Insight backfill batch failed", { error: String(e) });
}
}

const total = results.semantic + results.procedural + results.crystals + results.insights;
if (total > 0) {
logger.info("High-order embedding backfill complete", { backfilled: results });
}

return { success: true, backfilled: results };
} catch (err) {
const errorMsg = err instanceof Error ? err.message : String(err);
logger.error("High-order backfill encountered a fatal error", { error: errorMsg });
return { success: false, error: errorMsg };
}
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add an audit record for high-order backfills.

This function rewrites four KV scopes but never calls recordAudit(), so manual or scheduled backfills leave no audit trail.

Suggested fix
 import { float32ToBase64 } from "../state/vector-index.js";
 import { logger } from "../logger.js";
+import { recordAudit } from "./audit.js";
@@
       const total = results.semantic + results.procedural + results.crystals + results.insights;
       if (total > 0) {
         logger.info("High-order embedding backfill complete", { backfilled: results });
       }
+
+      await recordAudit(
+        kv,
+        "backfill_embeddings",
+        "mem::backfill-embeddings::high-order",
+        [],
+        { backfilled: results, embeddingModel: ep.name },
+      );
 
       return { success: true, backfilled: results };

As per coding guidelines, src/functions/**/*.ts: "Use recordAudit() for all state-changing operations".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/functions/high-order-backfill.ts` around lines 11 - 121, The backfill
function registerHighOrderBackfillFunction updates multiple KV scopes but never
creates an audit trail; update the function to call recordAudit() after each
successful scope backfill (or once after all scopes complete) including scope
name, count backfilled, embeddingModel (ep.name) and timestamp so changes are
recorded; specifically add calls to recordAudit(...) after the semantic,
procedural, crystals and insights loops (or one consolidated recordAudit at the
end using the results object) and ensure errors still return without emitting a
success audit.

Comment thread src/triggers/api.ts
Comment on lines +1110 to +1122
const body = req.body as Record<string, unknown>;
const result = await sdk.trigger({
function_id: "mem::smart-search",
payload: {
...(body.query !== undefined && { query: body.query }),
...(body.expandIds !== undefined && { expandIds: body.expandIds }),
...(body.limit !== undefined && { limit: body.limit }),
...(body.project !== undefined && { project: body.project }),
...(body.includeLessons !== undefined && { includeLessons: body.includeLessons }),
...(body.includeHighOrder !== undefined && { includeHighOrder: body.includeHighOrder }),
...(body.agentId !== undefined && { agentId: body.agentId }),
},
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate smart-search field types before forwarding them.

This whitelist still forwards raw values. A request like { "query": {}, "expandIds": "sem_1", "limit": "10" } now passes the REST boundary unchanged and leaves mem::smart-search to reject or mis-handle it downstream.

As per coding guidelines, src/triggers/**/*.ts: "validate and whitelist fields from request body (never pass raw body to sdk.trigger())", and {src/mcp/server.ts,src/triggers/**/*.ts}: "Input validation must occur at system boundaries (MCP handlers, REST endpoints)".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/triggers/api.ts` around lines 1110 - 1122, Validate and whitelist each
expected field from req.body before calling sdk.trigger: ensure query is a
string, expandIds is an array of strings (or accept a comma-separated string and
split/coerce), limit is a number (parse numeric strings to Number and ignore
non-numeric), project and agentId are strings, and
includeLessons/includeHighOrder are booleans; only include each key in the
payload if it passes its type check (otherwise omit or coerce safely) when
constructing the object passed to sdk.trigger so you never forward raw,
unvalidated values from req.body.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(smart-search): index semantic facts, procedural skills, crystals, insights, facets

1 participant