feat(semantic): [alpha build] provider-aware typed embeddings, reranking, diagnostics, and eval harness by Zireael · Pull Request #87 · cortexkit/aft

Zireael · 2026-06-02T18:26:19Z

Summary

Semantic search in AFT moves from a minimal embedding-and-cosine prototype to a provider-capability-aware retrieval subsystem with typed vectors, optional reranking, background lifecycle management, diagnostics, and evaluation tooling. This is a public preview — the feature is functional and tested (~93 new tests) but expects iteration based on real-world feedback.

What changed

The upgrade touches the full semantic pipeline — config, indexing, retrieval, diagnostics, and observability — without breaking the default fastembed experience.

Typed vector representations

Vectors are no longer opaque f32 blobs. Every stored vector carries explicit type metadata (DenseF32, Int8SourceDecoded, BinaryPacked) and is paired with its source kind so the correct distance metric is selected automatically. Binary packed vectors use Hamming search (native bitwise XOR + popcount) instead of cosine, which is both faster and semantically correct for quantized embeddings. This unlocks Perplexity's base64_binary and base64_int8 output modes alongside standard dense providers.
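To make the Hamming path concrete, here is a minimal Rust sketch. It assumes each binary vector is bit-packed into `u64` words. `hamming_distance` and `top_k_hamming` are hypothetical names, not the crate's `FlatBinaryHammingVectorStore` API. XOR leaves a 1 wherever two bits differ, and `u64::count_ones` is the popcount, so a smaller distance means a closer match.

```rust
/// Hamming distance between two bit-packed vectors stored as u64 words.
/// XOR marks differing bits; count_ones() is the popcount.
fn hamming_distance(a: &[u64], b: &[u64]) -> u32 {
    assert_eq!(a.len(), b.len(), "packed vectors must have equal word counts");
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Brute-force top-k search over packed vectors (hypothetical helper).
/// Returns (corpus index, distance) pairs, closest first.
fn top_k_hamming(query: &[u64], corpus: &[Vec<u64>], k: usize) -> Vec<(usize, u32)> {
    let mut scored: Vec<(usize, u32)> = corpus
        .iter()
        .enumerate()
        .map(|(i, v)| (i, hamming_distance(query, v)))
        .collect();
    // Ascending distance; ties broken by index so results are deterministic.
    scored.sort_by_key(|&(i, d)| (d, i));
    scored.truncate(k);
    scored
}
```

This sketch is a flat scan, so its cost grows linearly with the number of stored vectors. Each comparison is only a few integer instructions per 64 dimensions, which is where the speed advantage over float cosine comes from.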

Provider capability profiles

Each embedding backend (fastembed, OpenAI-compatible, Ollama, Perplexity) declares what it supports: output encoding, distance metric, dimension range, max batch size. The config layer validates combinations at configure time — you cannot accidentally request binary vectors through a cosine-only provider. Profiles also carry fingerprint fields so switching providers triggers a clean index rebuild rather than silent corruption.
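A configure-time capability check could look roughly like the following Rust sketch. `ProviderProfile`, `validate`, and the `Float` encoding variant are hypothetical simplifications, not the crate's `EmbeddingModelProfile`. The sketch also covers only encoding, metric, and batch size, and leaves out the dimension range. The point is that an unsupported combination, such as binary output through a float-only provider, is rejected before any indexing starts.

```rust
/// Simplified stand-ins for the profile enums described above (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OutputEncoding {
    Float,
    Base64Int8,
    Base64Binary,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DistanceMetric {
    Cosine,
    DotProduct,
    Hamming,
}

/// What one backend declares it can do (hypothetical shape).
struct ProviderProfile {
    name: &'static str,
    encodings: &'static [OutputEncoding],
    max_batch_size: usize,
}

/// Rejects unsupported combinations at configure time instead of at query time.
fn validate(
    profile: &ProviderProfile,
    encoding: OutputEncoding,
    metric: DistanceMetric,
    batch_size: usize,
) -> Result<(), String> {
    if !profile.encodings.contains(&encoding) {
        return Err(format!("{} does not support {:?} output", profile.name, encoding));
    }
    // Packed binary vectors are only meaningful under Hamming distance, and
    // Hamming distance is only meaningful for packed binary vectors.
    let is_binary = encoding == OutputEncoding::Base64Binary;
    if is_binary != (metric == DistanceMetric::Hamming) {
        return Err(format!("{:?} output cannot be searched with {:?}", encoding, metric));
    }
    if batch_size == 0 || batch_size > profile.max_batch_size {
        return Err(format!("batch size {} outside 1..={}", batch_size, profile.max_batch_size));
    }
    Ok(())
}
```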

Fingerprint-driven index lifecycle

A SemanticIndexFingerprint captures every dimension that affects index correctness: backend, model, base_url, dimension, chunking_version, output_encoding, storage_strategy, vector kinds, normalization, and prompt hashes. diff() classifies changes as Rebuild (structural — re-embed everything), ClearQueryCache (query prompts changed — invalidate cached results only), or None. This replaces the previous "delete and hope" invalidation with precise, explainable rebuild decisions.
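The three-way classification can be sketched in Rust as follows. This `Fingerprint` is a hypothetical, trimmed-down stand-in for `SemanticIndexFingerprint` with only a subset of the fields listed above. The assumption is that document-side prompt changes are structural, while query-side prompt changes only invalidate cached query results. A structural change wins over a query-prompt change, which matches the "rebuild wins" test case described in the commits.

```rust
/// Hypothetical, trimmed-down fingerprint (only some of the real fields).
#[derive(Debug, Clone, Default, PartialEq, Eq)]
struct Fingerprint {
    backend: String,
    model: String,
    base_url: String,
    dimension: usize,
    chunking_version: u32,
    output_encoding: String,
    document_prompt_hash: u64,
    query_prompt_hash: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum FingerprintDiff {
    Rebuild,
    ClearQueryCache,
    None,
}

impl Fingerprint {
    fn diff(&self, new: &Fingerprint) -> FingerprintDiff {
        // Fields that change how stored vectors were produced: old vectors are
        // no longer comparable with new ones, so everything is re-embedded.
        let structural = self.backend != new.backend
            || self.model != new.model
            || self.base_url != new.base_url
            || self.dimension != new.dimension
            || self.chunking_version != new.chunking_version
            || self.output_encoding != new.output_encoding
            || self.document_prompt_hash != new.document_prompt_hash;
        if structural {
            // Takes precedence over a simultaneous query-prompt change.
            FingerprintDiff::Rebuild
        } else if self.query_prompt_hash != new.query_prompt_hash {
            // Stored vectors stay valid; only cached query results are stale.
            FingerprintDiff::ClearQueryCache
        } else {
            FingerprintDiff::None
        }
    }
}
```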

Non-blocking cold start

Index builds run in a background thread with cooperative cancellation (SemanticCancellationToken via AtomicU64 generation counter). The build checks the generation before each embedding batch and exits early when a reconfigure arrives. Priority ordering ensures high-value files (recently edited, high PageRank) get embedded first. Exponential backoff handles transient provider failures without blocking the session.
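A generation-counter token along these lines can be sketched in a few lines of Rust. `CancellationToken`, `start_new_build`, and `build_index` are hypothetical names, not the crate's `SemanticCancellationToken` API. All tokens share one `Arc<AtomicU64>`. Each token remembers the generation it was issued for, so bumping the counter on reconfigure cancels every older build without signalling them individually. The increment uses `AcqRel` and the check uses `Acquire`, in line with the Acquire/Release ordering the reviews mention.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

#[derive(Clone)]
struct CancellationToken {
    current: Arc<AtomicU64>, // shared generation counter
    generation: u64,         // generation this token was issued for
}

impl CancellationToken {
    fn new() -> Self {
        Self { current: Arc::new(AtomicU64::new(0)), generation: 0 }
    }

    /// Called on reconfigure: bumps the shared generation (cancelling all
    /// older tokens) and returns a token for the new build.
    fn start_new_build(&self) -> Self {
        let generation = self.current.fetch_add(1, Ordering::AcqRel) + 1;
        Self { current: Arc::clone(&self.current), generation }
    }

    fn is_cancelled(&self) -> bool {
        self.current.load(Ordering::Acquire) != self.generation
    }
}

/// Embeds batches in order, checking the token before each one.
/// Returns the number of batches completed before finishing or cancelling.
fn build_index(token: &CancellationToken, batches: usize, mut embed_batch: impl FnMut(usize)) -> usize {
    for i in 0..batches {
        if token.is_cancelled() {
            return i; // a newer build has started; stop early
        }
        embed_batch(i);
    }
    batches
}
```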

Stale-vector pruning

When files are edited, deleted, moved, excluded, or re-included, the index tracks which vectors are stale and prunes them during the next refresh cycle. Every vector record carries file/chunk ownership metadata (file path, version, chunk hash, index fingerprint) so pruning is traceable and deterministic.
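Ownership metadata makes pruning a simple retain pass. The Rust sketch below assumes a hypothetical `VectorRecord` that stores its owning file's content hash, and a map of the files that are still eligible for indexing along with their current hashes. A vector survives only if its file is still present and unchanged. Deleted, excluded, and edited files all drop out through the same comparison.

```rust
use std::collections::HashMap;

/// Hypothetical vector record carrying its ownership metadata.
struct VectorRecord {
    file: String,
    content_hash: u64, // hash of the file content the vector was built from
    chunk: usize,
}

/// Removes vectors whose file was deleted/excluded (absent from `current`)
/// or edited (hash mismatch). Returns the number of pruned vectors.
fn prune_stale(records: &mut Vec<VectorRecord>, current: &HashMap<String, u64>) -> usize {
    let before = records.len();
    records.retain(|r| current.get(&r.file) == Some(&r.content_hash));
    before - records.len()
}
```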

File policy and docs chunking

A configurable file policy controls which files enter the index (include globs, exclude globs, max file size, max chunk count). The docs chunker splits Markdown and documentation files into semantic sections before embedding, improving recall for documentation-shaped queries.
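Heading-based splitting for Markdown can be sketched as below. `split_markdown_by_headings` and `DocChunk` are hypothetical names, not the crate's `collect_docs_chunks`. The sketch treats a line as an ATX heading if it starts with 1 to 6 `#` characters followed by a space or end of line, which is a simplification of the full CommonMark rules. It ignores `#` lines inside fenced code blocks, and any text before the first heading becomes its own preamble chunk.

```rust
#[derive(Debug, PartialEq)]
struct DocChunk {
    heading: Option<String>,
    start_line: usize, // 1-based, inclusive
    end_line: usize,   // 1-based, inclusive
    text: String,
}

/// Returns the heading text if `line` looks like an ATX heading.
fn atx_heading(line: &str) -> Option<String> {
    let trimmed = line.trim_start();
    let hashes = trimmed.chars().take_while(|&c| c == '#').count();
    let rest = &trimmed[hashes..]; // '#' is one byte, so this index is valid
    if (1..=6).contains(&hashes) && (rest.is_empty() || rest.starts_with(' ')) {
        Some(rest.trim().to_string())
    } else {
        None
    }
}

/// Pushes the buffered lines as one chunk, if there are any.
fn flush(chunks: &mut Vec<DocChunk>, heading: Option<String>, start_line: usize, buf: &mut Vec<&str>) {
    if !buf.is_empty() {
        let end_line = start_line + buf.len() - 1;
        chunks.push(DocChunk { heading, start_line, end_line, text: buf.join("\n") });
        buf.clear();
    }
}

fn split_markdown_by_headings(src: &str) -> Vec<DocChunk> {
    let mut chunks = Vec::new();
    let mut heading: Option<String> = None;
    let mut start_line = 1;
    let mut buf: Vec<&str> = Vec::new();
    let mut in_fence = false;

    for (idx, line) in src.lines().enumerate() {
        let trimmed = line.trim_start();
        // "\u{60}" is a backtick: a triple-backtick or ~~~ line toggles a code fence.
        if trimmed.starts_with("\u{60}\u{60}\u{60}") || trimmed.starts_with("~~~") {
            in_fence = !in_fence;
        }
        let new_heading = if in_fence { None } else { atx_heading(line) };
        if new_heading.is_some() {
            // Close the previous section (or preamble) before starting a new one.
            flush(&mut chunks, heading.take(), start_line, &mut buf);
            heading = new_heading;
            start_line = idx + 1;
        }
        buf.push(line);
    }
    flush(&mut chunks, heading, start_line, &mut buf);
    chunks
}
```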

Reranking pipeline

Optional reranking via any OpenAI-compatible /v1/rerank or chat-completion endpoint. The pipeline sends initial retrieval candidates to a reranker, parses the response (supporting multiple JSON shapes), and reorders results with safe fallback — if the reranker fails, the original cosine-similarity order is returned unchanged. Config fields: rerank.enabled, rerank.model, rerank.base_url, rerank.api_key_env, rerank.max_candidates.
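The safe-fallback reordering step can be sketched independently of the HTTP call. In this Rust sketch, `apply_rerank` is a hypothetical name, and the reranker's parsed response is modelled as `Result<Vec<usize>, String>` of candidate indices. Out-of-range and duplicate indices are skipped. Candidates the reranker omitted are appended in their original order, matching the later "missing-ID append" fix. Any error returns the original order unchanged.

```rust
/// Reorders `candidates` by the indices a reranker returned.
/// On reranker failure, the original (cosine) order is returned as-is.
fn apply_rerank<T: Clone>(candidates: &[T], reranked: Result<Vec<usize>, String>) -> Vec<T> {
    let order = match reranked {
        Ok(order) => order,
        Err(_) => return candidates.to_vec(), // safe fallback
    };
    let mut used = vec![false; candidates.len()];
    let mut out = Vec::with_capacity(candidates.len());
    for idx in order {
        // Skip out-of-bounds and duplicate indices from the model output.
        if idx < candidates.len() && !used[idx] {
            used[idx] = true;
            out.push(candidates[idx].clone());
        }
    }
    // Append anything the reranker left out, preserving original order.
    for (idx, c) in candidates.iter().enumerate() {
        if !used[idx] {
            out.push(c.clone());
        }
    }
    out
}
```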

Search pipeline metrics and diagnostics

Every aft_search call records timing, cache hits/misses, result counts, and reranker fallback events. Metrics are exposed through the status command and through JSONL diagnostic logs for offline analysis. The DiagnosticsOutputMode config controls verbosity in tool output (off | minimal | verbose).

Semantic doctor

semantic_doctor is a health-check command that reports config summary, index summary, metrics summary, provider summary, and actionable suggestions. Use it to verify that the index is healthy, the provider is reachable, and the configuration is consistent.

Semantic eval harness

semantic_eval runs a JSONL-defined evaluation suite against the semantic index. Each case specifies a query, expected paths, expected symbols, and top-k. The harness computes recall@k and MRR (Mean Reciprocal Rank) for quantifying retrieval quality across config changes.
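The two metrics are simple to state. Recall@k is the fraction of expected paths that appear in the top k results. The reciprocal rank is 1/r, where r is the 1-based position of the first expected path in the top k, or 0 if none appears. MRR is the mean reciprocal rank across cases. The Rust sketch below is a hypothetical re-implementation, not the harness's actual `score_case`/`path_matches`. It normalizes backslashes and matches on a path-component suffix, and it leaves out symbol matching.

```rust
/// Cross-platform suffix match: "C:\repo\src\db\pool.ts" matches "src/db/pool.ts",
/// but "src/xpool.ts" does not match "pool.ts".
fn path_matches(result: &str, expected: &str) -> bool {
    let r = result.replace('\\', "/");
    let e = expected.replace('\\', "/");
    r == e || r.ends_with(&format!("/{}", e))
}

/// Returns (recall@k, reciprocal rank) for one eval case.
fn score_case(results: &[&str], expected: &[&str], k: usize) -> (f64, f64) {
    let top = &results[..results.len().min(k)];
    let hits = expected
        .iter()
        .filter(|e| top.iter().any(|r| path_matches(r, e)))
        .count();
    let recall = if expected.is_empty() { 0.0 } else { hits as f64 / expected.len() as f64 };
    let reciprocal_rank = top
        .iter()
        .position(|r| expected.iter().any(|e| path_matches(r, e)))
        .map_or(0.0, |i| 1.0 / (i + 1) as f64);
    (recall, reciprocal_rank)
}

/// Mean over cases, e.g. MRR = mean of per-case reciprocal ranks.
fn mean(xs: &[f64]) -> f64 {
    if xs.is_empty() { 0.0 } else { xs.iter().sum::<f64>() / xs.len() as f64 }
}
```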

Status integration

The status command now includes semantic health metrics: lifecycle state, entry count, dimension, total queries, cache hit ratio, average query time, and provider info. The OpenCode TUI sidebar surfaces these alongside the existing index state.

Config trust boundary

backend, base_url, and api_key_env are user-only fields — project-level aft.jsonc cannot inject these. A hostile repository cannot redirect embeddings at an attacker-controlled endpoint or exfiltrate API keys. The plugin logs a warning when it strips a project-level setting.
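The trust boundary amounts to a merge that refuses certain keys from the project layer. The real merge lives in the TypeScript plugin (mergeSemanticConfig). The following is a Rust sketch of the same idea with a hypothetical `SemanticConfig` and `merge`. It assumes `model` may still be overridden per project, while `backend`, `base_url`, and `api_key_env` are always taken from user config. Each stripped field produces a warning.

```rust
#[derive(Debug, Default, Clone, PartialEq)]
struct SemanticConfig {
    backend: Option<String>,
    model: Option<String>,
    base_url: Option<String>,
    api_key_env: Option<String>,
}

/// Layers project config over user config, ignoring user-only fields
/// from the project layer and recording a warning for each one.
fn merge(user: &SemanticConfig, project: &SemanticConfig, warnings: &mut Vec<String>) -> SemanticConfig {
    let mut merged = user.clone();
    let user_only = [
        ("backend", &project.backend),
        ("base_url", &project.base_url),
        ("api_key_env", &project.api_key_env),
    ];
    for (name, value) in user_only {
        if value.is_some() {
            warnings.push(format!("ignored project-level semantic.{name}"));
        }
    }
    // Fields outside the trust boundary may still be overridden per project.
    if project.model.is_some() {
        merged.model = project.model.clone();
    }
    merged
}
```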

Contextualized document-chunk embedding (partial)

Initial support for Perplexity-style document/chunk grouped embedding — chunks from the same source document are batched together rather than flattened. Oversized document handling and retry logic are still in progress (see roadmap).
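Grouping chunks by source document, rather than flattening them, can be sketched as below. `Chunk` and `group_by_document` are hypothetical. The `max_chunks_per_doc` cap is one illustrative way an oversized document could be split into several requests. It is an assumption, not the PR's oversized-document logic, which the description marks as still in progress. Chunk order within each document is preserved.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone)]
struct Chunk {
    path: String,
    text: String,
}

/// Groups chunk texts per document (sorted by path, chunk order preserved),
/// splitting any document with more than `max_chunks_per_doc` chunks.
fn group_by_document(chunks: &[Chunk], max_chunks_per_doc: usize) -> Vec<(String, Vec<String>)> {
    let mut by_doc: BTreeMap<&str, Vec<String>> = BTreeMap::new();
    for c in chunks {
        by_doc.entry(c.path.as_str()).or_default().push(c.text.clone());
    }
    let mut groups = Vec::new();
    for (path, texts) in by_doc {
        // max(1) guards against a zero cap, which would make chunks() panic.
        for part in texts.chunks(max_chunks_per_doc.max(1)) {
            groups.push((path.to_string(), part.to_vec()));
        }
    }
    groups
}
```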

How to test

Default fastembed (zero-config)

// Enable semantic search in your AFT config
// (~/.config/opencode/aft.jsonc or ~/.pi/agent/aft.jsonc):
{ "semantic_search": true }

// Start a session; the index builds in the background.
// Then run aft_search with a concept query:
aft_search({ "query": "authentication middleware" })

Verify: results appear with source: semantic or source: hybrid tags. Status shows [index: ready] after build completes.

Provider switching

// Switch to OpenAI-compatible
{
  "semantic_search": true,
  "semantic": {
    "backend": "openai_compatible",
    "model": "text-embedding-3-small",
    "base_url": "https://api.openai.com/v1",
    "api_key_env": "OPENAI_API_KEY"
  }
}

Verify: index rebuilds automatically on next session start. Status shows new provider/model.

Reranking

{
  "semantic_search": true,
  "semantic": {
    "backend": "openai_compatible",
    "model": "text-embedding-3-small",
    "base_url": "https://api.openai.com/v1",
    "api_key_env": "OPENAI_API_KEY"
  },
  "rerank": {
    "enabled": true,
    "model": "rerank-english-v3.0",
    "base_url": "https://api.cohere.com",
    "api_key_env": "COHERE_API_KEY"
  }
}

Verify: search results come back in reranker-sorted order. With rerank disabled (or if the reranker call fails), results keep their original cosine order.

Semantic doctor

aft_search({ "query": "test" })  // triggers an index build if the index is cold
// Then check health with the status command or semantic_doctor

Verify: health report shows ConfigSummary, IndexSummary, MetricsSummary, ProviderSummary.

Eval harness

// Create .aft/semantic-eval.jsonl (the file the semantic_eval command reads):
{"query": "authentication handler", "expected_paths": ["src/auth/middleware.ts"], "expected_symbols": ["authMiddleware"], "top_k": 10}
{"query": "database connection", "expected_paths": ["src/db/pool.ts"], "expected_symbols": ["createPool"], "top_k": 10}

Verify: running semantic_eval returns recall@k and MRR scores.

Test coverage

~93 tests across 8 test sub-tasks covering:

- Config parsing and backward compatibility
- Fingerprint diff matrix (all field combinations → Rebuild/ClearQueryCache/None)
- File policy, docs chunking, and manifest handling
- VectorStore trait with FlatF32VectorStore and FlatBinaryHammingVectorStore implementations
- Binary packed-vector storage and Hamming search
- Lifecycle states, snapshots, and stale-vector pruning
- Search pipeline metrics, diagnostics, and DiagnosticsOutputMode
- Concurrency, race conditions, and cancellation token behavior
- Security trust boundary enforcement (project config stripping)
- Semantic doctor health report
- Semantic eval harness (JSONL parsing, scoring, recall/MRR)
- Reranking pipeline (parse multiple JSON shapes, fallback on failure)

Roadmap

Still in progress or planned for follow-up:

- aft-t6p.23: Complete contextualized document-chunk embedding (oversized docs, retry logic); partially implemented
- aft-t6p.2.2: Configurable snippet truncation in reranking (originally hardcoded at 200 chars; the aft-t6p.2.1 commit adds a max_candidate_chars config field, default 2500)
- aft-t6p.18: End-to-end verification across all backends
- aft-t6p.5: Configuration and operations documentation
- Performance benchmarking suite
- Migration tooling for index format upgrades

Architecture notes

Key new modules:

- crates/aft/src/semantic_rerank.rs: reranking pipeline with safe fallback
- crates/aft/src/semantic_diagnostics.rs: JSONL diagnostic logging
- crates/aft/src/semantic_doctor.rs: health-check report generation
- crates/aft/src/semantic_eval.rs: evaluation harness (JSONL parser, scoring)
- crates/aft/src/vector_store.rs: VectorStore trait with FlatF32VectorStore and FlatBinaryHammingVectorStore implementations
- crates/aft/src/commands/semantic_doctor.rs: doctor command handler
- crates/aft/src/commands/semantic_eval.rs: eval command handler

Modified significantly:

- crates/aft/src/semantic_index.rs: lifecycle management, fingerprint-driven invalidation, non-blocking build, stale pruning, typed vectors
- crates/aft/src/config.rs: provider profiles, rerank config, trust boundary fields
- crates/aft/src/commands/status.rs: semantic health metrics
- crates/aft/src/commands/semantic_search.rs: reranking integration, diagnostics output mode


Summary by cubic

Upgrades semantic search to a provider-aware pipeline with typed vectors, reranking, contextualized document-chunk embedding, partial-ready querying, and built-in diagnostics/eval. Adds Perplexity support and Hamming search for binary/int8, and hardens lifecycle, metrics, and config.

New Features
- Provider profiles and typed vectors (f32, int8, binary packed) with auto metric selection; enables Perplexity base64_binary/base64_int8.
- Contextualized document-chunk embedding with oversized-document splitting, retry/backoff, and build diagnostics.
- Fingerprint-driven lifecycle with background builds, partial-ready state, and precise stale‑vector pruning.
- Optional reranking via OpenAI-compatible endpoints with safe fallback.
- Metrics/diagnostics: JSONL logs with retention and configurable verbosity; status, semantic_doctor, and semantic_eval surface semantic health.
Bug Fixes
- Cosine similarity guards zero-norm vectors and clamps scores to [-1, 1].
- Reranker: robust fence stripping, out-of-bounds index warning, duplicate index prevention, and max_candidate_chars support.
- Search no longer overwrites lifecycle to Ready; validates non-empty queries; cancellation token uses acquire/release ordering.
- Chunking fixes: correct end_line for final splits and accurate oversized-document counters.
- Config parsing accepts all semantic fields (dimensions, encoding, storage strategy, input mode, metric, prompts, diagnostics, rerank limits) and now also reads jsonl_logging, jsonl_path, include_raw_queries, include_snippets, retention_days, and metrics_window_size; TypeScript enums aligned with Rust.
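The cosine fix listed above can be illustrated with a short Rust sketch. Two choices here are assumptions, not necessarily what the crate does: returning `0.0` for zero-norm or mismatched inputs, and the function shape itself. The guard avoids a `0/0 = NaN` result, and clamping absorbs floating-point rounding that can push the ratio slightly outside [-1, 1].

```rust
/// Cosine similarity with a zero-norm guard and output clamped to [-1, 1].
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0; // assumption: treat mismatched or empty input as "no similarity"
    }
    let (mut dot, mut norm_a, mut norm_b) = (0.0f32, 0.0f32, 0.0f32);
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    let denom = norm_a.sqrt() * norm_b.sqrt();
    if denom == 0.0 || !denom.is_finite() {
        return 0.0; // zero-norm guard: avoids 0/0 = NaN
    }
    // Rounding can yield e.g. 1.0000001; clamp back into the valid range.
    (dot / denom).clamp(-1.0, 1.0)
}
```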

Written for commit d204e2d.

Greptile Summary

This PR upgrades the semantic search subsystem from a minimal prototype to a full provider-aware retrieval pipeline, introducing typed vectors, fingerprint-driven index lifecycle management, optional reranking, background build with cooperative cancellation, and diagnostic tooling.

Core pipeline (semantic_index.rs, vector_store.rs): adds VectorStore abstraction, EmbeddingModelProfile for provider capability validation, SemanticIndexFingerprint with diff() for precise rebuild decisions, and stale-vector pruning. Cancellation token uses correct Acquire/Release ordering; cosine_similarity guards zero-norm vectors and clamps output.
Reranking (semantic_rerank.rs, semantic_search.rs): optional OpenAI-compatible reranker with safe fallback and markdown-fence stripping. A field/method naming collision on diagnostics_enabled silently disables JSONL logging when jsonl_logging: true is set without diagnostics_enabled: true.
TypeScript config (config.ts): new enum schemas and trust-boundary mergeSemanticConfig; rerank and diagnostics fields are absent from SemanticConfigSchema and stripped by Zod before reaching Rust.

Confidence Score: 3/5

Safe to merge once the diagnostics_enabled field/method fix is applied; without it, any user who enables only JSONL logging gets no log output.

The search handler reads the raw bool field instead of the diagnostics_enabled() method that unifies diagnostics_enabled || jsonl_logging, silently breaking JSONL-only logging configs.
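The bug is easy to miss because Rust allows a struct field and a method to share a name, and the two read almost identically at the call site. The sketch below uses a simplified, hypothetical version of `SemanticBackendConfig` with only the two relevant flags. It shows that `cfg.diagnostics_enabled` (the field) and `cfg.diagnostics_enabled()` (the method that ORs in `jsonl_logging`) give different answers for a JSONL-only config.

```rust
/// Simplified stand-in with only the two relevant flags.
struct SemanticBackendConfig {
    diagnostics_enabled: bool,
    jsonl_logging: bool,
}

impl SemanticBackendConfig {
    /// Diagnostics are active if either flag is set.
    fn diagnostics_enabled(&self) -> bool {
        self.diagnostics_enabled || self.jsonl_logging
    }
}

/// Buggy: reads the raw field, ignoring jsonl_logging.
fn should_log_buggy(cfg: &SemanticBackendConfig) -> bool {
    cfg.diagnostics_enabled
}

/// Fixed: calls the method that unifies both flags.
fn should_log_fixed(cfg: &SemanticBackendConfig) -> bool {
    cfg.diagnostics_enabled()
}
```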

Files that need attention: crates/aft/src/commands/semantic_search.rs (line 51 reads the field instead of calling the method) and packages/opencode-plugin/src/config.ts (rerank/diagnostics fields absent from SemanticConfigSchema).

Important Files Changed

Filename	Overview
crates/aft/src/commands/semantic_search.rs	Rewrites search handler to add reranking, diagnostics, and partial-index handling. Field access instead of method call on `diagnostics_enabled` silently disables JSONL logging when only `jsonl_logging: true` is set.
crates/aft/src/semantic_rerank.rs	New reranking pipeline with safe fallback. Markdown-fence stripper has an edge case for single-line format that returns empty string, causing a graceful but silent degradation.
crates/aft/src/config.rs	Adds provider enums, SemanticBackendConfig with rerank/diagnostics fields, and a `diagnostics_enabled()` method that ORs the field with `jsonl_logging` — but callers bypass this method and read the field directly.
packages/opencode-plugin/src/config.ts	Adds SemanticConfigSchema with new enum types and project-level trust boundary enforcement. Missing rerank/diagnostics fields from the schema (silently stripped by Zod) and minor warning message formatting inconsistency.
crates/aft/src/vector_store.rs	New VectorStore trait + FlatF32 and FlatBinaryHamming implementations. Search, upsert, prune, and orphan-removal logic is correct and well-tested.
crates/aft/src/semantic_index.rs	Major expansion adding typed vectors, fingerprint-driven lifecycle, background build with cooperative cancellation, stale-vector pruning, and provider profiles. Cosine similarity correctly guards zero-norm vectors and clamps output.
crates/aft/src/semantic_diagnostics.rs	JSONL diagnostics logger and in-memory metrics. Correct use of retention policy and configurable verbosity. No issues found.
crates/aft/src/context.rs	Adds SemanticCancellationToken (AtomicU64 with correct Acquire/Release ordering) and lazy JSONL logger initialization. Logic in `init_diagnostics_logger` is correct, gated on `jsonl_logging` field.

Comments Outside Diff (1)

packages/opencode-plugin/src/config.ts, lines 37-54

TypeScript enum values don't match the Rust serde strings — config will fail to deserialize

Several new enum schemas use values that don't align with the Rust serde representation:
- SemanticOutputEncodingEnum allows "binary", "ubinary", "int8", "uint8" but Rust OutputEncoding deserializes from "base64_binary" and "base64_int8".
- SemanticStorageStrategyEnum allows "flat" and "binary_pack" but Rust StorageStrategy expects "native_f32" and "binary_packed".
- SemanticInputModeEnum includes "chunk_extracts" and "contextualized" but Rust InputMode only has "flat_texts" and "document_chunks".
- SemanticDistanceMetricEnum uses "dot" but Rust DistanceMetric expects "dot_product".
- SemanticBackendEnum is missing the new "perplexity" variant added to Rust.
A user who follows the TypeScript autocomplete and picks output_encoding: "int8" will pass TypeScript validation but receive a deserialization error (or silent fallback to default) from the Rust binary at runtime.

Reviews (5). Last reviewed commit: "fix(configure): add missing JSONL/metric..."

Add scripts, docs, Dockerfile, and package.json scripts for Docker-based Rust validation (fmt/check/clippy/test) so Windows users without MSVC Build Tools can still validate Rust code.
- scripts/docker-rust.ps1: PowerShell script supporting fmt/check/clippy/test/validate/shell tasks with persistent Docker volumes
- Dockerfile.rust: minimal Rust image with rustfmt + clippy pre-installed
- docs/docker-rust-validation.md: full usage and design documentation
- package.json: 6 new docker:rust:* convenience scripts

Design: Linux-target validation via rust:1-bookworm, persistent cargo volumes for caching, fail-fast sequential validation.

…rough, fingerprint upgrade

…ng fields, tests

…or pruning, write-lock sync

…ark data

…pgrade, invalidation tests

- SemanticFilePolicy config struct with include_code/include_docs/include_configs/binary_detection/generated_file_detection/globs
- parse_semantic_files_config handler in configure.rs
- File policy evaluation: should_index_file(), is_generated_file(), is_config_file(), is_docs_file()
- Docs chunker: collect_docs_chunks() with heading-based splitting for markdown, splitting by file for other doc types
- collect_chunks routes doc files through docs chunker, skips binary/generated/config files per policy
- SemanticIndexFingerprint extended with file_policy_hash and docs_chunker_version; diff() triggers rebuild on policy change
- build_with_progress/refresh_stale_files accept &SemanticFilePolicy
- compute_file_policy_hash() deterministic hash of policy fields
- Re-export SemanticFilePolicy from semantic_index module
- All test callers updated with &SemanticFilePolicy::default()

…iority ordering, backoff
- CancellationToken (Arc<AtomicU64> generation counter) for cooperative build cancellation on reconfigure
- Cancel old semantic index builds instead of detaching when config changes
- Priority file ordering: README/docs first, then core source, then tests, then rest
- Embedding backoff: exponential retry with jitter for remote provider rate limits
- SemanticIndexStatus::Partial variant with completeness percentage for partial builds
- Search reports partial index state during cold start
- Phase-boundary cancellation checks between model init, disk read, incremental refresh, and full rebuild

Add Perplexity backend with InputMode::DocumentChunks support for contextualized embedding where chunks carry document-level context.
- SemanticBackend::Perplexity variant with config, profile, engine
- DocumentChunks/PerDocumentChunks/DocumentEmbeddings structs
- embed_document_chunks() routes Perplexity to grouped embedding API
- build_with_progress_contextualized() groups chunks by document
- Wire configure.rs to branch on input_mode: DocumentChunks
- SemanticEmbeddingModel::input_mode() public accessor
- EmbeddingModelProfile with contextualized_supported guard
- Response validation: index continuity, missing documents, dimension

…to trait-backed module

Bead: aft-t6p.12

Extracts Vec<EmbeddingEntry> storage and search from SemanticIndexSnapshot into a VectorStore trait with FlatF32VectorStore implementation. This decouples the storage layer from the lifecycle logic and prepares for alternative backends (binary Hamming, approximate ANN).

Key changes:
- vector_store.rs: VectorStore trait + ScoredChunk/PruneStats types
- FlatF32VectorStore: flat scan with cosine similarity (preserves existing behaviour exactly)
- FlatBinaryHammingVectorStore: forward-looking Hamming-search impl
- SemanticIndexSnapshot delegates search/len/prune/entries to store
- Fixed dimension-sync bug where set_dimension updated the snapshot dimension but not the store dimension, causing search to return 0
- EmbeddingEntry and IndexedFileMetadata made pub for trait compatibility

On Windows, use copyFileSync for the binary replacement (which overwrites the target — renameSync fails with EEXIST). If it fails, the original binary at binaryPath is preserved. The temp file cleanup is now wrapped in its own try/catch so a cleanup failure does NOT propagate as a download failure — the binary was already successfully placed at binaryPath. Addresses PR cortexkit#69 cubic review finding P2.

Implement bead aft-t6p.24: file identity manifest + vector ownership records.

Changes:
- **FileRecord struct**: identity record with content_hash, size_bytes, mtime, language, document_kind, inclusion_policy_hash, indexed_at
- **file_manifest on SemanticIndexSnapshot**: HashMap<PathBuf, FileRecord> tracking which files produced which vectors, enabling precise stale-vector pruning when files are edited, deleted, or excluded
- **V8 serialization format**: extends V7 with per-entry chunk_hash (after each vector) and file manifest block (after all entry vectors). Full backward compatibility with V1-V7 reads.
- **chunk_hash on EmbeddingEntry**: deterministic hash of chunk content fields for tracing which version of a chunk produced a stored vector
- **compute_chunk_hash**: blake3-based deterministic hash
- **build_manifest_from_store helper**: populates file_manifest from store's file_metadata, called in all builder functions (build_from_chunks, build_with_progress_contextualized, refresh_stale_files) and from_bytes for V1-V7 cache migration
- **next_chunk_id, fingerprint_string**: forward-looking fields on snapshot for future unique ID assignment and fingerprint tracking

…rmalization, and model profiles

Adds aft-t6p.20 (Typed embedding vector representation + storage-strategy resolution):
- TypedVector (source-side) and StoredVector (persisted) enums with DenseF32, DenseInt8, BinaryPacked, and Quantized variants
- StorageStrategy (NativeF32, DecodeNormalizeF32, BinaryPacked)
- VectorKind enum for runtime type tagging
- DistanceMetric (Cosine, DotProduct, Euclidean, Hamming)
- NormalizationPolicy (AlreadyNormalized, NormalizeOnInsertQuery, NotApplicable)
- EmbeddingModelProfile fields: source_vector_kind, stored_vector_kind, metric, normalization
- convert_vector() / validate_compatible() on EmbeddingModelProfile
- blake3 dependency for chunk hashing

… + dummy base_url for Perplexity profile test

Two fixes for `fingerprint_invalidation_tests`:
- Mock HTTP server now lowercases header names before matching Content-Length (reqwest/hyper sends lowercase `content-length:`).
- `base64_int8_profile_from_config_selects_correctly` test provides a dummy `base_url` for the Perplexity backend (required by `from_config`).

Co-authored-by: CommandCodeBot <noreply@commandcode.ai>

- Add StorageStrategy::BinaryPacked variant for packed-bit vector storage
- Add EmbeddingModelProfile::perplexity_binary() with BinaryPacked → Hamming path
- Wire from_config to select perplexity_binary profile when Base64Binary encoding is configured
- Implement parse_embedding_value for Base64Binary (decode → 0.0/1.0 f32 vec)
- Implement into_stored for TypedVector::BinaryPacked (requires BinaryPacked strategy)
- Update validate_config and validate_compatible to accept Base64Binary+BinaryPacked
- Replace old "not yet supported" test with parse_embedding_value_base64_binary_succeeds
- 886/893 tests pass (7 pre-existing Docker failures)

Co-authored-by: CommandCodeBot <noreply@commandcode.ai>

Co-authored-by: CommandCodeBot <noreply@commandcode.ai>

Add semantic_diagnostics module with SearchDiagnostics, SearchPipelineType, SearchWarning, SearchMetricsCollector, PhaseTimer, score_statistics, top1_margin. Instrument handle_semantic_search with per-phase timing and warning collection. Wire SearchMetricsCollector into AppContext. 17 new tests, 902/910 lib tests pass (8 pre-existing Docker failures).

Co-authored-by: CommandCodeBot <noreply@commandcode.ai>

- Add SemanticDiagnosticsLogger with file append, rotation (50 MB), and retention cleanup (file-deletion based on mtime)
- Add SearchDiagnosticsEvent struct for JSONL serialization with raw_query redaction (opt-in via include_raw_queries) and snippet placeholder (include_snippets)
- Add config fields: jsonl_logging, jsonl_path, include_raw_queries, include_snippets, retention_days to SemanticBackendConfig
- Add lazy-init diagnostics_logger on AppContext with resolve_diagnostics_log_path helper (env var → project root → ~/.cache)
- Wire JSONL record into handle_semantic_search diagnostics block
- 4 new tests: raw query redaction, raw query inclusion, disk write verification, missing-file recovery
- 907/914 lib tests pass (7 pre-existing Docker failures)

Co-authored-by: CommandCodeBot <noreply@commandcode.ai>

…rch output

Add DiagnosticsOutputMode enum (Off/Minimal/Verbose) and output_mode field to SemanticBackendConfig. Implement format_diagnostics_prefix() for Minimal (warnings only) and Verbose (scores + latency + warnings) output modes. Wire into handle_semantic_search response text. 4 new tests, 25 diagnostics tests total. 910/918 lib tests pass (8 pre-existing Docker failures).

Co-authored-by: CommandCodeBot <noreply@commandcode.ai>

Add optional reranking via OpenAI-compatible chat endpoint. When enabled, aft_search overfetches candidates, sends them to a reranker model, and re-sorts by relevance. Falls back gracefully on any error.
- Add RerankConfig fields to SemanticBackendConfig (rerank_enabled, rerank_model, rerank_base_url, rerank_api_key_env, rerank_timeout_ms, rerank_max_candidates)
- Create semantic_rerank.rs with RerankerClient, RerankOutcome enum, and rerank_candidates function
- Add RerankerFailure warning variant to SearchWarning
- Wire reranking into handle_semantic_search (overfetch → rerank → re-sort)
- Add rerank_latency_ms to SearchDiagnostics and SearchDiagnosticsEvent
- Include rerank latency in verbose diagnostics output
- 6 unit tests for reranker parsing, skip conditions, and failure handling

All 25 diagnostics + 6 reranker tests pass. 917/924 total tests pass (7 pre-existing Docker infrastructure failures).

Add 40+ unit tests to fingerprint_invalidation_tests covering:
- SemanticBackendConfig deserialization (minimal, all-fields, defaults)
- EmbeddingModelProfile validation for all encoding types
- TypedVector conversion and StoredVector roundtrip
- convert_vector and validate_compatible rejection paths
- Distance metric auto-resolution for f32/int8/binary
- base64_int8 signed int8 decode correctness
- Template hashing, enum roundtrips, resolve helpers

Minor: add #[derive(Debug)] to StoredVector for test ergonomics.

Closes aft-t6p.6.1

Add 6 new tests to fingerprint_invalidation_tests covering:
- file_policy_hash mismatch triggers rebuild
- docs_chunker_version mismatch triggers rebuild
- multi-field changes still trigger rebuild
- rebuild+query_prompt: rebuild wins
- only query_prompt change: ClearQueryCache
- non-fingerprint field changes: NoChange

Total: 22 fingerprint tests. Closes aft-t6p.6.2

Add 29 tests covering:
- is_generated_file: protobuf, minified, dist, build, generated, dart
- is_doc_extension and is_config_extension validation
- classify_semantic_file for code/doc/config
- collect_docs_chunks markdown heading splitting
- SemanticFilePolicy defaults and builtin globs
- FileRecord field population
- build_manifest_from_store construction and cleanup

Closes aft-t6p.6.3

… tests

Add 23 tests covering:
- FlatF32VectorStore: search, empty, dimension mismatch, CRUD, prune, stats
- FlatBinaryHammingVectorStore: search, ranking, prune, delete, stats
- hamming_distance and popcount64 correctness
- Binary decode: byte-aligned, non-byte-aligned, padding, error

Closes aft-t6p.6.4

Add 8 tests covering:
- SemanticIndexLifecycle: cold start, set/get, failed+error, all variants
- SemanticIndexSnapshot: search ranking, immutability after clone
- VectorStore: prune_stale_vectors, prune_orphans

Closes aft-t6p.6.5

Add 10 tests covering:
- HybridRerank pipeline type display
- Metrics collector: window size 1, cache hit rate, zero result rate, low confidence rate, latency percentiles
- Diagnostics output mode defaults
- Warning formatting: minimal (all variants, verifies suppressed), verbose (all 9 variants)
- SearchWarning serde roundtrip for all 8 variants

Closes aft-t6p.6.6

Add 4 tests covering:
- Concurrent snapshot clones produce independent results
- Concurrent read threads see identical data via Arc
- Mutex contention across 10 threads does not deadlock
- Arc strong_count tracks clone/drop correctly

Closes aft-t6p.6.7

Add 6 tests covering:
- Trust file atomic write (no tmp files left behind)
- Multiple projects trusted independently
- Untrust is idempotent
- Trust state survives reload (serde roundtrip)
- Nonexistent project path is untrusted (fail-closed)

Closes aft-t6p.6.8

The validate_compatible_rejects_binary_stored_with_cosine_metric test was missing source_vector_kind: BinaryPacked, causing the first match block to fail with 'unsupported source→stored vector conversion' instead of reaching the metric compatibility check.

Add local retrieval evaluation harness for measuring semantic search quality.

New files:
- crates/aft/src/semantic_eval.rs: pure-logic module with:
  - EvalCase, EvalResult, EvalSummary structs
  - JSONL parser (tolerates blank lines and comments)
  - path_matches(): cross-platform suffix matching
  - symbol_matches(): Rust/other-language symbol normalization
  - score_case(): per-case recall@k and MRR scoring
  - score_suite(): aggregate metrics across a suite
- crates/aft/src/commands/semantic_eval.rs: handler wiring:
  - Reads .aft/semantic-eval.jsonl, returns EvalSummary as JSON
  - Supports top_k override and include_per_case toggle
  - Returns tri-state response per AFT honest reporting convention

Wiring:
- crates/aft/src/lib.rs: pub mod semantic_eval
- crates/aft/src/commands/mod.rs: pub mod semantic_eval
- crates/aft/src/main.rs: dispatch semantic_eval command

Tests: 44 tests passing (parser, matcher, scorer, handler)

Add semantic_doctor command that produces a SemanticHealthReport gathering:
- Config summary (backend, model, dimensions, metric, prompts, rerank)
- Index state (lifecycle, entry count, dimension, fingerprint freshness)
- Search quality metrics (p50/p95 latency, zero-result/low-confidence rates)
- Provider connectivity (optional probe)
- Active warnings and actionable suggestions

New files:
- crates/aft/src/semantic_doctor.rs: HealthStatus, ConfigSummary, IndexSummary, MetricsSummary, ProviderSummary, Suggestion, SemanticHealthReport structs with Serialize and Display impls
- crates/aft/src/commands/semantic_doctor.rs: command handler with optional probe_provider param, suggestion generation for disabled/building/failed/ready states, 7 handler tests + 6 model tests

Wiring:
- crates/aft/src/lib.rs: pub mod semantic_doctor
- crates/aft/src/commands/mod.rs: pub mod semantic_doctor
- crates/aft/src/main.rs: dispatch "semantic_doctor" command

Also: fix semantic_eval temp directory race condition (atomic counter).

Tests: 14 semantic_doctor + 44 semantic_eval passing, check+clippy+fmt clean.

Extend the semantic_index_info section of the status command to include:
- Search quality metrics (total_queries, p50/p95 latency, zero_result_rate, low_confidence_rate, embedding_failure_rate, lexical_failure_rate)
- Rerank status (rerank_enabled, rerank_model)
- Diagnostics state (diagnostics_enabled, prompt_active)

The TUI/status surfaces can now show pipeline health without a separate semantic_doctor call. Metrics are zero when no queries have been recorded.

Tests: status + semantic_doctor tests passing, check+clippy+fmt clean.

- Add 3 new tests: markdown-fence parsing, snippet truncation, max_candidates limit
- Fix missing-ID append: semantic_search now appends missing indices in original order
- Add max_candidate_chars config field (default 2500) to SemanticBackendConfig
- Use config.rerank_max_candidate_chars instead of hardcoded 200 in reranker
- Update all test configs with new field

Bead: aft-t6p.2.1

cubic-dev-ai

4 issues found across 107 files

Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.


Remove .beads/, .qartez/, .claude/, .omo/, .kiro/, .lean-ctx/ from the branch. These are local agent working directories that should not be distributed. Add them to .gitignore to prevent future accidents. Addresses cubic review comments on PR cortexkit#87.

cubic-dev-ai

1 issue found across 69 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name=".gitignore">

<violation number="1" location=".gitignore:95">
P2: Inconsistent .gitignore pattern: `omo/` should likely be `.omo/` to match the hidden tooling directory convention used by all other entries in this block.</violation>
</file>


cubic-dev-ai · 2026-06-02T18:40:25Z

+.beads/
+.qartez/
+.claude/
+omo/


P2: Inconsistent .gitignore pattern: omo/ should likely be .omo/ to match the hidden tooling directory convention used by all other entries in this block.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At .gitignore, line 95: <comment>Inconsistent .gitignore pattern: `omo/` should likely be `.omo/` to match the hidden tooling directory convention used by all other entries in this block.</comment> <file context> @@ -87,3 +87,11 @@ benchmarks/aft-search/.bench/ +.beads/ +.qartez/ +.claude/ +omo/ +.kiro/ +.lean-ctx/ </file context>

Remove .alfonso/, agents.md, beads-data-*.jsonl, magic-context-*.md, biome.json_ from the branch. Add them to .gitignore to prevent future inclusion in PRs.

Restore .alfonso/ from main (it exists upstream). Keep agents.md, beads-data-*.jsonl, magic-context-*.md, biome.json_ removed and gitignored since they don't exist on main.

Zireael · 2026-06-02T18:56:08Z

Source code for the semantic search functionality, for public preview.
The feature skeleton is in place; it still needs finishing, polishing of the static tests, and functional testing.
One more thing worth adding is model2vec 'Potion Code 16M' support. If it performs well in tests, I think it could become a fast, cheap, and performant default semantic model.

Here are the implementation plans for the sprints under this epic (in gastown beads format):
aft-semantic-search-upgrade.json

1. Fix duplicate entries in reranked output (greptile P1)
   - Add !used[i] check in filter_map to prevent duplicate indices
   - File: crates/aft/src/commands/semantic_search.rs
2. Strip markdown fences from LLM reranker responses (greptile P1)
   - Many chat models wrap JSON in code fences
   - Add strip_markdown_fences() helper applied before parsing
   - File: crates/aft/src/semantic_rerank.rs
3. Align TypeScript enum values with Rust serde (cubic P1)
   - SemanticBackendEnum: add perplexity variant
   - SemanticOutputEncodingEnum: float, base64_int8, base64_binary
   - SemanticStorageStrategyEnum: native_f32, decode_normalize_f32, binary_packed
   - SemanticInputModeEnum: flat_texts, document_chunks
   - SemanticDistanceMetricEnum: auto, cosine, dot_product, euclidean, hamming
   - File: packages/opencode-plugin/src/config.ts
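Items 1 and 2 can be sketched together as one post-processing pass over the reranker reply. This is an illustration under assumptions, not the code in `semantic_rerank.rs` or `semantic_search.rs`: this `strip_markdown_fences` handles a single surrounding fence with an optional language tag, and `merge_rerank_order` (a hypothetical name) takes indices that have already been parsed from the JSON answer. It also folds in the "append missing indices in original order" behaviour from the earlier commit.

```rust
/// Remove one surrounding ``` fence (with optional language tag such as
/// "json") that chat models often wrap around a JSON answer.
fn strip_markdown_fences(raw: &str) -> &str {
    let trimmed = raw.trim();
    let Some(rest) = trimmed.strip_prefix("```") else {
        return trimmed; // no opening fence: return the trimmed text as-is
    };
    // Skip the rest of the opening fence line (the optional language tag).
    let body = match rest.find('\n') {
        Some(nl) => &rest[nl + 1..],
        None => rest,
    };
    // Cut at the closing fence if one is present.
    let body = match body.rfind("```") {
        Some(end) => &body[..end],
        None => body,
    };
    body.trim()
}

/// Apply the model's preferred order: skip out-of-range indices (where the PR
/// describes logging a warning) and duplicates (the `!used[i]` check), then
/// append every candidate the model omitted, in original order.
fn merge_rerank_order(ranked: &[usize], candidate_count: usize) -> Vec<usize> {
    let mut used = vec![false; candidate_count];
    let mut order: Vec<usize> = ranked
        .iter()
        .copied()
        .filter(|&i| {
            if i >= candidate_count || used[i] {
                return false;
            }
            used[i] = true;
            true
        })
        .collect();
    order.extend((0..candidate_count).filter(|&i| !used[i]));
    order
}
```

Appending the omitted candidates rather than dropping them means a terse or partially wrong reranker reply can only reorder results, never lose them.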

cubic-dev-ai

1 issue found across 4 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/opencode-plugin/src/config.ts">

<violation number="1" location="packages/opencode-plugin/src/config.ts:40">
P2: Semantic enum literals were renamed without backward-compatibility aliases or migration, breaking existing configs that use old values.</violation>
</file>


cubic-dev-ai · 2026-06-02T19:34:20Z

+const SemanticBackendEnum = z.enum(["fastembed", "openai_compatible", "ollama", "perplexity"]);
+
+/** Output encoding mode for embeddings. */
+const SemanticOutputEncodingEnum = z.enum(["float", "base64_int8", "base64_binary"]);


P2: Semantic enum literals were renamed without backward-compatibility aliases or migration, breaking existing configs that use old values.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode-plugin/src/config.ts, line 40: <comment>Semantic enum literals were renamed without backward-compatibility aliases or migration, breaking existing configs that use old values.</comment> <file context> @@ -34,19 +34,19 @@ const CheckerEnum = z.enum([ /** Output encoding mode for embeddings. */ -const SemanticOutputEncodingEnum = z.enum(["float", "binary", "ubinary", "int8", "uint8"]); +const SemanticOutputEncodingEnum = z.enum(["float", "base64_int8", "base64_binary"]); /** Storage strategy for embedding vectors. */ </file context>

…s, retry, diagnostics

Add three features to build_with_progress_contextualized:

1. Oversized document handling: split_oversized_document() partitions documents exceeding DEFAULT_MAX_CHUNKS_PER_DOCUMENT (100) into sub-groups, preserving chunk order with synthetic '(part N)' titles.
2. Retry logic: embed_document_group_with_retry() wraps each document group with exponential backoff (3 retries, 1s base, 8s cap), only retrying transient errors (rate limits, timeouts, server errors). Failed groups are skipped with a warning instead of aborting the entire build.
3. Diagnostics: ContextualizedBuildDiagnostics struct tracks documents_processed, chunks_embedded, rejected_oversized, retried_groups, failed_groups, and max_chunks_in_document. Summary logged via slog_info! at build completion.
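Item 1 could look roughly like the following sketch. `Chunk`, `DocumentGroup`, and the exact `"<title> (part N)"` format are assumptions; only the 100-chunk default, order preservation, and the synthetic part titles come from the commit message.

```rust
/// Hypothetical chunk type; the real one carries more fields.
#[derive(Clone, Debug, PartialEq)]
struct Chunk {
    start_line: usize,
    text: String,
}

/// Hypothetical group of chunks embedded together as one document.
#[derive(Debug, PartialEq)]
struct DocumentGroup {
    title: String,
    chunks: Vec<Chunk>,
}

const DEFAULT_MAX_CHUNKS_PER_DOCUMENT: usize = 100;

/// Partition a document into groups of at most `max_chunks`, keeping chunk
/// order and labelling each split group with a synthetic "(part N)" title.
fn split_oversized_document(
    title: &str,
    doc_chunks: Vec<Chunk>,
    max_chunks: usize,
) -> Vec<DocumentGroup> {
    assert!(max_chunks > 0, "max_chunks must be positive");
    if doc_chunks.len() <= max_chunks {
        return vec![DocumentGroup { title: title.to_string(), chunks: doc_chunks }];
    }
    doc_chunks
        .chunks(max_chunks) // slice::chunks preserves order
        .enumerate()
        .map(|(i, part)| DocumentGroup {
            title: format!("{} (part {})", title, i + 1),
            chunks: part.to_vec(),
        })
        .collect()
}
```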

Coverage:
- chunks grouped by source document (multi-file)
- chunk order preserved within each document
- wrong chunk count in response fails loudly
- unknown file path in response fails
- dimension mismatch fails with specific error
- stale-vector pruning after contextualized index + refresh
- Perplexity backend defaults to DocumentChunks input mode
- Fastembed backend verifies FlatTexts for contrast
- oversized document is split into sub-groups (>100 chunks)
- empty file set produces empty index
- retry on transient errors (429 rate limit)
- non-transient errors are NOT retried
- progress callback reports correct done/total counts
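The retry policy described in item 2 of the previous commit (3 retries, 1s base, 8s cap, transient errors only), which the 429 and non-transient tests above exercise, could be sketched as below. `EmbedError`, its variants, and `embed_with_retry` are hypothetical; `sleep` is injected so tests do not actually wait. With only three retries the delays are 1s, 2s, and 4s, so the 8s cap only matters if the retry count is raised.

```rust
use std::time::Duration;

/// Hypothetical error classification for an embedding request.
#[derive(Debug)]
enum EmbedError {
    RateLimited,    // e.g. HTTP 429
    Timeout,
    Server(u16),    // 5xx
    Client(String), // non-transient: bad request, dimension mismatch, ...
}

impl EmbedError {
    fn is_transient(&self) -> bool {
        matches!(self, EmbedError::RateLimited | EmbedError::Timeout | EmbedError::Server(_))
    }
}

const MAX_RETRIES: u32 = 3;
const BASE_DELAY: Duration = Duration::from_secs(1);
const MAX_DELAY: Duration = Duration::from_secs(8);

/// Delay before retry `attempt` (1-based): 1s, 2s, 4s, ... capped at 8s.
fn backoff_delay(attempt: u32) -> Duration {
    let factor = 1u32 << (attempt - 1).min(16);
    (BASE_DELAY * factor).min(MAX_DELAY)
}

/// Call `embed` up to 1 + MAX_RETRIES times, retrying only transient errors.
/// On success, returns the value plus the number of retries that were needed.
fn embed_with_retry<T>(
    mut embed: impl FnMut() -> Result<T, EmbedError>,
    mut sleep: impl FnMut(Duration),
) -> Result<(T, u32), EmbedError> {
    let mut attempt = 0;
    loop {
        match embed() {
            Ok(v) => return Ok((v, attempt)),
            Err(e) if e.is_transient() && attempt < MAX_RETRIES => {
                attempt += 1;
                sleep(backoff_delay(attempt));
            }
            // Non-transient, or retries exhausted: the caller skips the group.
            Err(e) => return Err(e),
        }
    }
}
```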

CRITICAL fixes:
- cosine_similarity: guard NaN from zero-norm vectors + clamp to [-1,1]
- semantic_search: remove unconditional Ready status overwrite (search must not change lifecycle state)
- reranker: add out-of-bounds index warning when LLM returns indices exceeding candidate count

HIGH fixes:
- build_embed_text: remove duplicate name: field in embed text format
- split_large_chunk: fix end_line for final sub-chunk (was using chunk.start_line + total_lines instead of chunk_start + current_lines)
- strip_markdown_fences: robust fence stripping with language tag handling and proper closing-fence detection
- rejected_oversized: actually increment counter when documents are split

MEDIUM fixes:
- SemanticCancellationToken: use Acquire/Release ordering instead of Relaxed for cross-thread generation counter
- semantic_search: validate non-empty query before processing
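A sketch of the CRITICAL `cosine_similarity` fix, assuming plain `f32` slices of equal length: a zero (or non-finite) norm product returns 0.0 instead of dividing by zero, and the result is clamped to [-1, 1] to absorb rounding drift. Returning 0.0 for degenerate vectors is an assumption about the chosen fallback.

```rust
/// Cosine similarity that yields 0.0 instead of NaN when either vector has
/// zero (or non-finite) norm, and clamps rounding drift back into [-1, 1].
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    debug_assert_eq!(a.len(), b.len(), "vectors must share a dimension");
    let mut dot = 0.0f32;
    let mut norm_a = 0.0f32;
    let mut norm_b = 0.0f32;
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    let denom = norm_a.sqrt() * norm_b.sqrt();
    if denom == 0.0 || !denom.is_finite() {
        return 0.0; // degenerate vector: treat as "no similarity"
    }
    (dot / denom).clamp(-1.0, 1.0)
}
```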

parse_semantic_config previously only handled 6 fields (backend, model, base_url, api_key_env, timeout_ms, max_batch_size). Now it also parses: dimensions, output_encoding, input_mode, storage_strategy, distance_metric, query_prompt_template, document_prompt_template, diagnostics_enabled, low_confidence_threshold, output_mode, rerank_enabled, rerank_model, rerank_base_url, rerank_api_key_env, rerank_timeout_ms, rerank_max_candidates, rerank_max_candidate_chars. Note: the TS plugin's getStrippedSemanticKeys() intentionally strips these fields from PROJECT config (untrusted) as a security boundary. They can still be set from USER config (trusted). The Rust side now correctly accepts all fields when the plugin sends them.

cubic-dev-ai

1 issue found across 5 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="crates/aft/src/commands/configure.rs">

<violation number="1" location="crates/aft/src/commands/configure.rs:342">
P1: `rerank_base_url` is parsed without SSRF validation, unlike `base_url`</violation>
</file>


cubic-dev-ai · 2026-06-04T17:20:38Z

+            .to_string()
+            .into();
+    }
+    if let Some(raw) = obj.get("rerank_base_url") {


P1: rerank_base_url is parsed without SSRF validation, unlike base_url

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At crates/aft/src/commands/configure.rs, line 342: <comment>`rerank_base_url` is parsed without SSRF validation, unlike `base_url`</comment> <file context> @@ -230,6 +230,150 @@ fn parse_semantic_config( + .to_string() + .into(); + } + if let Some(raw) = obj.get("rerank_base_url") { + semantic.rerank_base_url = raw + .as_str() </file context>

Also parse: jsonl_logging, jsonl_path, include_raw_queries, include_snippets, retention_days, metrics_window_size.

cubic-dev-ai

2 issues found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="crates/aft/src/commands/configure.rs">

<violation number="1" location="crates/aft/src/commands/configure.rs:382">
P2: `semantic.jsonl_path` lacks path validation/normalization, unlike other path configs in this file (`validate_storage_dir`, `parse_lsp_paths_extra`) which enforce absolute paths and reject `..` traversal. This creates path-injection risk for downstream JSONL diagnostics writes.</violation>

<violation number="2" location="crates/aft/src/commands/configure.rs:402">
P2: `semantic.retention_days` uses lossy `u64 -> u32` cast with silent overflow instead of explicit validation.</violation>
</file>


cubic-dev-ai · 2026-06-04T22:01:47Z

+            "configure: semantic.jsonl_logging must be a boolean".to_string()
+        })?;
+    }
+    if let Some(raw) = obj.get("jsonl_path") {


P2: semantic.jsonl_path lacks path validation/normalization, unlike other path configs in this file (validate_storage_dir, parse_lsp_paths_extra) which enforce absolute paths and reject .. traversal. This creates path-injection risk for downstream JSONL diagnostics writes.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At crates/aft/src/commands/configure.rs, line 382: <comment>`semantic.jsonl_path` lacks path validation/normalization, unlike other path configs in this file (`validate_storage_dir`, `parse_lsp_paths_extra`) which enforce absolute paths and reject `..` traversal. This creates path-injection risk for downstream JSONL diagnostics writes.</comment> <file context> @@ -374,6 +374,42 @@ fn parse_semantic_config( + "configure: semantic.jsonl_logging must be a boolean".to_string() + })?; + } + if let Some(raw) = obj.get("jsonl_path") { + semantic.jsonl_path = if raw.is_null() { + None </file context>
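One way to address this, sketched under assumptions: apply the rules the comment attributes to `validate_storage_dir` and `parse_lsp_paths_extra` (absolute path, no `..` components) before storing `jsonl_path`. `validate_jsonl_path` and its error strings are hypothetical, and the example paths are Unix-style (on Windows, `Path::is_absolute` also requires a drive prefix).

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical validator: reject empty, relative, or `..`-containing paths.
fn validate_jsonl_path(raw: &str) -> Result<PathBuf, String> {
    if raw.trim().is_empty() {
        return Err("configure: semantic.jsonl_path must not be empty".to_string());
    }
    let path = Path::new(raw);
    if !path.is_absolute() {
        return Err("configure: semantic.jsonl_path must be an absolute path".to_string());
    }
    // components() does not resolve "..", so traversal stays visible here.
    if path.components().any(|c| matches!(c, Component::ParentDir)) {
        return Err("configure: semantic.jsonl_path must not contain '..'".to_string());
    }
    Ok(path.to_path_buf())
}
```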

cubic-dev-ai · 2026-06-04T22:01:47Z

+        semantic.retention_days = raw.as_u64().ok_or_else(|| {
+            "configure: semantic.retention_days must be an unsigned integer".to_string()
+        })? as u32;


P2: semantic.retention_days uses lossy u64 -> u32 cast with silent overflow instead of explicit validation.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At crates/aft/src/commands/configure.rs, line 402: <comment>`semantic.retention_days` uses lossy `u64 -> u32` cast with silent overflow instead of explicit validation.</comment> <file context> @@ -374,6 +374,42 @@ fn parse_semantic_config( + })?; + } + if let Some(raw) = obj.get("retention_days") { + semantic.retention_days = raw.as_u64().ok_or_else(|| { + "configure: semantic.retention_days must be an unsigned integer".to_string() + })? as u32; </file context>

Suggested change

-        semantic.retention_days = raw.as_u64().ok_or_else(|| {
-            "configure: semantic.retention_days must be an unsigned integer".to_string()
-        })? as u32;
+        let v = raw.as_u64().ok_or_else(|| {
+            "configure: semantic.retention_days must be an unsigned integer".to_string()
+        })?;
+        semantic.retention_days = u32::try_from(v)
+            .map_err(|_| "configure: semantic.retention_days is too large".to_string())?;
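To make the overflow concrete: `as u32` keeps only the low 32 bits, so a value just above `u32::MAX` silently wraps to a tiny retention period, whereas `u32::try_from` reports it. `retention_days_from` is a hypothetical helper mirroring the suggested change.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: checked u64 -> u32 conversion with a config error.
fn retention_days_from(v: u64) -> Result<u32, String> {
    u32::try_from(v).map_err(|_| "configure: semantic.retention_days is too large".to_string())
}
```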

Zireael and others added 30 commits May 24, 2026 11:10

aft-t6p.7: provider capabilities — config profiles, dimension pass-through, fingerprint upgrade

50a7e65

aft-t6p.1: embedding query/document prompt-template support

34073be

aft-t6p.15: semantic config trust boundary — TypeScript schema, warning fields, tests

f60a2a9

aft-t6p.8: semantic index lifecycle — immutable snapshots, stale-vector pruning, write-lock sync

0f640ca

chore: add testuser non-root runner to docker-rust.ps1, update benchmark data

54377d9

aft-t6p.9: semantic fingerprint — config matrix, diff engine, V6→V7 upgrade, invalidation tests

0c60fcc

chore: bead tracking, architecture docs, and biome config

945cef2

Co-authored-by: CommandCodeBot <noreply@commandcode.ai>

Zireael added 4 commits June 1, 2026 09:24

cubic-dev-ai Bot reviewed Jun 2, 2026


Comment thread .beads/README.md Outdated

Comment thread .beads/config.yaml Outdated

Comment thread .claude/settings.json Outdated

Comment thread .qartez/acks/5813b13fa433d553 Outdated

Zireael changed the title ~~feat(semantic): provider-aware typed embeddings, reranking, diagnostics, and eval harness~~ feat(semantic): [alpha build] provider-aware typed embeddings, reranking, diagnostics, and eval harness Jun 2, 2026

cubic-dev-ai Bot reviewed Jun 2, 2026


greptile-apps Bot reviewed Jun 2, 2026


Comment thread crates/aft/src/commands/semantic_search.rs

Comment thread crates/aft/src/semantic_rerank.rs

Zireael added 2 commits June 2, 2026 20:43

chore: remove remaining non-source files from PR

349332b


chore: restore upstream .alfonso, keep other junk removed

603115c


greptile-apps Bot reviewed Jun 2, 2026


Comment thread packages/opencode-plugin/src/config.ts

Comment thread crates/aft/src/commands/semantic_search.rs Outdated

cubic-dev-ai Bot reviewed Jun 2, 2026


Zireael added 4 commits June 2, 2026 23:16

cubic-dev-ai Bot reviewed Jun 4, 2026


fix(configure): add missing JSONL/metrics config field parsers

d204e2d


cubic-dev-ai Bot reviewed Jun 4, 2026


Zireael commented Jun 2, 2026 • edited by greptile-apps Bot

Summary

What changed

Typed vector representations

Provider capability profiles

Fingerprint-driven index lifecycle

Non-blocking cold start

Stale-vector pruning

File policy and docs chunking

Reranking pipeline

Search pipeline metrics and diagnostics

Semantic doctor

Semantic eval harness

Status integration

Config trust boundary

Contextualized document-chunk embedding (partial)

How to test

Default fastembed (zero-config)

Provider switching

Reranking

Semantic doctor

Eval harness

Test coverage

Roadmap

Architecture notes

Summary by cubic

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Comments Outside Diff (1)

cubic-dev-ai Bot left a comment • edited

cubic-dev-ai Bot left a comment

cubic-dev-ai Bot Jun 2, 2026

Zireael commented Jun 2, 2026

cubic-dev-ai Bot left a comment

cubic-dev-ai Bot Jun 2, 2026

cubic-dev-ai Bot left a comment

cubic-dev-ai Bot Jun 4, 2026

cubic-dev-ai Bot left a comment

cubic-dev-ai Bot Jun 4, 2026

cubic-dev-ai Bot Jun 4, 2026