Hi! We're integrating lance-c as the items / FTS storage backend for an internal C++ service, and we'd like to understand early what the C surface for compound FTS queries will look like.
The Phase 2 design doc already names this as future work (docs/superpowers/specs/2026-04-23-phase2-vector-search-indexing-design.md:22):
> Compound boolean FTS queries (Boost / Boolean / Phrase composition). MVP
> exposes match + fuzzy; the composer can be added later without breaking
> changes.
We'd like to know the rough timeline for the implementation, and we're happy to help land this. What's your preferred shape for the C ABI?
What we need
Concretely, we need to express each FtsQuery variant from lance_index::scalar::inverted::query over the C ABI:
| Need (downstream consumer) | Maps to |
| --- | --- |
| AND across query terms | MatchQuery::with_operator(And) |
| Type-ahead (last token as prefix) + expansion cap | prefix expansion → BooleanQuery(Should) of MatchQuery::with_max_expansions(...) |
| Per-attribute boost (title^1.2 body) | MultiMatchQuery or BooleanQuery(Should) of MatchQuery::with_boost |
| Sub-queries with per-clause match_all_terms / last_as_prefix / boost | BooleanQuery(Should) of distinct MatchQuery |
| Phrase queries with slop | PhraseQuery::with_slop |
| Negative-boost re-rank | BoostQuery |
The current C function:
int32_t lance_scanner_full_text_search(
    LanceScanner* scanner,
    const char* query,
    const char* const* columns,
    uint32_t max_fuzzy_distance);
covers MatchQuery::new + MatchQuery::with_fuzziness(Some(d)) only.
Two surface shapes we're considering
Option A — JSON entrypoint. Single new function:
/// Set a serialized FtsQuery (lance_index::scalar::inverted::query::FtsQuery)
/// as JSON. The Rust types already derive Serialize/Deserialize.
int32_t lance_scanner_fts_query_json(
    LanceScanner* scanner,
    const char* fts_query_json,
    size_t json_len);
Pros: one C symbol; forward-compatible with new variants for free; implementation is serde_json::from_slice + set_fts_query.
Cons: callers serialize JSON; errors surface late (at deserialization) rather than at call sites; less idiomatic than the typed nearest/nprobes/refine_factor chain pattern.
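To make the caller-side cost of Option A concrete, here is a minimal C++ sketch of building the payload by hand. The JSON shape (`{"match": {"column", "terms", "operator"}}`) is an assumption for illustration only; the real shape would be whatever serde produces for `FtsQuery`, which we have not verified. `json_escape` and `match_query_json` are hypothetical helper names, not existing lance-c symbols.

```cpp
#include <cstdio>
#include <string>

// Escape a string so it can be embedded inside a JSON string literal.
std::string json_escape(const std::string& in) {
    std::string out;
    out.reserve(in.size() + 2);
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters use the \u00XX form.
                    char buf[8];
                    std::snprintf(buf, sizeof buf, "\\u%04x", c);
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// ASSUMPTION: field names and the "And"/"Or" spelling are placeholders.
// The authoritative layout is the serde representation of FtsQuery.
std::string match_query_json(const std::string& column,
                             const std::string& terms,
                             bool match_all_terms) {
    return std::string("{\"match\":{\"column\":\"") + json_escape(column) +
           "\",\"terms\":\"" + json_escape(terms) +
           "\",\"operator\":\"" + (match_all_terms ? "And" : "Or") + "\"}}";
}

// With the proposed entrypoint, a call site might then look like:
//   std::string q = match_query_json("title", user_input, true);
//   int32_t rc = lance_scanner_fts_query_json(scanner, q.data(), q.size());
```

A real caller would more likely use an existing JSON library than hand-rolled escaping, but either way a malformed payload is only detected once the Rust side deserializes it.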
Option B — Typed handles. Opaque LanceFtsQuery* plus builders:
LanceFtsQuery* lance_fts_match_new(const char* terms);
int32_t lance_fts_match_set_column(LanceFtsQuery*, const char*);
int32_t lance_fts_match_set_operator(LanceFtsQuery*, LanceFtsOperator);
int32_t lance_fts_match_set_fuzziness(LanceFtsQuery*, int32_t);
int32_t lance_fts_match_set_max_expansions(LanceFtsQuery*, uint32_t);
int32_t lance_fts_match_set_boost(LanceFtsQuery*, float);
int32_t lance_fts_match_set_prefix_length(LanceFtsQuery*, uint32_t);
LanceFtsQuery* lance_fts_phrase_new(const char* terms);
int32_t lance_fts_phrase_set_slop(LanceFtsQuery*, uint32_t);
LanceFtsQuery* lance_fts_boost_new(LanceFtsQuery* positive, LanceFtsQuery* negative, float boost);
LanceFtsQuery* lance_fts_multi_match_new(const LanceFtsQuery* const* matches, size_t n);
LanceFtsQuery* lance_fts_boolean_new(void);
int32_t lance_fts_boolean_add(LanceFtsQuery*, LanceFtsOccur, LanceFtsQuery*);
void lance_fts_query_close(LanceFtsQuery*);
int32_t lance_scanner_fts_query(LanceScanner*, LanceFtsQuery*);
Pros: matches the existing Scanner::nearest + nprobes/refine_factor/ef pattern; errors surface at construction time; idiomatic in C++ via RAII wrappers in lance.hpp.
Cons: 15 new C functions (as listed above) vs 1; new variants in the future need new symbols (still ABI-compatible, just additive).
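As a rough illustration of the RAII point for Option B, the sketch below wraps the proposed opaque handle in a `std::unique_ptr` with a custom deleter. The three `lance_fts_*` functions are local stand-ins for the proposed ABI so the example compiles on its own; they are not lance-c code. It also assumes that a return value of 0 means success, which the proposal above does not state. `FtsQueryPtr` and `make_phrase` are hypothetical names.

```cpp
#include <cstdint>
#include <memory>
#include <string>

// --- Stand-ins for the PROPOSED C ABI; real versions would live in lance-c.
struct LanceFtsQuery {
    std::string terms;
    uint32_t slop = 0;
};
static int g_live_handles = 0;  // demo aid: number of handles not yet closed

LanceFtsQuery* lance_fts_phrase_new(const char* terms) {
    if (terms == nullptr) return nullptr;  // stub policy: null input fails
    ++g_live_handles;
    return new LanceFtsQuery{terms, 0};
}

int32_t lance_fts_phrase_set_slop(LanceFtsQuery* q, uint32_t slop) {
    if (q == nullptr) return -1;
    q->slop = slop;
    return 0;  // ASSUMPTION: 0 signals success
}

void lance_fts_query_close(LanceFtsQuery* q) {
    if (q == nullptr) return;
    --g_live_handles;
    delete q;
}

// --- What a wrapper in lance.hpp might look like.
struct FtsQueryDeleter {
    void operator()(LanceFtsQuery* q) const noexcept {
        lance_fts_query_close(q);
    }
};
using FtsQueryPtr = std::unique_ptr<LanceFtsQuery, FtsQueryDeleter>;

// Build a phrase query with slop; returns an empty pointer on failure.
// If set_slop fails, the partially built handle is closed automatically.
FtsQueryPtr make_phrase(const char* terms, uint32_t slop) {
    FtsQueryPtr q(lance_fts_phrase_new(terms));
    if (!q || lance_fts_phrase_set_slop(q.get(), slop) != 0) {
        return nullptr;
    }
    return q;
}
```

One detail the wrapper depends on is ownership for composite builders: whether `lance_fts_boolean_add` and `lance_fts_boost_new` take ownership of the child handles decides whether the C++ side should call `release()` or keep the children alive itself.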