Add SstFileReader::MayMatch API #30

Draft
murphy-4o wants to merge 8 commits into master from unique-key-rocksdb-maymatch
@murphy-4o murphy-4o commented May 5, 2026

Why

UNIQUE KEY uses RocksDB SST readers mostly as a negative filter: incoming keys are checked against existing SSTs, and most SST/key pairs do not match.

For that mostly-negative path, SstFileReader::MultiGet is still too heavy because it is a point-lookup API. Even when the bloom filter avoids data-block reads, MultiGet still prepares lookup/range state and result/status plumbing needed to return values or not-found statuses.

Summary

This PR adds a narrow bloom-only fast path for those negative checks. MayMatch batches user keys through the table filter and returns one boolean per key, so definitely-absent keys skip the rest of the point-lookup machinery and callers fall back to normal lookup work only for maybe-present keys.
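The filter-then-lookup flow described above can be sketched as follows. This is a minimal, self-contained illustration and not the code in this PR: `ToySst`, its two-hash bloom filter, `Contains`, and `FindDuplicates` are hypothetical stand-ins, and the real `SstFileReader::MayMatch` signature may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for one SST file plus its bloom filter.
// Not a RocksDB type; it only mirrors the shape of the idea.
class ToySst {
 public:
  explicit ToySst(const std::vector<std::string>& keys)
      : bits_(kBits, false), keys_(keys.begin(), keys.end()) {
    for (const auto& k : keys) {
      bits_[Hash1(k)] = true;
      bits_[Hash2(k)] = true;
    }
  }

  // Bloom-only batch check: false means "definitely absent",
  // true means "maybe present" (false positives are possible).
  std::vector<bool> MayMatch(const std::vector<std::string>& keys) const {
    std::vector<bool> out;
    out.reserve(keys.size());
    for (const auto& k : keys) {
      out.push_back(bits_[Hash1(k)] && bits_[Hash2(k)]);
    }
    return out;
  }

  // Stand-in for the heavier point lookup; counts how often it runs.
  bool Contains(const std::string& key) const {
    ++lookups_;
    return keys_.count(key) != 0;
  }

  std::size_t lookups() const { return lookups_; }

 private:
  static constexpr std::size_t kBits = 1024;
  static std::size_t Hash1(const std::string& k) {
    return std::hash<std::string>{}(k) % kBits;
  }
  static std::size_t Hash2(const std::string& k) {
    return std::hash<std::string>{}(k + "#") % kBits;
  }

  std::vector<bool> bits_;
  std::unordered_set<std::string> keys_;
  mutable std::size_t lookups_ = 0;
};

// Returns the incoming keys that already exist in the SST. Only keys the
// filter reports as maybe-present reach the point lookup.
std::vector<std::string> FindDuplicates(
    const ToySst& sst, const std::vector<std::string>& incoming) {
  const std::vector<bool> maybe = sst.MayMatch(incoming);
  assert(maybe.size() == incoming.size());
  std::vector<std::string> dups;
  for (std::size_t i = 0; i < incoming.size(); ++i) {
    if (!maybe[i]) continue;  // definitely absent: no lookup needed
    if (sst.Contains(incoming[i])) dups.push_back(incoming[i]);
  }
  return dups;
}
```

Because a bloom filter never produces false negatives, skipping the lookup for `false` entries cannot miss an existing key. False positives only cost an extra lookup.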

In a negative-probe microbenchmark with the filter resident, MultiGet took about 181 ns/key, while MayMatch took about 27 ns/key.

  • Adds SstFileReader::MayMatch and routes it through TableReader::MayMatch.
  • Exposes ReadOptions on the public SstFileReader::MayMatch overload so callers can request no-IO behavior through read_tier.
  • Implements bloom-only batch checks for block-based tables, with safe all-true fallback when a table cannot answer safely.
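The all-true fallback in the last bullet can be illustrated with a small sketch. It assumes a hypothetical `ToyFilter` and a null pointer to represent "this table cannot answer safely"; the names and the condition are illustrative only and do not reflect the actual `TableReader::MayMatch` implementation.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical single-hash filter, used only to show the fallback contract.
struct ToyFilter {
  std::vector<bool> bits = std::vector<bool>(256, false);

  static std::size_t Slot(const std::string& key) {
    return std::hash<std::string>{}(key) % 256;
  }
  void Add(const std::string& key) { bits[Slot(key)] = true; }
  bool KeyMayMatch(const std::string& key) const { return bits[Slot(key)]; }
};

// Returns one boolean per key. A null filter stands for a table that cannot
// answer from its filter; in that case every key is reported as maybe-present
// so that callers never skip a key that might exist.
std::vector<bool> MayMatchOrAllTrue(const ToyFilter* filter,
                                    const std::vector<std::string>& keys) {
  if (keys.empty()) return {};  // empty batch: nothing to set up
  if (filter == nullptr) {
    return std::vector<bool>(keys.size(), true);  // safe fallback
  }
  std::vector<bool> out;
  out.reserve(keys.size());
  for (const auto& k : keys) out.push_back(filter->KeyMayMatch(k));
  return out;
}
```

Returning all-true keeps the result conservative: it can cost extra lookups, but it cannot turn an existing key into a reported miss.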

Other changes:

  • Pins the format check to origin/master because the workflow fetches facebook/rocksdb as upstream; without the override, format-diff.sh checks unrelated ClickHouse fork deltas instead of this PR's diff.

@murphy-4o murphy-4o force-pushed the unique-key-rocksdb-maymatch branch from 0ba0b80 to 3faa153 May 5, 2026 08:50
@murphy-4o murphy-4o changed the base branch from unique-key-rocksdb-base-10.10 to master May 5, 2026 08:50
@murphy-4o murphy-4o force-pushed the unique-key-rocksdb-maymatch branch from 3faa153 to 6374942 May 5, 2026 08:57
Exposes the SST bloom filter via a thin virtual that bypasses the
full MultiGet scaffolding (KeyContext / GetContext / sort / LookupKey).
Also adds skip_filters parameter to SstFileReader::MultiGet to avoid
redundant bloom check on pre-filtered keys.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@murphy-4o murphy-4o force-pushed the unique-key-rocksdb-maymatch branch from 6374942 to e6240f2 May 5, 2026 09:01
murphy-4o added 5 commits May 5, 2026 17:20
Expose ReadOptions on SstFileReader::MayMatch so callers can request no-IO behavior through read_tier. The earlier commit message text about SstFileReader::MultiGet was stale; this branch only adds the bloom-only MayMatch path.
@murphy-4o murphy-4o changed the title Add RocksDB bloom-only batch API Add SstFileReader::MayMatch API May 5, 2026
murphy-4o added 2 commits May 5, 2026 17:44
Make the format sanity check compare PR changes against ClickHouse/rocksdb master instead of the facebook/rocksdb upstream remote fetched by the workflow.
Clarify MayMatch fallback semantics in comments and skip filter setup entirely for empty batches.
@murphy-4o murphy-4o marked this pull request as ready for review May 5, 2026 10:09
@murphy-4o murphy-4o marked this pull request as draft May 5, 2026 14:47