Skip to content

Harden binary fuse filter#5202

Open
SirTyson wants to merge 3 commits intostellar:masterfrom
SirTyson:harden-binary-fuse-filter
Open

Harden binary fuse filter#5202
SirTyson wants to merge 3 commits intostellar:masterfrom
SirTyson:harden-binary-fuse-filter

Conversation

@SirTyson
Copy link
Copy Markdown
Contributor

@SirTyson SirTyson commented Apr 2, 2026

Description

This fixes a bug in the binary fuse filter, where very occasionally we would corrupt a value in the index. Specifically, the last element in reverseOrder is a sentinel value that should never be reset, even in the case of underflow error. However, when underflow error occurred, we would reset the entire vector, including the sentinel.

The first commit fixes this issue. The 2nd commit adds code coverage support (shamelessly stolen from @graydon #5012) and uses it to add more comprehensive unit tests for edge cases, including the sentinel issue.

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • If change impacts performance, include supporting evidence per the performance document

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the Binary Fuse filter population/retry logic (fixing a sentinel-reset corruption case) and adds a lightweight “coverage mark” mechanism plus targeted unit tests to exercise rare retry paths deterministically.

Changes:

  • Fix reverseOrder reset behavior to avoid clearing the sentinel during retry/reset paths.
  • Introduce CovMark infrastructure (COVMARK_HIT / COVMARK_CHECK_HIT_IN_CURR_SCOPE) to assert execution of rare internal paths from tests.
  • Add BinaryFuseFilter unit tests covering error retry, peeling failure retry, and duplicate-removal paths; remove DiskIndex’s legacy retry workaround.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/util/test/BinaryFuseTests.cpp Adds tests that force rare BinaryFuseFilter populate/retry paths and assert they were hit.
src/test/CovMark.h Defines the coverage-mark API, counters, and RAII guard used by tests.
src/test/CovMark.cpp Defines the global coverage counter storage for test builds.
src/bucket/DiskIndex.cpp Removes the previous retry-on-out_of_range workaround when constructing the filter.
lib/binaryfusefilter.h Implements sentinel-safe reset, adds test forcing hooks, and emits coverage marks in retry/duplicate paths.

@SirTyson SirTyson force-pushed the harden-binary-fuse-filter branch from bc7795f to 2fdc666 Compare April 2, 2026 21:41
// pre-hashed keys) can produce collisions even when the input keys are unique.
// To simulate this, we just duplicate all the input hashes, which guarantees
// collisions.
TEST_CASE("binary fuse filter duplicate removal", "[BinaryFuseFilter]")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you understand why this fails in CI?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was some UB due to how I copied a vector in test setup, should be fixed now.

@SirTyson SirTyson force-pushed the harden-binary-fuse-filter branch from 2fdc666 to 246acdc Compare April 3, 2026 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants