Skip to content

Extract diskann-storage crate from diskann-providers#873

Open
varat73 wants to merge 3 commits intomainfrom
extract-diskann-storage
Open

Extract diskann-storage crate from diskann-providers#873
varat73 wants to merge 3 commits intomainfrom
extract-diskann-storage

Conversation

@varat73
Copy link
Copy Markdown
Contributor

@varat73 varat73 commented Apr 2, 2026

Summary

Extract a new diskann-storage Tier 2 crate from diskann-providers as the first phase of decomposing the providers god crate (37K lines → ~15K target).

What moved to diskann-storage

Item Description
StorageReadProvider / StorageWriteProvider Core storage I/O traits
FileStorageProvider Filesystem-backed implementation
VirtualStorageProvider In-memory/overlay VFS (feature-gated behind virtual_storage)
DynWriteProvider / WriteProviderWrapper / WriteSeek Dynamic dispatch adapters
path_utility Canonical file-path naming for index artifacts
DatasetDto Aligned vector dataset transfer type
proto_storage Generic protobuf load/save via prost

Key properties

  • Zero diskann-* dependencies — only prost + thiserror. Clean Tier 2 crate.
  • 38 tests in diskann-storage covering all moved functionality.
  • Backward compatiblediskann-providers re-exports all moved types; existing crate::storage::StorageReadProvider paths continue to work.
  • Consumer crates migrateddiskann-disk, diskann-tools, diskann-benchmark updated to import directly from diskann-storage.

Verification

  • cargo fmt --all --check
  • cargo clippy --workspace --all-targets -- -D warnings
  • cargo test -p diskann-storage — 38 passed, 0 failed
  • cargo test -p diskann-providers --profile ci — 520 passed, 0 failed
  • cargo check --workspace

varat73 and others added 2 commits April 1, 2026 20:13
Create new Tier 2 crate diskann-storage with zero diskann-* dependencies,
containing:
- StorageReadProvider / StorageWriteProvider traits
- FileStorageProvider (filesystem-backed)
- VirtualStorageProvider (in-memory/overlay VFS, feature-gated)
- Path utility functions for index artifact naming
- DatasetDto transfer type
- Generic protobuf load/save via prost

diskann-providers now depends on diskann-storage and re-exports all moved
types to preserve backward compatibility for existing consumers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Update diskann-disk, diskann-tools, and diskann-benchmark to import
storage types (StorageReadProvider, StorageWriteProvider,
FileStorageProvider, VirtualStorageProvider, DynWriteProvider,
WriteProviderWrapper, path_utility functions) from diskann-storage
instead of the re-exports in diskann-providers.

Fix unfulfilled lint expectations in benchmark files where
#[expect(clippy::disallowed_methods)] was no longer triggered because
VirtualStorageProvider::new_physical now lives in diskann-storage.

Run cargo fmt --all to normalize import ordering.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@varat73 varat73 requested review from a team and Copilot April 2, 2026 03:36
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extracts storage abstractions and implementations from diskann-providers into a new Tier-2 crate, diskann-storage, and migrates downstream crates to use the new crate directly while keeping diskann-providers::storage::* as a compatibility re-export.

Changes:

  • Added new diskann-storage crate containing storage traits, filesystem and virtual storage providers, path naming helpers, dataset DTO, and protobuf load/save utilities.
  • Updated diskann-providers to re-export moved storage APIs from diskann-storage and to re-export DatasetDto from diskann-storage.
  • Migrated consumer crates (diskann-tools, diskann-disk, diskann-benchmark) to import storage traits/providers and path helpers from diskann-storage.

Reviewed changes

Copilot reviewed 54 out of 61 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
diskann-tools/src/utils/relative_contrast.rs Switch storage provider imports to diskann-storage.
diskann-tools/src/utils/random_data_generator.rs Test-only migration to diskann-storage virtual provider; main trait import still via providers re-export.
diskann-tools/src/utils/ground_truth.rs Switch StorageReadProvider/StorageWriteProvider imports to diskann-storage.
diskann-tools/src/utils/generate_synthetic_labels_utils.rs Switch DynWriteProvider + test storage providers to diskann-storage.
diskann-tools/src/utils/gen_associated_data_from_range.rs Switch StorageWriteProvider + test providers to diskann-storage.
diskann-tools/src/utils/build_pq.rs Move path utility + storage provider imports to diskann-storage.
diskann-tools/src/utils/build_disk_index.rs Switch storage provider imports (incl. tests) to diskann-storage.
diskann-tools/src/bin/subsample_bin.rs Switch FileStorageProvider/StorageWriteProvider imports to diskann-storage.
diskann-tools/src/bin/relative_contrast.rs Switch FileStorageProvider import to diskann-storage.
diskann-tools/src/bin/random_data_generator.rs Switch FileStorageProvider import to diskann-storage.
diskann-tools/src/bin/generate_pq.rs Switch FileStorageProvider import to diskann-storage.
diskann-tools/src/bin/generate_minmax.rs Switch StorageReadProvider/FileStorageProvider usage to diskann-storage.
diskann-tools/src/bin/gen_associated_data_from_range.rs Switch FileStorageProvider import to diskann-storage.
diskann-tools/Cargo.toml Add diskann-storage dependency (currently enabling virtual_storage in normal deps).
diskann-storage/src/virtual_storage_provider.rs Add/expand VirtualStorageProvider implementation + tests.
diskann-storage/src/storage_provider.rs New core storage traits + dyn adapters + wrapper + tests.
diskann-storage/src/proto_storage.rs New generic prost-backed load_proto/save_proto helpers + tests.
diskann-storage/src/path_utility.rs New canonical artifact path helpers + tests.
diskann-storage/src/lib.rs New crate root wiring and re-exports.
diskann-storage/src/file_storage_provider.rs Filesystem-backed provider documentation + expanded tests.
diskann-storage/src/dataset_dto.rs New DatasetDto type + tests (moved from providers utils).
diskann-storage/Cargo.toml New crate manifest (edition 2024) with virtual_storage feature.
diskann-providers/src/utils/utils.rs Remove in-crate DatasetDto definition (moved).
diskann-providers/src/utils/mod.rs Re-export DatasetDto from diskann-storage.
diskann-providers/src/storage/storage_provider.rs Remove in-crate storage traits/dyn adapters (moved).
diskann-providers/src/storage/path_utility.rs Remove in-crate path utilities (moved).
diskann-providers/src/storage/mod.rs Re-export diskann-storage storage APIs for backward compatibility.
diskann-providers/Cargo.toml Add dependency on diskann-storage (currently enabling virtual_storage unconditionally).
diskann-providers/benches/benchmarks/copy_aligned_data_bench.rs Cleanup after storage extraction (still uses providers storage module).
diskann-providers/benches/benchmarks_iai/copy_aligned_data_bench_iai.rs Cleanup after storage extraction (still uses providers storage module).
diskann-disk/src/utils/partition.rs Switch storage trait imports (and tests) to diskann-storage.
diskann-disk/src/utils/aligned_file_reader/virtual_aligned_reader_factory.rs Switch VirtualStorageProvider import to diskann-storage.
diskann-disk/src/utils/aligned_file_reader/storage_provider_aligned_file_reader.rs Switch StorageReadProvider and test provider imports to diskann-storage.
diskann-disk/src/utils/aligned_file_reader/aligned_file_reader_factory.rs Switch FileStorageProvider import (macOS/miri path) to diskann-storage.
diskann-disk/src/storage/quant/pq/pq_generation.rs Switch storage traits in main + tests to diskann-storage.
diskann-disk/src/storage/quant/generator.rs Switch storage traits and tests to diskann-storage.
diskann-disk/src/storage/disk_index_writer.rs Switch storage traits + path utilities to diskann-storage.
diskann-disk/src/storage/disk_index_reader.rs Switch StorageReadProvider + tests to diskann-storage.
diskann-disk/src/storage/cached_writer.rs Switch StorageWriteProvider + tests to diskann-storage.
diskann-disk/src/storage/cached_reader.rs Switch StorageReadProvider + tests to diskann-storage.
diskann-disk/src/search/provider/disk_vertex_provider.rs Switch storage traits + get_disk_index_file to diskann-storage.
diskann-disk/src/search/provider/disk_provider.rs Switch storage trait + path helper imports to diskann-storage.
diskann-disk/src/build/chunking/checkpoint/checkpoint_record_manager_with_file.rs Switch FileStorageProvider import to diskann-storage.
diskann-disk/src/build/builder/quantizer.rs Switch storage traits to diskann-storage.
diskann-disk/src/build/builder/inmem_builder.rs Switch dyn write wrapper types to diskann-storage.
diskann-disk/src/build/builder/core.rs Switch storage traits and path helpers to diskann-storage.
diskann-disk/src/build/builder/build.rs Switch storage traits and tests to diskann-storage.
diskann-disk/Cargo.toml Add diskann-storage dependency.
diskann-benchmark/src/utils/datafiles.rs Import StorageReadProvider from diskann-storage (still uses providers FileStorageProvider).
diskann-benchmark/src/inputs/save_and_load.rs Switch StorageReadProvider import to diskann-storage.
diskann-benchmark/src/backend/disk_index/build.rs Switch storage traits to diskann-storage.
diskann-benchmark/src/backend/disk_index/benchmarks.rs Switch FileStorageProvider import to diskann-storage.
diskann-benchmark/Cargo.toml Add diskann-storage dependency.
Cargo.toml Add diskann-storage to workspace members and workspace dependencies.
Cargo.lock Record new crate and dependency graph updates.
Comments suppressed due to low confidence (1)

diskann-storage/src/virtual_storage_provider.rs:292

  • The new trait_based_read_provider test uses &dyn StorageReadProvider<Reader = impl Read + Seek>, which is not a valid/idiomatic way to express “any reader” and may fail to compile due to impl Trait usage inside an associated-type equality. Prefer a generic helper like fn read_via_trait<P: StorageReadProvider>(p: &P) where P::Reader: Read + Seek { ... } (or specify the concrete reader type explicitly) to keep the test compiling on stable Rust.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Address PR review feedback:

- diskann-providers: wire virtual_storage feature to forward to
  diskann-storage/virtual_storage instead of unconditionally enabling it
- diskann-disk, diskann-tools: move diskann-storage virtual_storage
  feature to dev-dependencies only (matches prior vfs gating)
- diskann-tools/random_data_generator.rs: migrate StorageWriteProvider
  import to diskann_storage
- diskann-benchmark/datafiles.rs: migrate FileStorageProvider usage
  to diskann_storage

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@varat73
Copy link
Copy Markdown
Contributor Author

varat73 commented Apr 2, 2026

Regarding the review comments:

build_pq.rs \p_val\ (comment 1): This is a preexisting bug — our PR only changed imports in this file, not logic. Filed as a separate issue to avoid scope creep.

random_data_generator.rs (comment 2): Fixed in d6128ea — split the import so \StorageWriteProvider\ comes from \diskann_storage.

diskann-tools/Cargo.toml \�irtual_storage\ (comment 3): Fixed in d6128ea — moved \�irtual_storage\ feature to dev-dependencies only.

diskann-providers/Cargo.toml \�irtual_storage\ (comment 4): Fixed in d6128ea — \diskann-storage\ is now without features in normal deps; \�irtual_storage\ feature forwards to \diskann-storage/virtual_storage.

datafiles.rs \FileStorageProvider\ (comment 5): Fixed in d6128ea — migrated to \diskann_storage::FileStorageProvider.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 2, 2026

Codecov Report

❌ Patch coverage is 70.60150% with 782 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.35%. Comparing base (0ced23d) to head (d6128ea).

Files with missing lines Patch % Lines
diskann-tools/src/utils/search_disk_index.rs 0.00% 281 Missing ⚠️
diskann-benchmark/src/inputs/disk.rs 4.47% 192 Missing ⚠️
diskann-tools/src/utils/search_index_utils.rs 74.01% 112 Missing ⚠️
diskann-tools/src/utils/build_pq.rs 0.00% 53 Missing ⚠️
diskann-benchmark/src/main.rs 91.12% 33 Missing ⚠️
diskann-tools/src/bin/random_data_generator.rs 0.00% 28 Missing ⚠️
diskann-tools/src/bin/generate_pq.rs 0.00% 25 Missing ⚠️
diskann-storage/src/storage_provider.rs 80.39% 10 Missing ⚠️
diskann-storage/src/virtual_storage_provider.rs 93.82% 10 Missing ⚠️
...nn-tools/src/bin/gen_associated_data_from_range.rs 0.00% 9 Missing ⚠️
... and 9 more
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #873      +/-   ##
==========================================
+ Coverage   89.31%   89.35%   +0.03%     
==========================================
  Files         445      447       +2     
  Lines       84095    84343     +248     
==========================================
+ Hits        75113    75361     +248     
  Misses       8982     8982              
Flag Coverage Δ
miri 89.35% <70.60%> (+0.03%) ⬆️
unittests 89.19% <70.60%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
diskann-benchmark/src/utils/datafiles.rs 83.33% <100.00%> (-0.25%) ⬇️
diskann-disk/src/build/builder/build.rs 94.16% <ø> (ø)
diskann-disk/src/build/builder/core.rs 95.25% <ø> (ø)
diskann-disk/src/build/builder/inmem_builder.rs 86.93% <ø> (ø)
diskann-disk/src/build/builder/quantizer.rs 90.52% <ø> (ø)
diskann-disk/src/search/provider/disk_provider.rs 91.02% <ø> (ø)
...n-disk/src/search/provider/disk_vertex_provider.rs 87.30% <ø> (ø)
diskann-disk/src/storage/disk_index_reader.rs 92.30% <ø> (ø)
diskann-disk/src/storage/disk_index_writer.rs 96.58% <ø> (ø)
diskann-disk/src/storage/quant/generator.rs 92.72% <ø> (ø)
... and 32 more

... and 4 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants