add bf_tree benchmark infrastructure#1106
Conversation
4201a33 to
1661985
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1106 +/- ##
==========================================
- Coverage 89.45% 89.43% -0.02%
==========================================
Files 484 484
Lines 91407 91495 +88
==========================================
+ Hits 81765 81831 +66
- Misses 9642 9664 +22
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
2981ffc to
c36e2a5
Compare
There was a problem hiding this comment.
Pull request overview
Adds bf_tree-backed benchmark support to diskann-benchmark (full-precision + spherical quantization; static + streaming variants) and updates diskann-bftree to address inplace-delete behavior and reduce per-call allocations in neighbor serialization.
Changes:
- Add a new
bftreeinput schema + backend benchmarks (static build/search and streaming runbook modes, including spherical quantization bit-width dispatch). - Add delete-time vector caching in
diskann-bftreeto support inplace-delete pruning after the underlying storage entries are removed. - Rework neighbor list serialization to avoid per-call heap allocations (stack buffer fast path).
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| diskann-bftree/src/quant.rs | Makes get_vector_sync available outside tests for delete-time caching. |
| diskann-bftree/src/provider.rs | Adds delete caches + accessor fallbacks for inplace-delete pruning; exposes clear_delete_caches(). |
| diskann-bftree/src/neighbors.rs | Switches neighbor serialization to a stack buffer fast path. |
| diskann-benchmark/src/inputs/mod.rs | Centralizes PRINT_WIDTH + write_field! for consistent summaries. |
| diskann-benchmark/src/inputs/graph_index.rs | Exposes IndexBuild helpers for reuse by bf_tree inputs. |
| diskann-benchmark/src/inputs/bftree.rs | New bf_tree input schemas (FP + spherical; static + streaming) and display/validation. |
| diskann-benchmark/src/backend/index/streaming/stats.rs | Adds a bf_tree-only GenericStats::noop() helper for skipped maintenance steps. |
| diskann-benchmark/src/backend/index/mod.rs | Registers bf_tree benchmarks behind the bftree feature. |
| diskann-benchmark/src/backend/index/bftree/mod.rs | New bf_tree benchmark module + shared streaming runner. |
| diskann-benchmark/src/backend/index/bftree/full_precision.rs | Implements bf_tree full-precision static + streaming benchmarks. |
| diskann-benchmark/src/backend/index/bftree/spherical.rs | Implements bf_tree spherical quantization static + streaming benchmarks. |
| diskann-benchmark/example/graph-index-bftree.json | Example config for bf_tree full-precision static build/search. |
| diskann-benchmark/example/graph-index-bftree-stream.json | Example config for bf_tree full-precision streaming runbook. |
| diskann-benchmark/example/graph-index-bftree-spherical.json | Example config for bf_tree spherical quantization static build/search. |
| diskann-benchmark/example/graph-index-bftree-spherical-stream.json | Example config for bf_tree spherical quantization streaming runbook. |
| diskann-benchmark/Cargo.toml | Adds bf_tree dependency + feature flag. |
| Cargo.toml | Adds diskann-bftree to workspace dependencies. |
| Cargo.lock | Locks in the new dependency edge. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
b8cdc3a to
8b0469c
Compare
|
@copilot re-evaluate please |
Re-evaluated: you’re right, those |
0461814 to
cfddda1
Compare
hildebrandmw
left a comment
There was a problem hiding this comment.
Thanks Jordan, this is looking good!
1170fe8 to
3bf48a1
Compare
|
I'm not sure I fully understand the bug in deletion. Why do we need to keep the deleted vector around for pruning its neighbors? If it's in someone's neighbor list after deletion, it should be ignored during pruning. Is there some difference in how bf-tree handles this versus how in-mem providers handle this that meant deletes needed to be handled differently? |
|
I think posting some experiments with a more complicated runbook would be a good idea, it looks like all the experimental results here are from |
There was a problem hiding this comment.
I left some cosmetic comments but overall the structure and code look good to me. For the streaming I have a couple of concerns:
- I would like to see experiments with a more complicated runbook to make sure we have caught any perf or correctness issues. I would especially want to make sure we've tested
replaceadequately. - I am not sure I understand why encountering deleted nodes during pruning and throwing a
Transienterror is a bug. This seems like intended behavior and the transient error can be appropriately ignored. If we instead leave the deleted vectors in the graph until the end of deletion and take them into account during pruning of other nodes, we're just wasting work by computing distances to them anyway. Can you explain the reasoning here if I am wrong?
Thanks!
In-mem marks the vectors as deleted but keeps them around to be used during the inplace delete as the query point for finding replacement neighbors. bf_tree's design decision to hard delete would break this flow, which is why storing them temporarily is necessary. The bug I'm addressing was introduced during the bftree migration out of providers and into its own crate and we opted to go for the hard deletes. |
hildebrandmw
left a comment
There was a problem hiding this comment.
Thanks - I think this is revealing issues with our inplace_delete API that need to get resolved. I'd say stick with hard deletes and avoid the inplace delete method that requires soft-deletes for now.
3bc7e49 to
860f7e6
Compare
|
Could you add your new json examples as integration tests to https://github.com/microsoft/DiskANN/blob/main/diskann-benchmark/src/main.rs? Now we have a runbook in our test data directory, we should have a norm that all example jsons should be wired up as integration tests. |
And to add on to this, perhaps you could modify the README to mention adding example jsons as integration tests when you mention creating them? |
26926eb to
b55e43e
Compare
|
Thank you for bearing with the weirdness of soft deletes and slot recycling, and for dealing with the learning curve of big-ann-benchmarks and runbooks! Now that the integration tests are present and you validated with a longer runbook, I am happy with this PR. |
hildebrandmw
left a comment
There was a problem hiding this comment.
Thanks Jordan - almost there. My biggest concern is the code added to diskann, which I think could be greatly simplified by just changing the ranked error call in add_edge_and_prune to an allow_transient.
- Add bf-tree benchmark backend with integration tests for all example JSONs - Introduce SlotReclaim for immediate slot recycling in hard-delete providers - Reorder inplace_delete_inner to re-enable VisitedAndTopK for hard-delete providers - Make transient error handling conditional in robust_prune_list - Move run_streaming into a dedicated util file - Migrate NeighborProvider to scratch-based API (NeighborScratch) - Document hard-delete behavior, VisitedAndTopK incompatibility, and integration test requirements in README Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
bdb1014 to
c23f170
Compare
hildebrandmw
left a comment
There was a problem hiding this comment.
Two quick comments, then we'll be good to go.
## Breaking changes since 0.53.0 ### Graph search: `DataProvider` contract collapsed into `SearchAccessor` (#1067) `Accessor`, `BuildQueryComputer`, `ExpandBeam`, `SearchExt`, and `AsNeighbor`/`NeighborAccessor` are merged into a single `SearchAccessor` trait. The indexing layer no longer has a notion of element types. - **Upgrade:** Implement `SearchAccessor` instead of the removed traits; use `SearchAccessor::expand_beam` for search. `SearchStrategy`/`InsertStrategy`/`DefaultSearchStrategy`/`DefaultPostProcessor` now carry a lifetime, and the query is passed into `search_accessor` (accessors may now borrow the query). `SearchPostProcess` no longer takes a `QueryComputer` (only requires `HasId`). The blanket `workingset::Fill` impl for `workingset::Map` was removed — implement `Fill` yourself, or use the new synchronous `Map::fill` helper. ### Insert/prune: consolidated into `PruneAccessor` (#1138, follow-up to #1067) Removed `DelegateNeighbor`, `AsNeighbor`, `AsNeighborMut`, `HasElementRef`, `BuildDistanceComputer`, `workingset::Fill`, and `workingset::AsWorkingSet`, folded into a single `PruneAccessor` trait. - **Upgrade:** Implement `PruneAccessor` (provides `neighbors()` for neighbor delegation and `fill()` returning both a `View` and the distance computer). Note `neighbors()` now borrows `&mut self`. ### `VectorId`: removed scalar conversion traits/bounds (#1145, #1133) Dropped `VectorIdTryFrom`, `TryIntoVectorId`, methods `vector_id_try_from`/`try_into_vector_id`, helpers `vecid_from_u32`/`vecid_from_usize`, and `IdConversionError`/`ErrorToVectorId`. Internal IDs no longer need to convert to/from `usize`. - **Upgrade:** Where `usize` conversion is still required (e.g. roaring-treemap keys in `diskann-label-filter`), add an explicit `IntoUsize` bound (now required on `RoaringAttributeStore`, `InlineBetaStrategy`, `QueryBitmapEvaluator`/`BitmapFilter`). `DiskANNIndex::prune_range` now takes `impl IntoIterator<Item = DP::InternalId> + Send` instead of `Range<DP::InternalId>` — construct the iterator for your ID type at the call site. `SimpleNeighborProviderAsync` and `bftree::VectorProvider` are no longer generic over the ID type (fixed to `u32`). ### `DiskIndexReader`: dropped vestigial `VectorType` generic (#1161) - **Upgrade:** Replace `DiskIndexReader::<T>::new(...)` with `DiskIndexReader::new(...)`. ### Filtered search renames (#1149) `MultihopSearch` → `MultihopFilterSearch`; benchmark config phases `MultiHopSearchPhase`/`InlineSearchPhase` → `MultihopFilterSearchPhase`/`InlineFilterSearchPhase`. - **Upgrade:** Update references to the new names. ### `diskann-garnet` FFI: BIN/Q8 quantizers, bumped to 2.0.0 (#1050) Vectors are now stored as `Poly<[u8], AlignOfEight>`; a type-erased `GarnetQuantizer` trait replaces index/provider type parameterization. New FFI: `insert()` returns a success/training-ready flag, plus `build_quant_table()` and `backfill_quant_vectors()` for caller-driven async training/backfill. Accessor renamed to `DynamicAccessor`; the FSM is now lockable and gained `visit_used()`. - **Upgrade:** Garnet consumers must adopt the new 2.0.0 FFI surface (handle the new `insert()` return flag and drive `build_quant_table`/`backfill_quant_vectors`). ## Notable fixes & features (non-breaking) - **Concurrency:** Fixed an ID double-minting race during concurrent insert/delete; delete now removes mapping/attributes before marking the ID free (#1146). Fixed garnet handling of missing quant vectors during `delete()` (#1130). - **Quantization:** Fixed quantizer detection in `train_quantizer()` / `set_element()` (#1140). - **Filtered search:** Added inline filtered search with adaptive `L` (#1131), per the filtered-search RFC (#1128). - **bftree:** Fixed a bug writing the length to the neighbor list (#1150). - **Benchmarks/infra:** Multi-vector MaxSim benchmark with BYOTE factory (#1027); `bf_tree` benchmark infrastructure (#1106); spherical exhaustive benchmark threadpool fix (#1148); right-sized `tiled_reduce` tile buffer (#1123). - `BufferedDistance` accepts `UnalignedSlice`: added a `PreprocessedDistanceFunction<UnalignedSlice<'_, T>>` impl for `BufferedDistance` (#1113).
Summary
Adds benchmark support for the bf_tree storage backend — both full-precision and spherical-quantized, in static (build-once) and streaming (insert/delete/search) modes.
What's included
Core crate fix (
diskann/src/graph/index.rs):inplace_delete_innernow pre-fetches the delete element before callingdelete(), so thatVisitedAndTopKcan use the deleted vector as a search query. This is critical for hard-delete providers (bf-tree) where the data is erased on delete. Soft-delete providers (in-mem) are unaffected since data remains accessible either way. All three delete methods (OneHop, TwoHopAndOneHop, VisitedAndTopK) continue to work for both provider types.Benchmark variants (4 tags):
graph-index-bftree-full-precision-f32— static build + searchgraph-index-bftree-stream-full-precision-f32— streaming with runbookgraph-index-build-bftree-spherical-quantization— static spherical (1/2/4-bit)graph-index-stream-bftree-spherical-quantization— streaming spherical (1/2/4-bit)Bug fixes in diskann-bftree:
Delete::delete. The ordering fix in the core crate ensures the delete element is captured before erasure, enabling all three inplace_delete methods to work correctly.append_vectorwould resize the neighbor list toself.dimand write the count atdim - 1, butset_neighborswrote a variable-length buffer. This inconsistency caused bf-tree page fragmentation and potential read errors when the two code paths produced different value sizes for the same key. Fix: both paths now write a fixed-size dim-element buffer using the same format (|neighbors|padding|count|).set_neighborsandappend_vectorpreviously allocated on every call. Replaced with a caller-provided&mut [u8]scratch buffer, eliminating per-call heap allocations.Benchmark infrastructure:
bftree_parameters_from(),run_streaming()in dedicated util moduleGenericStats::emptyreplaced withOption<GenericStats>for maintain stats (bf_tree has no separate drop_deleted/release phases)Example configs
See
diskann-benchmark/example/graph-index-bftree*.jsonfor ready-to-run configurations targeting test_data.Tests on CosmosDBDevBox VM (16 vCPU 64GB RAM 2,048GB SSD)
Results from MSTuring-1M
Full Precision streaming:
Spherical 2-bit streaming:
Spherical 4-bit streaming:
Comparison across bit widths at L=200:
The configuration used in the above tests was enough to keep all of the data in memory and didn't overflow to disk. I also tested full-precision using 32mb of cb size and experienced a 2x performance degradation due to the disk lookups.
CI Notes
CI reports an increase in IR size of 11%, which is over the 5% allowed in the CI file. This is largely unavoidable given the amount of code added here, so that CI gate should be ignored for this PR.