46 commits
e3e1651
Sync changes from CDB_DiskANN repo
gopal-msr Feb 10, 2026
ec2091f
Before merging with main
gopal-msr Feb 10, 2026
5ad6e44
Merge with main
gopal-msr Feb 10, 2026
a949024
Working version of inline beta search
gopal-msr Feb 16, 2026
95cd4a9
Merge branch 'main' into sync-from-cdb-diskann
gopal-msr Feb 16, 2026
98ad4f7
Removing unnecessary stats
gopal-msr Feb 17, 2026
edfdee6
Fix clippy warnings
Mar 2, 2026
cbb77f6
use search and build benchmark apis
Mar 12, 2026
670782f
Rename struct for recall metrics
Mar 12, 2026
6c2c967
Use copyIds
Mar 12, 2026
d13dc7f
Use renamed struct in SearchResults
Mar 12, 2026
bd19bde
Evaluate query progressively without flattening and cloning the json …
Mar 12, 2026
8e3a89b
Use config api to validate values
Mar 12, 2026
40b1314
Specify number of threads explicitly
Mar 12, 2026
3d23972
Error when the visit of input expression fails when creating encodedF…
Mar 12, 2026
9e35ccb
remove new runtime method added, use method in benchmark::core
Mar 12, 2026
5a8c560
Use dispatch rule to validate benchmark type support
Mar 12, 2026
de20365
Use compute_medioid helper
Mar 12, 2026
9c477cd
Remaining changes from search + build api refactor
Mar 12, 2026
f87e24a
Merge branch 'gopalsr/integrate_bm' into sync-from-cdb-diskann
Mar 12, 2026
7f24432
Apply suggestion from @Copilot
sampathrg Mar 12, 2026
9c1870a
fix merge errors
Mar 12, 2026
54471b0
Merge branch 'sync-from-cdb-diskann' of https://github.com/microsoft/…
Mar 12, 2026
1591956
Fix merge errors white recall metrics
sampathrg Mar 12, 2026
f13e849
Remove the need for Vec<T> variants
sampathrg Mar 16, 2026
7a1244a
Undo unecessary change
sampathrg Mar 16, 2026
86208c8
Remove whitespaces from file
sampathrg Mar 16, 2026
4441dc7
Fix formatting errors
sampathrg Mar 16, 2026
5ed93b9
Fix clippy warning
sampathrg Mar 16, 2026
28c8ec3
Merge branch 'main' into sync-from-cdb-diskann
sampathrg Mar 16, 2026
072edb5
Fix build errors after merge with main - Use Knn instead of the old S…
sampathrg Mar 16, 2026
6a31782
Undo rename of RecallMetrics
sampathrg Mar 16, 2026
a70ee53
Remove fallback to unfiltered search.
sampathrg Mar 18, 2026
a313013
Fix formatting error
sampathrg Mar 18, 2026
7ea6e4e
Address review comments
sampathrg Mar 23, 2026
8010d37
Formatting + revert to old names for some functions
sampathrg Mar 23, 2026
8a86e0d
Add some unit tests + smoke test for benchmark
sampathrg Mar 23, 2026
5664dca
Update output serializer + remove unnecessary type parameter
sampathrg Mar 23, 2026
b03bef4
Put the benchmarks and the smoke test behind a feature
sampathrg Mar 23, 2026
c1b7a3c
changes to Cargo.lock
sampathrg Mar 23, 2026
026bb91
Merge branch 'main' into sync-from-cdb-diskann
sampathrg Mar 23, 2026
5ebe477
Move the tests to a separate folder as is the convention
sampathrg Mar 26, 2026
c9a97bd
Fix formatting
sampathrg Mar 26, 2026
cc6a9b6
Fix formatting
sampathrg Mar 26, 2026
27e20a9
Merge branch 'main' into sync-from-cdb-diskann
sampathrg Mar 26, 2026
47a2e69
Fix build errors after merging with main
sampathrg Mar 26, 2026
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -21,7 +21,7 @@ env:
RUST_BACKTRACE: 1
# The features we want to explicitly test. For example, the `flatbuffers-build` feature
# of `diskann-quantization` requires additional setup and so must not be included by default.
-  DISKANN_FEATURES: "virtual_storage,bf_tree,spherical-quantization,product-quantization,tracing,experimental_diversity_search,disk-index,flatbuffers,linalg,codegen"
+  DISKANN_FEATURES: "virtual_storage,bf_tree,spherical-quantization,product-quantization,tracing,experimental_diversity_search,disk-index,flatbuffers,linalg,codegen,document-index"

# Use the Rust version specified in rust-toolchain.toml
rust_stable: "1.92"
2 changes: 2 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

3 changes: 3 additions & 0 deletions diskann-benchmark/Cargo.toml
@@ -63,6 +63,9 @@ scalar-quantization = []
# Enable minmax-quantization based algorithms
minmax-quantization = []

# Enable Document Index benchmarks
document-index = []

# Enable Disk Index benchmarks
disk-index = [
"diskann-disk/perf_test",
39 changes: 39 additions & 0 deletions diskann-benchmark/example/document-filter.json
@@ -0,0 +1,39 @@
{
  "search_directories": [
    "test_data/disk_index_search"
  ],
  "jobs": [
    {
      "type": "document-index-build",
      "content": {
        "build": {
          "data_type": "float32",
          "data": "disk_index_siftsmall_learn_256pts_data.fbin",
          "data_labels": "data.256.label.jsonl",
          "distance": "squared_l2",
          "max_degree": 32,
          "l_build": 50,
          "alpha": 1.2,
          "num_threads": 4
        },
        "search": {
          "queries": "disk_index_sample_query_10pts.fbin",
          "query_predicates": "query.10.label.jsonl",
          "groundtruth": "disk_index_10pts_idx_uint32_truth_search_filter_res.bin",
          "beta": 0.5,
Contributor

Do we need to tune this parameter for different scenarios, such as small vs. large datasets or different embedding models, or is 0.5 a good value for most scenarios?

I think the value is fixed. This parameter boosts the search similarity score when a candidate matches the filter, and that score is then used to prioritize which node to walk next. Tagging @gopalrs to confirm.

Contributor Author

Experimentally, 0.5 has proved to give satisfactory results, with the caveat that it is dependent on filter density. But once we have the query planner in place, that caveat will be removed.
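
To make the discussion of `beta` concrete, here is a minimal sketch of how a filter-match boost could change the visit order in a best-first search. It is not taken from the PR: the multiplicative rule in `biased_score`, the `Candidate` type, and `visit_order` are all hypothetical names and assumptions, and the actual inline beta search may apply the bias differently.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical search candidate; a lower score means "visit sooner".
#[derive(Debug)]
struct Candidate {
    id: u32,
    score: f32,
}

impl Ord for Candidate {
    // Reversed comparison so that BinaryHeap (a max-heap) pops the lowest score first.
    fn cmp(&self, other: &Self) -> Ordering {
        other
            .score
            .total_cmp(&self.score)
            .then_with(|| other.id.cmp(&self.id))
    }
}

impl PartialOrd for Candidate {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for Candidate {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for Candidate {}

/// Assumed biasing rule: scale the distance by `beta` when the filter matches,
/// so that matching candidates look closer than they really are.
fn biased_score(distance: f32, matches_filter: bool, beta: f32) -> f32 {
    if matches_filter {
        distance * beta
    } else {
        distance
    }
}

/// Returns candidate ids in the order a best-first search would visit them.
/// Each input tuple is (id, distance, matches_filter).
fn visit_order(candidates: &[(u32, f32, bool)], beta: f32) -> Vec<u32> {
    let mut heap = BinaryHeap::new();
    for &(id, distance, matches_filter) in candidates {
        let score = biased_score(distance, matches_filter, beta);
        heap.push(Candidate { id, score });
    }
    let mut order = Vec::new();
    while let Some(candidate) = heap.pop() {
        order.push(candidate.id);
    }
    order
}

fn main() {
    let candidates = [(1, 4.0, false), (2, 6.0, true), (3, 9.0, true)];
    println!("{:?}", visit_order(&candidates, 0.5)); // [2, 1, 3]
    println!("{:?}", visit_order(&candidates, 1.0)); // [1, 2, 3]
}
```

Under this assumed rule, `beta = 1.0` disables the boost, while smaller values pull matching candidates forward. That is one way to see why a good value could depend on filter density, as noted in the reply above.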

          "reps": 5,
          "num_threads": [
            1
          ],
          "runs": [
            {
              "search_n": 20,
              "search_l": [20, 30, 40],
              "recall_k": 10
            }
          ]
        }
      }
    }
  ]
}
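
As a reading aid for the `search` section of the example above, the sketch below expands it into individual search configurations. It rests on an assumption not confirmed by the PR: that every combination of `num_threads` and `search_l` is measured, and that each combination is repeated `reps` times. The types `Run`, `SearchPlan`, and `SearchConfig` and the functions `expand` and `example_plan` are hypothetical and are not the benchmark crate's own definitions.

```rust
/// Hypothetical mirror of one entry in the "runs" array.
struct Run {
    search_n: usize,
    search_l: Vec<usize>,
    recall_k: usize,
}

/// Hypothetical mirror of the "search" object (file paths and beta omitted).
struct SearchPlan {
    reps: usize,
    num_threads: Vec<usize>,
    runs: Vec<Run>,
}

/// One concrete configuration to benchmark.
#[derive(Debug, PartialEq)]
struct SearchConfig {
    threads: usize,
    search_n: usize,
    search_l: usize,
    recall_k: usize,
}

/// Assumed expansion: the cross product of thread counts and search_l values.
fn expand(plan: &SearchPlan) -> Vec<SearchConfig> {
    let mut configs = Vec::new();
    for &threads in &plan.num_threads {
        for run in &plan.runs {
            for &search_l in &run.search_l {
                configs.push(SearchConfig {
                    threads,
                    search_n: run.search_n,
                    search_l,
                    recall_k: run.recall_k,
                });
            }
        }
    }
    configs
}

/// The values from document-filter.json, written out by hand.
fn example_plan() -> SearchPlan {
    SearchPlan {
        reps: 5,
        num_threads: vec![1],
        runs: vec![Run {
            search_n: 20,
            search_l: vec![20, 30, 40],
            recall_k: 10,
        }],
    }
}

fn main() {
    let plan = example_plan();
    let configs = expand(&plan);
    for c in &configs {
        println!(
            "threads={} search_n={} search_l={} recall_k={}",
            c.threads, c.search_n, c.search_l, c.recall_k
        );
    }
    println!(
        "{} configurations x {} reps = {} timed searches",
        configs.len(),
        plan.reps,
        configs.len() * plan.reps
    );
}
```

Read this way, the example file would describe three configurations (one per `search_l` value) on a single thread, each repeated five times.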