Support for inline-beta filtered search with expressions #782
Open: gopalrs wants to merge 46 commits into main from sync-from-cdb-diskann
Commits
e3e1651
Sync changes from CDB_DiskANN repo
gopal-msr ec2091f
Before merging with main
gopal-msr 5ad6e44
Merge with main
gopal-msr a949024
Working version of inline beta search
gopal-msr 95cd4a9
Merge branch 'main' into sync-from-cdb-diskann
gopal-msr 98ad4f7
Removing unnecessary stats
gopal-msr edfdee6
Fix clippy warnings
cbb77f6
use search and build benchmark apis
670782f
Rename struct for recall metrics
6c2c967
Use copyIds
d13dc7f
Use renamed struct in SearchResults
bd19bde
Evaluate query progressively without flattening and cloning the json …
8e3a89b
Use config api to validate values
40b1314
Specify number of threads explicitly
3d23972
Error when the visit of input expression fails when creating encodedF…
9e35ccb
remove new runtime method added, use method in benchmark::core
5a8c560
Use dispatch rule to validate benchmark type support
de20365
Use compute_medioid helper
9c477cd
Remaining changes from search + build api refactor
f87e24a
Merge branch 'gopalsr/integrate_bm' into sync-from-cdb-diskann
7f24432
Apply suggestion from @Copilot
sampathrg 9c1870a
fix merge errors
54471b0
Merge branch 'sync-from-cdb-diskann' of https://github.com/microsoft/…
1591956
Fix merge errors white recall metrics
sampathrg f13e849
Remove the need for Vec<T> variants
sampathrg 7a1244a
Undo unecessary change
sampathrg 86208c8
Remove whitespaces from file
sampathrg 4441dc7
Fix formatting errors
sampathrg 5ed93b9
Fix clippy warning
sampathrg 28c8ec3
Merge branch 'main' into sync-from-cdb-diskann
sampathrg 072edb5
Fix build errors after merge with main - Use Knn instead of the old S…
sampathrg 6a31782
Undo rename of RecallMetrics
sampathrg a70ee53
Remove fallback to unfiltered search.
sampathrg a313013
Fix formatting error
sampathrg 7ea6e4e
Address review comments
sampathrg 8010d37
Formatting + revert to old names for some functions
sampathrg 8a86e0d
Add some unit tests + smoke test for benchmark
sampathrg 5664dca
Update output serializer + remove unnecessary type parameter
sampathrg b03bef4
Put the benchmarks and the smoke test behind a feature
sampathrg c1b7a3c
changes to Cargo.lock
sampathrg 026bb91
Merge branch 'main' into sync-from-cdb-diskann
sampathrg 5ebe477
Move the tests to a separate folder as is the convention
sampathrg c9a97bd
Fix formatting
sampathrg cc6a9b6
Fix formatting
sampathrg 27e20a9
Merge branch 'main' into sync-from-cdb-diskann
sampathrg 47a2e69
Fix build errors after merging with main
sampathrg
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| { | ||
| "search_directories": [ | ||
| "test_data/disk_index_search" | ||
| ], | ||
| "jobs": [ | ||
| { | ||
| "type": "document-index-build", | ||
| "content": { | ||
| "build": { | ||
| "data_type": "float32", | ||
| "data": "disk_index_siftsmall_learn_256pts_data.fbin", | ||
| "data_labels": "data.256.label.jsonl", | ||
| "distance": "squared_l2", | ||
| "max_degree": 32, | ||
| "l_build": 50, | ||
| "alpha": 1.2, | ||
| "num_threads": 4 | ||
| }, | ||
| "search": { | ||
| "queries": "disk_index_sample_query_10pts.fbin", | ||
| "query_predicates": "query.10.label.jsonl", | ||
| "groundtruth": "disk_index_10pts_idx_uint32_truth_search_filter_res.bin", | ||
| "beta": 0.5, | ||
| "reps": 5, | ||
| "num_threads": [ | ||
| 1 | ||
| ], | ||
| "runs": [ | ||
| { | ||
| "search_n": 20, | ||
| "search_l": [20, 30, 40], | ||
| "recall_k": 10 | ||
| } | ||
| ] | ||
| } | ||
| } | ||
| } | ||
| ] | ||
| } | ||
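The `groundtruth`, `search_n`, and `recall_k` fields above suggest that each run is scored by recall against a filtered ground-truth file. The PR's actual recall code is not shown here, so the following is only a minimal sketch of one common definition (fraction of the top `recall_k` ground-truth ids found among the first `search_n` returned ids). The function name `recall_at_k` and its signature are hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not taken from the PR): recall of the top `recall_k`
/// ground-truth ids within the first `search_n` returned ids.
fn recall_at_k(returned: &[u32], groundtruth: &[u32], recall_k: usize, search_n: usize) -> f64 {
    // Clamp both windows so short inputs do not cause out-of-bounds slices.
    let k = recall_k.min(groundtruth.len());
    if k == 0 {
        return 0.0;
    }
    let n = search_n.min(returned.len());

    // Sets make the count robust to duplicate ids in either list.
    let truth: HashSet<u32> = groundtruth[..k].iter().copied().collect();
    let found: HashSet<u32> = returned[..n].iter().copied().collect();

    truth.intersection(&found).count() as f64 / k as f64
}
```

With the config above (`search_n` of 20 and `recall_k` of 10), this would measure how many of the 10 true filtered neighbors appear among the 20 returned candidates, once per `search_l` value.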
Do we need to tune this parameter for different scenarios, such as small vs. large datasets or different embedding models, or is 0.5 a good value for most scenarios?
I think it's fixed. This parameter is used to boost the search similarity score when a candidate matches the filter. That score is then used to prioritize which node to walk next. Tagging @gopalrs to confirm.
Experimentally, 0.5 has given satisfactory results, with the caveat that the best value depends on filter density. Once we have the query planner in place, that caveat will no longer apply.
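To make the role of `beta` in this thread concrete, here is a minimal sketch of the idea as described above: candidates that match the filter get a better score, which moves them up the traversal order. It assumes a lower-is-better distance such as `squared_l2` and assumes the boost is applied by multiplying the distance by `beta`. The PR's actual formula is not shown on this page, and the names `beta_adjusted_distance` and `prioritize` are hypothetical.

```rust
/// Hypothetical helper: scale the distance of a candidate that satisfies the
/// query filter. Assumes lower distance is better and 0 < beta <= 1.
fn beta_adjusted_distance(distance: f32, matches_filter: bool, beta: f32) -> f32 {
    if matches_filter {
        distance * beta
    } else {
        distance
    }
}

/// Order candidate ids by adjusted distance, closest first.
/// Each candidate is (id, raw distance, whether it matches the filter).
fn prioritize(candidates: &[(u32, f32, bool)], beta: f32) -> Vec<u32> {
    let mut scored: Vec<(u32, f32)> = candidates
        .iter()
        .map(|&(id, dist, matches)| (id, beta_adjusted_distance(dist, matches, beta)))
        .collect();
    // total_cmp gives a total order on f32, so sorting cannot panic on NaN.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.into_iter().map(|(id, _)| id).collect()
}
```

Under these assumptions, a matching candidate at raw distance 16.0 is scored 8.0 with `beta` set to 0.5 and is visited before a non-matching candidate at distance 10.0. Non-matching candidates remain in the queue, so the graph walk can still pass through them.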