-
Notifications
You must be signed in to change notification settings - Fork 21
CNDB-13196: Fix BM25 file access race condition #1622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We were incorrectly sharing a file reader across threads, which can lead to invalid results, as the test shows.
Checklist before you submit for review
|
eolivelli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
adelapena
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I have only left a couple of suggestions that can be addressed on commit.
The primary goal here is to increase the probability of hitting the race. It seems it still isn't perfect, but anecdotally, we hit it frequently on my machine. Further, this kind of bug is not hard to see, so I don't think it is worth further time trying to make a perfect test.
|
✔️ Build ds-cassandra-pr-gate/PR-1622 approved by ButlerApproved by Butler |
Fixes: riptano/cndb#13196 We were incorrectly sharing a file reader across threads in BM25 queries, which can lead to invalid results, as I reproduced in the test. The PR fixes the issue by creating a `DocLengthsReader` object per query.
Fixes: riptano/cndb#13196 We were incorrectly sharing a file reader across threads in BM25 queries, which can lead to invalid results, as I reproduced in the test. The PR fixes the issue by creating a `DocLengthsReader` object per query.



Fixes: https://github.com/riptano/cndb/issues/13196
What is the issue
We were incorrectly sharing a file reader across threads in BM25 queries, which can lead to invalid results, as I reproduced in the test.
What does this PR fix and why was it fixed
The PR fixes the issue by creating a
DocLengthsReaderobject per query.