-
Notifications
You must be signed in to change notification settings - Fork 21
CNDB-14861: Fix usage of PrimaryKeyWithSource in SAI #2040
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
KeyRangeUnionIterator merges streams of primary keys in such a way that duplicates are removed. Unfortunately it does not properly account for the fact that if a key with the empty clustering meets a key with a non-empty clustering and the same partition key, we must always return the key with the emtpy clustering. A key with an empty clustering will always fetch the rows matched by any specific row key for the same partition, but the reverse is not true.
Due to a very similar problem like in KeyRangeUnionIterator, KeyRangeIntersectionIterator could return either too few or too many keys, when keys with empty clusterings and keys with non-empty clusterings were present in the input key streams. In particular consider 2 input streams A and B with the following keys: A: 0: (1, Clustering.EMPTY) B: 0: (1, 1) 1: (1, 2) Key A.0 matches the whole partition 1. Therefore, the correct result of intersection are both keys of stream B. Unfortunately, the algorithm before this patch would advance both A and B iterators when emitting the first matching key. At the beginning of the second step, the iterator A would be already exhausted and no more keys would be produced. Finally key B.1 would be missing from the results. This patch fixes it by introducing two changes to the intersection algorithm: 1. A key with non-empty clustering wins over a key with empty clustering and same partition. 2. The selected highest key is not consumed while searching for the highest matching key, but that happens only after the search loop finds a match. Then we have more information which iterators would be moved to the next item. Iterators positioned at a key with an empty clustering can be advanced only after we run out of keys with non-empty clustering in the same partition or if there are no other keys with non-empty clustering. This patch also fixes another issue where we could return a less-specific key matching a full partition instead of a key matching one row: A: 0: (1, Clustering.EMPTY) B: 0: (1, 1) In that case the iterator returned a key with empty clustering, which would result in fetching and postfiltering many unnecessary rows.
If the code under test had a bug and never advanced the iterator properly, the test would fall into an infinite loop and could consume all memory (eventually OOMing).
Additionally, fix one minor test issue introduced by the previous commit, where we set a too low upper bound on intersection result size. We cannot use max() because a key with empty clustering can match multiple keys on the other side.
The PrimaryKeyWithSource class has been present for two years in the code base as an optimization for hybrid vector workloads, which have to materialize many primary keys in the search-then-sort query path. However, the logic is invalid for version aa (because we have the bug where compacted sstables write per row, not per partition) and it is also invalid for static columns. I think we need to find a way to use the PrimaryKeyWithSource in row aware cases due to the performance benefits, but we need to remove it from the above scenarios.
Checklist before you submit for review
|
a71bca2
to
8ea0924
Compare
src/java/org/apache/cassandra/index/sai/disk/PostingListKeyRangeIterator.java
Show resolved
Hide resolved
src/java/org/apache/cassandra/index/sai/disk/v2/RowAwarePrimaryKeyMap.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I have one minor suggestion about the newly introduced PKM interface method, but it's not a blocker.
8ea0924
to
3479e31
Compare
3479e31
to
932caff
Compare
|
✔️ Build ds-cassandra-pr-gate/PR-2040 approved by ButlerApproved by Butler |
Superseded by #2037 |
What is the issue
Creating this to run tests, will rebase once #2037 is merged (or I will merge this into that)
What does this PR fix and why was it fixed
...