CNDB-14861: Fix usage of PrimaryKeyWithSource in SAI #2040

michaeljmarshall · 2025-10-08T20:23:08Z

What is the issue

Creating this to run tests, will rebase once #2037 is merged (or I will merge this into that)

What does this PR fix and why was it fixed

...

KeyRangeUnionIterator merges streams of primary keys in such a way that duplicates are removed. Unfortunately, it does not properly account for the fact that if a key with an empty clustering meets a key with a non-empty clustering and the same partition key, we must always return the key with the empty clustering. A key with an empty clustering will always fetch the rows matched by any specific row key for the same partition, but the reverse is not true.
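To make the union rule concrete, here is a minimal two-stream sketch in Java. It is not the KeyRangeUnionIterator code: the `UnionSketch` class, the `Key` record, and the use of a `null` clustering to stand in for `Clustering.EMPTY` are all assumptions made for illustration. It also assumes each input list is sorted by partition, with an empty-clustering key ordered before row keys of the same partition.

```java
import java.util.ArrayList;
import java.util.List;

final class UnionSketch {
    /** Hypothetical stand-in for a primary key; clustering == null models Clustering.EMPTY. */
    record Key(int partition, Integer clustering) {
        boolean coversPartition() { return clustering == null; }
    }

    /** Orders by partition, then empty clustering first, then clustering value. */
    static int compare(Key a, Key b) {
        if (a.partition() != b.partition())
            return Integer.compare(a.partition(), b.partition());
        if (a.coversPartition() || b.coversPartition())
            return Boolean.compare(b.coversPartition(), a.coversPartition());
        return Integer.compare(a.clustering(), b.clustering());
    }

    /** True if fetching 'emitted' also fetches everything 'other' would fetch. */
    static boolean covered(Key emitted, Key other) {
        if (emitted.partition() != other.partition()) return false;
        return emitted.coversPartition() || emitted.equals(other);
    }

    /** Union of two sorted key lists; an empty-clustering key absorbs row keys of its partition. */
    static List<Key> union(List<Key> a, List<Key> b) {
        List<Key> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() || j < b.size()) {
            Key next;
            if (j >= b.size()) next = a.get(i);
            else if (i >= a.size()) next = b.get(j);
            else next = compare(a.get(i), b.get(j)) <= 0 ? a.get(i) : b.get(j);
            out.add(next);
            // Skip every key the emitted key already covers, on both sides.
            // 'next' covers itself, so at least one index always advances.
            while (i < a.size() && covered(next, a.get(i))) i++;
            while (j < b.size() && covered(next, b.get(j))) j++;
        }
        return out;
    }
}
```

With A = [(1, EMPTY)] and B = [(1, 1), (1, 2), (2, 1)], `UnionSketch.union` keeps the single whole-partition key for partition 1 and drops the two row keys it already covers, then emits (2, 1) unchanged.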

KeyRangeIntersectionIterator had a very similar problem to KeyRangeUnionIterator: it could return either too few or too many keys when keys with empty clusterings and keys with non-empty clusterings were both present in the input key streams. In particular, consider two input streams A and B with the following keys:

A:
  0: (1, Clustering.EMPTY)
B:
  0: (1, 1)
  1: (1, 2)

Key A.0 matches the whole partition 1. Therefore, the correct result of the intersection is both keys of stream B. Unfortunately, the algorithm before this patch would advance both the A and B iterators when emitting the first matching key. At the beginning of the second step, iterator A would already be exhausted and no more keys would be produced, so key B.1 would be missing from the results.

This patch fixes it by introducing two changes to the intersection algorithm:

1. A key with non-empty clustering wins over a key with empty clustering and the same partition.
2. The selected highest key is not consumed while searching for the highest matching key; it is consumed only after the search loop finds a match. At that point we have more information about which iterators should be moved to the next item. Iterators positioned at a key with an empty clustering can be advanced only after we run out of keys with non-empty clustering in the same partition, or if there are no other keys with non-empty clustering.

This patch also fixes another issue where we could return a less-specific key matching a full partition instead of a key matching one row:

A:
  0: (1, Clustering.EMPTY)
B:
  0: (1, 1)

In that case the iterator returned a key with empty clustering, which would result in fetching and post-filtering many unnecessary rows.
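The intersection rules can be illustrated with a simplified two-stream sketch in Java. This is not the patched KeyRangeIntersectionIterator, which handles any number of input iterators; `IntersectionSketch`, its `Key` record, and the `null` clustering standing in for `Clustering.EMPTY` are hypothetical names chosen for this example. The sketch assumes both lists are sorted by partition and then clustering, and that within one stream a partition is represented either by a single empty-clustering key or by row keys.

```java
import java.util.ArrayList;
import java.util.List;

final class IntersectionSketch {
    /** Hypothetical stand-in for a primary key; clustering == null models Clustering.EMPTY. */
    record Key(int partition, Integer clustering) {
        boolean coversPartition() { return clustering == null; }
    }

    /** Intersection of two sorted key lists, preferring the more specific key on a match. */
    static List<Key> intersect(List<Key> a, List<Key> b) {
        List<Key> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            Key x = a.get(i), y = b.get(j);
            if (x.partition() != y.partition()) {
                // Different partitions never match: advance the side that is behind.
                // This is also where a parked empty-clustering key is finally consumed.
                if (x.partition() < y.partition()) i++; else j++;
            } else if (x.coversPartition() && y.coversPartition()) {
                out.add(x);          // both sides match the whole partition
                i++; j++;
            } else if (x.coversPartition()) {
                out.add(y);          // the row key wins; keep A parked on its partition key
                j++;
            } else if (y.coversPartition()) {
                out.add(x);          // the row key wins; keep B parked on its partition key
                i++;
            } else {
                int c = Integer.compare(x.clustering(), y.clustering());
                if (c == 0) { out.add(x); i++; j++; }
                else if (c < 0) i++;
                else j++;
            }
        }
        return out;
    }
}
```

For the first example above, A = [(1, EMPTY)] and B = [(1, 1), (1, 2)], the sketch emits both keys of B because A stays positioned on its whole-partition key while B still has rows in partition 1. For the second example, A = [(1, EMPTY)] and B = [(1, 1)], it emits the row key (1, 1) rather than the less specific partition key.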

If the code under test had a bug and never advanced the iterator properly, the test would fall into an infinite loop and could consume all memory (eventually OOMing).
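One common way to guard a test against this failure mode is to cap the number of items drained from the iterator under test. The helper below is a sketch of that idea only; `DrainGuard`, `drain`, and the `maxItems` parameter are hypothetical names, and the actual test changes in this PR may look different.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class DrainGuard {
    /**
     * Collects all items from the iterator, failing fast if it yields more than maxItems.
     * An iterator that never advances would otherwise fill the list until the JVM runs out of memory.
     */
    static <T> List<T> drain(Iterator<T> it, int maxItems) {
        List<T> out = new ArrayList<>();
        while (it.hasNext()) {
            if (out.size() >= maxItems)
                throw new AssertionError("Iterator produced more than " + maxItems
                                         + " items; it is probably not advancing");
            out.add(it.next());
        }
        return out;
    }
}
```

A test would pick `maxItems` from a known upper bound on the result size, for example the total number of keys across all input streams, so that a stuck iterator produces a clear assertion failure instead of an OOM.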

Additionally, fix one minor test issue introduced by the previous commit, where we set too low an upper bound on the intersection result size. We cannot use max() because a key with empty clustering can match multiple keys on the other side.

The PrimaryKeyWithSource class has been present for two years in the code base as an optimization for hybrid vector workloads, which have to materialize many primary keys in the search-then-sort query path. However, the logic is invalid for version aa (because we have the bug where compacted sstables write per row, not per partition) and it is also invalid for static columns. I think we need to find a way to use the PrimaryKeyWithSource in row aware cases due to the performance benefits, but we need to remove it from the above scenarios.

github-actions · 2025-10-08T20:23:27Z

src/java/org/apache/cassandra/index/sai/disk/PostingListKeyRangeIterator.java

src/java/org/apache/cassandra/index/sai/disk/PrimaryKeyMap.java

src/java/org/apache/cassandra/index/sai/disk/v2/RowAwarePrimaryKeyMap.java

pkolaczk

Looks good. I have one minor suggestion about the newly introduced PKM interface method, but it's not a blocker.

sonarqubecloud · 2025-10-09T17:32:48Z

Quality Gate passed

Issues
0 New issues
2 Accepted issues

Measures
0 Security Hotspots
84.3% Coverage on New Code
0.0% Duplication on New Code


cassci-bot · 2025-10-09T17:36:52Z

✔️ Build ds-cassandra-pr-gate/PR-2040 approved by Butler


michaeljmarshall · 2025-10-09T18:32:20Z

Superseded by #2037

michaeljmarshall and others added 10 commits October 3, 2025 16:59

CNDB-15570: Reproducer

55b09ea

Only use partitions in ResultRetriever if index not row aware

d1304d4

CNDB-15570: Add more randomized union/intersection iterator tests

64356ae

CNDB-15570: Fix duplicate keys issue in KeyRangeUnionIterator

d831783

CNDB-15570: Add one more intersection test

b90efa9

CNDB-15570: Protect test code against infinite loop

4ee0356

If the code under test had a bug and never advanced the iterator properly, the test would fall into an infinite loop and could consume all memory (eventually OOMing).

CNDB-15570: Add randomized tests for skipping

5df97b1

Additionally, fix one minor test issue introduced by the previous commit, where we set too low an upper bound on the intersection result size. We cannot use max() because a key with empty clustering can match multiple keys on the other side.

michaeljmarshall force-pushed the cndb-14861 branch from a71bca2 to 8ea0924 on October 8, 2025 21:14

pkolaczk reviewed Oct 9, 2025

src/java/org/apache/cassandra/index/sai/disk/PostingListKeyRangeIterator.java

pkolaczk reviewed Oct 9, 2025

src/java/org/apache/cassandra/index/sai/disk/PrimaryKeyMap.java

pkolaczk reviewed Oct 9, 2025

src/java/org/apache/cassandra/index/sai/disk/v2/RowAwarePrimaryKeyMap.java

pkolaczk approved these changes Oct 9, 2025

michaeljmarshall force-pushed the cndb-14861 branch from 8ea0924 to 3479e31 on October 9, 2025 16:27

CNDB-14861: Fix usage of PrimaryKeyWithSource in SAI

932caff

michaeljmarshall force-pushed the cndb-14861 branch from 3479e31 to 932caff on October 9, 2025 16:36

michaeljmarshall closed this Oct 9, 2025

michaeljmarshall deleted the cndb-14861 branch October 9, 2025 18:32
