Skip to content

Conversation

michaeljmarshall
Copy link
Member

What is the issue

Creating this to run tests, will rebase once #2037 is merged (or I will merge this into that)

What does this PR fix and why was it fixed

...

michaeljmarshall and others added 10 commits October 3, 2025 16:59
KeyRangeUnionIterator merges streams of primary keys in such a way that
duplicates are removed. Unfortunately it does not properly account
for the fact that if a key with the empty clustering meets a key
with a non-empty clustering and the same partition key, we must
always return the key with the emtpy clustering. A key with an empty
clustering will always fetch the rows matched by any specific row
key for the same partition, but the reverse is not true.
Due to a very similar problem like in KeyRangeUnionIterator,
KeyRangeIntersectionIterator could return either too few or
too many keys, when keys with empty clusterings and keys
with non-empty clusterings were present in the input key streams.

In particular consider 2 input streams A and B with the following
keys:

A:
0: (1, Clustering.EMPTY)

B:
0: (1, 1)
1: (1, 2)

Key A.0 matches the whole partition 1. Therefore, the correct result
of intersection are both keys of stream B. Unfortunately, the algorithm
before this patch would advance both A and B iterators when emitting
the first matching key. At the beginning of the second step,
the iterator A would be already exhausted and no more keys would
be produced. Finally key B.1 would be missing from the results.

This patch fixes it by introducing two changes to the intersection
algorithm:

1. A key with non-empty clustering wins over a key with
empty clustering and same partition.

2. The selected highest key is not consumed while searching
for the highest matching key, but that happens only after the
search loop finds a match. Then we have more information
which iterators would be moved to the next item. Iterators positioned
at a key with an empty clustering can be advanced only after
we run out of keys with non-empty clustering in the same partition
or if there are no other keys with non-empty clustering.

This patch also fixes another issue where we could return
a less-specific key matching a full partition instead of a key
matching one row:

A:
0: (1, Clustering.EMPTY)

B:
0: (1, 1)

In that case the iterator returned a key with empty clustering,
which would result in fetching and postfiltering many unnecessary rows.
If the code under test had a bug and never advanced the
iterator properly, the test would fall into an infinite
loop and could consume all memory (eventually OOMing).
Additionally, fix one minor test issue introduced
by the previous commit, where we set a too low upper
bound on intersection result size. We cannot use max()
because a key with empty clustering can match multiple
keys on the other side.
The PrimaryKeyWithSource class has been
present for two years in the code base
as an optimization for hybrid vector workloads,
which have to materialize many primary keys
in the search-then-sort query path.

However, the logic is invalid for version
aa (because we have the bug where compacted
sstables write per row, not per partition)
and it is also invalid for static columns.

I think we need to find a way to use the
PrimaryKeyWithSource in row aware cases
due to the performance benefits, but we
need to remove it from the above scenarios.
Copy link

github-actions bot commented Oct 8, 2025

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit staring with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

Copy link

@pkolaczk pkolaczk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I have one minor suggestion about the newly introduced PKM interface method, but it's not a blocker.

Copy link

sonarqubecloud bot commented Oct 9, 2025

@cassci-bot
Copy link

✔️ Build ds-cassandra-pr-gate/PR-2040 approved by Butler


Approved by Butler
See build details here

@michaeljmarshall
Copy link
Member Author

Superseded by #2037

@michaeljmarshall michaeljmarshall deleted the cndb-14861 branch October 9, 2025 18:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants