-
Notifications
You must be signed in to change notification settings - Fork 21
CNDB-15919: Optimize SAI NOT queries, push logic into posting lists #2112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Checklist before you submit for review
|
There are still some issues to be solved. Namely, dealing with version AA.
25d7f7e to
36fe292
Compare
pkolaczk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to the other minor things I mentioned, we need a test for verifying how NEQ works together with token range / partition key restriction.
src/java/org/apache/cassandra/index/sai/disk/v1/postings/ComplementPostingList.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/index/sai/disk/v1/IndexSearcher.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/index/sai/disk/v1/IndexSearcher.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/index/sai/disk/v1/IndexSearcher.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/index/sai/disk/v1/IndexSearcher.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/index/sai/disk/v1/SegmentMetadata.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/index/sai/disk/v1/SegmentMetadataBuilder.java
Outdated
Show resolved
Hide resolved
bd99cd6 to
b16fd5e
Compare
Results on current commit:
[java] Result "org.apache.cassandra.test.microbench.index.sai.NEQQueryBench.queryNEQ":
[java] 7198946.987 ±(99.9%) 155705.068 ns/op [Average]
[java] (min, avg, max) = (7086466.212, 7198946.987, 7464311.708), stdev = 102989.262
[java] CI (99.9%): [7043241.919, 7354652.054] (assumes normal distribution)
[java]
[java] Benchmark (numRowsWithinPartition) Mode Cnt Score Error Units
[java] NEQQueryBench.queryNEQ 1000 avgt 10 1109937.157 ± 2362.813 ns/op
[java] NEQQueryBench.queryNEQ 10000 avgt 10 7198946.987 ± 155705.068 ns/op
Benchmark results for baseline.
[java] Result "org.apache.cassandra.test.microbench.index.sai.NEQQueryBench.queryNEQ":
[java] 327531893.145 ±(99.9%) 1209098.423 ns/op [Average]
[java] (min, avg, max) = (326375963.710, 327531893.145, 329248861.548), stdev = 799743.744
[java] CI (99.9%): [326322794.722, 328740991.569] (assumes normal distribution)
[java]
[java] Benchmark (numRowsWithinPartition) Mode Cnt Score Error Units
[java] NEQQueryBench.queryNEQ 1000 avgt 10 32187371.679 ± 67260.318 ns/op
[java] NEQQueryBench.queryNEQ 10000 avgt 10 327531893.145 ± 1209098.423 ns/op
[java] Result "org.apache.cassandra.test.microbench.index.sai.NEQQueryBench.queryNEQ":
[java] 17761666.739 ±(99.9%) 260418.205 ns/op [Average]
[java] (min, avg, max) = (17644295.340, 17761666.739, 18241636.689), stdev = 172250.519
[java] CI (99.9%): [17501248.533, 18022084.944] (assumes normal distribution)
[java]
[java] Benchmark (numRowsWithinPartition) Mode Cnt Score Error Units
[java] NEQQueryBench.queryNEQ 1000 avgt 10 2169861.695 ± 11355.830 ns/op
[java] NEQQueryBench.queryNEQ 10000 avgt 10 17761666.739 ± 260418.205 ns/op
b16fd5e to
5c4fa18
Compare
|
@pkolaczk - I added a benchmark and discovered that in the most extreme cases, we can get most of the benefit for an extremely small code change, compared to what I had proposed before. I think we should go with this one for now and leave a deeper re-write for later, if there is demand. |
|
❌ Build ds-cassandra-pr-gate/PR-2112 rejected by Butler1 regressions found Found 1 new test failures
Found 1 known test failures |



What is the issue
Fixes: https://github.com/riptano/cndb/issues/15919
Test PR: https://github.com/riptano/cndb/pull/15949
What does this PR fix and why was it fixed
In the original implementation for #820, we introduced the
PrimaryKeyMapIteratorto iterate all primary keys in an sstable and then do an anti-join on the result of an equality query. That design works, but requires some additional reads from disk to get primary keys that are unnecessary.There are two possible solutions:
primaryKeyFromRowIdthat takes primary key bounds and then uses a row id, when rows are from the same sstable. This will be worse that solution 1 because it creates an object per key and requires comparing sstable ids before comparing sstable row ids, but it is a significant improvement over the current solution, which hits disk to load the primary key.When testing on my local machine and reviewing the JMH benchmarks, I can see that the current solution is about 16x worse than the minimum solution (2) and 32x worse than the optimal (1) solution. Given that the benchmarks in question are highly specific to the use case, I do no think we have sufficient motivation to introduce the exceedingly complex (1) solution.
Note that the ideal solution to 1, that would have much less complexity, is to convert posting lists into a single iterator of sstable row ids, and then to take the complement of them.