CNDB-13075: Reuse the analyzed tokens of the right operand of an analyzed expression #1620

adelapena · 2025-03-04T15:53:46Z

CNDB-10731 added support for index analyzers to RowFilter. Currently, both the left and right operands of an expression are analyzed on every evaluation. However, the right operand has the same fixed value for the life of the expression, so we repeat the same analysis over and over.

This PR memoizes the analyzed tokens of the right operand. It also refactors and simplifies how analysis is handled in RowFilter and Operator.
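As a rough illustration of the memoization idea (not the actual patch), the sketch below analyzes the fixed right operand once and reuses the tokens on later evaluations. `CountingAnalyzer`, `AnalyzedExpression`, the whitespace tokenization, and the "all query tokens must be present" matching rule are assumptions made for this example; the real code works with RowFilter expressions and the index's own analyzers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an analyzer: lowercases and splits on whitespace. */
final class CountingAnalyzer
{
    /** Counts invocations so the effect of memoization is observable. */
    final AtomicInteger calls = new AtomicInteger();

    List<String> analyze(String value)
    {
        calls.incrementAndGet();
        List<String> tokens = new ArrayList<>();
        for (String token : value.toLowerCase(Locale.ROOT).trim().split("\\s+"))
            if (!token.isEmpty())
                tokens.add(token);
        return tokens;
    }
}

/** Hypothetical expression whose right operand is fixed at construction time. */
final class AnalyzedExpression
{
    private final CountingAnalyzer analyzer;
    private final String rightOperand;
    private List<String> rightTokens; // computed on first use, then reused

    AnalyzedExpression(CountingAnalyzer analyzer, String rightOperand)
    {
        this.analyzer = analyzer;
        this.rightOperand = rightOperand;
    }

    /** Analyzes the right operand at most once (this sketch is not thread-safe). */
    private List<String> rightTokens()
    {
        if (rightTokens == null)
            rightTokens = Collections.unmodifiableList(analyzer.analyze(rightOperand));
        return rightTokens;
    }

    /** The left operand changes per row, so it is still analyzed on every call. */
    boolean isSatisfiedBy(String leftOperand)
    {
        List<String> leftTokens = analyzer.analyze(leftOperand);
        return leftTokens.containsAll(rightTokens());
    }
}
```

With this shape, evaluating one expression against N rows costs N + 1 analyzer calls instead of 2N.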

github-actions · 2025-03-04T15:54:04Z

Checklist before you submit for review

Make sure there is a PR in the CNDB project updating the Converged Cassandra version
Use NoSpamLogger for log lines that may appear frequently in the logs
Verify test results on Butler
Test coverage for new/modified code is > 80%
Proper code formatting
Proper title for each commit starting with the project-issue number, like CNDB-1234
Each commit has a meaningful description
Each commit is not very long and contains related changes
Renames, moves and reformatting are in distinct commits

pkolaczk · 2025-03-10T15:42:49Z

src/java/org/apache/cassandra/cql3/Operator.java

Really extremely minor nit: I feel a bit uncomfortable when both of those overloads have the same name, yet one is meant to be called on a non-analyzed value and the other on an analyzed one. Like... there seems to be a subtle semantic difference between them (and you often can only call one of them but not both). Maybe they should have different names, and the one that takes the lists should say "analyzed" explicitly?

Anyway, feel free to dismiss this comment. This is more my gut feeling than an objective thing. But I simply wondered if you think the same.

adelapena · 2025-03-12

I've renamed it to isSatisfiedByAnalyzed. I guess it makes it a bit easier to identify the calls for RowFilter.
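To make the naming point concrete, here is a simplified sketch of two distinctly named methods. Only the name isSatisfiedByAnalyzed and the fact that it takes lists come from this thread; `SketchOperator`, the use of `String` values, and the matching rules are assumptions for illustration and do not mirror the real signatures in Operator.java.

```java
import java.util.List;

/** Hypothetical, simplified operator with separately named evaluation methods. */
enum SketchOperator
{
    EQ;

    /** Compares raw values that have not gone through an analyzer. */
    boolean isSatisfiedBy(String left, String right)
    {
        return left.equals(right);
    }

    /** Compares values that an analyzer has already split into tokens. */
    boolean isSatisfiedByAnalyzed(List<String> leftTokens, List<String> rightTokens)
    {
        // Assumed rule for this sketch: every queried token must be present.
        return leftTokens.containsAll(rightTokens);
    }
}
```

A call site then states which kind of input it passes, so a caller cannot silently pick the wrong overload.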

pkolaczk · 2025-03-10T15:45:58Z

src/java/org/apache/cassandra/db/filter/RowFilter.java

Just curious, why did we need separate queryAnalyzer / indexAnalyzer before and now we don't?
Is it because of caching you introduced in queriedTokens?

adelapena · 2025-03-12

The new Index.Analyzer has two separate methods for index and query time analysis.

pkolaczk · 2025-03-10T18:32:19Z

src/java/org/apache/cassandra/index/Index.java

Does it mean we're using this only at query time?
How do we analyze the values at index build time?

adelapena · 2025-03-12

Previously, Index.Analyzer had a single analyzing method, and Index had two methods returning the read and write analyzers. Now Index.Analyzer has two analyzing methods, one for reads and one for writes, and Index has only one method returning the new multipurpose Index.Analyzer.

The reasons for encapsulating the two types of internal SAI analyzers into a single Index.Analyzer object are:

Make caching of the queried term a bit easier.

Make it harder to confuse the two analyzers, since there is only one analyzer in the filtering expressions.

Get rid of some assertions that checked consistency between the two analyzer parameters. The two types of analyzers were always passed around together, so I guess it makes sense to encapsulate them together.
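The encapsulation described above could look roughly like the following sketch: a single analyzer object with one method for index-time analysis and one for query-time analysis. `SketchAnalyzer`, `WhitespaceAnalyzer`, the method names as used here, the `String` parameter types, and the duplicate-dropping behaviour at query time are assumptions for illustration, not the actual Index.Analyzer API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;

/** Hypothetical, simplified analyzer exposing both kinds of analysis. */
interface SketchAnalyzer
{
    /** Tokens to store when a value is written to the index. */
    List<String> indexedTokens(String value);

    /** Tokens to match against when a value appears in a query. */
    List<String> queriedTokens(String value);
}

/** Example implementation: lowercases and splits on whitespace. */
final class WhitespaceAnalyzer implements SketchAnalyzer
{
    @Override
    public List<String> indexedTokens(String value)
    {
        return tokenize(value);
    }

    @Override
    public List<String> queriedTokens(String value)
    {
        // Assumed difference for illustration: query-time analysis drops duplicates.
        return new ArrayList<>(new LinkedHashSet<>(tokenize(value)));
    }

    private static List<String> tokenize(String value)
    {
        List<String> tokens = new ArrayList<>();
        for (String token : value.toLowerCase(Locale.ROOT).trim().split("\\s+"))
            if (!token.isEmpty())
                tokens.add(token);
        return tokens;
    }
}
```

Because callers hold one object instead of a pair, there is no pair of analyzer parameters to keep consistent, and the query-side result is a natural place to cache.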

sonarqubecloud · 2025-03-25T13:32:05Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
87.3% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

sonarqubecloud · 2025-04-26T16:51:57Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
87.3% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

cassci-bot · 2025-04-26T16:56:22Z

❌ Build ds-cassandra-pr-gate/PR-1620 rejected by Butler

2 new test failure(s) in 9 builds
See build details here

Found 2 new test failures

Test	Explanation	Branch history	Upstream history
...gLegacyIndex.test_sstableloader_with_failing_2i	regression	🔴🔵🔵🔵🔵🔵🔵	🔵🔵🔵🔵🔵🔵🔵
...epairTest.testForcedNormalRepairWithOneNodeDown	regression	🔴🔵🔵🔵🔵🔵🔵	🔵🔵🔵🔵🔵🔵🔵

Found 665 known test failures

…yzed expression (#1620) Memoize the analyzed tokens of the right operand of an analyzed expression. Also add some refactoring and simplifications around how analysis is dealt with in `RowFilter` and `Operator`.

adelapena self-assigned this Mar 4, 2025

adelapena marked this pull request as draft March 4, 2025 15:54

adelapena marked this pull request as ready for review March 5, 2025 12:55

adelapena force-pushed the CNDB-13075-main branch from fca2d4f to d608a33 Compare March 5, 2025 12:56

adelapena marked this pull request as draft March 5, 2025 12:56

adelapena marked this pull request as ready for review March 5, 2025 15:52

adelapena force-pushed the CNDB-13075-main branch 4 times, most recently from a7fb9a8 to 92d1ecf Compare March 6, 2025 15:48

pkolaczk reviewed Mar 11, 2025

pkolaczk approved these changes Mar 11, 2025

adelapena force-pushed the CNDB-13075-main branch from 92d1ecf to a3acff8 Compare March 12, 2025 13:47

adelapena force-pushed the CNDB-13075-main branch 2 times, most recently from ed9bd02 to ef15d96 Compare March 25, 2025 12:48

CNDB-13075: Reuse the analyzed tokens of the right operand of an analyzed expression

d860ad6

adelapena force-pushed the CNDB-13075-main branch from ef15d96 to d860ad6 Compare April 26, 2025 16:09

adelapena merged commit d0701b3 into main Apr 29, 2025
475 of 485 checks passed

adelapena deleted the CNDB-13075-main branch April 29, 2025 15:13

4 participants
