Skip to content

Conversation

@adelapena
Copy link

@adelapena adelapena commented Mar 4, 2025

CNDB-10731 added support for index analyzers to RowFilter. Currently, both the left and right operands of an expression are analyzed on every evaluation. However, the right operand has the same fixed value over the life of the expression, so we are doing the same analysis once and again.

This PR memoizes the analyzed tokens of the right operand. It also adds some refactoring and simplifications around how analysis is dealt with in RowFilter and Operator.

@adelapena adelapena self-assigned this Mar 4, 2025
@github-actions
Copy link

github-actions bot commented Mar 4, 2025

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit staring with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits

@adelapena adelapena marked this pull request as draft March 4, 2025 15:54
@adelapena adelapena marked this pull request as ready for review March 5, 2025 12:55
@adelapena adelapena marked this pull request as draft March 5, 2025 12:56
@adelapena adelapena marked this pull request as ready for review March 5, 2025 15:52
@adelapena adelapena force-pushed the CNDB-13075-main branch 4 times, most recently from a7fb9a8 to 92d1ecf Compare March 6, 2025 15:48

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really extremely minor nit: I feel a bit uncomfortable when both of those overloads have the same name, yet one is meant to be called on a non-analyzed value and another on an analyzed. Like... there seems to be a subtle semantic difference between them (and you often can only call one of them but not both). Maybe they should have a different name and the one that takes the lists should say "analyzed" explicitly?

Anyway, feel free to dismiss this comment. This is more my gut feeling than an objective thing. But I simply wondered if you think the same.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed it to isSatisfiedByAnalyzed. I guess it makes it a bit easier to identify the calls for RowFilter.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, why did we need separate queryAnalyzer / indexAnalyzer before and now we don't?
Is it because of caching you introduced in queriedTokens?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new Index.Analyzer has two separate methods for index and query time analysis.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean we're using this only at query time?
How do we analyze the values at index build time?

Copy link
Author

@adelapena adelapena Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before Index.Analyzer had a single analyzing method, and Index had two methods returning the read and write analyzers. Now Index.Analyzer has two analyzing methods for reads and writes, and Index only has one method returning the new multipurpose Index.Analyzer.

The reasons for encapsulating the two types of internal SAI analyzers into a single Index.Analyzer object are:

  • Make caching of the queried term a bit easier.
  • Try to make confusing both analyzers a bit harder, since there is only one analyzer in the filtering expressions.
  • Get rid of some assertions around checking consistency between both analyzer parameters. The two types of analyzers were moved around together as a pack, so I guess it makes sense to encapsulate them together.

@adelapena adelapena force-pushed the CNDB-13075-main branch 2 times, most recently from ed9bd02 to ef15d96 Compare March 25, 2025 12:48
@sonarqubecloud
Copy link

@sonarqubecloud
Copy link

@cassci-bot
Copy link

❌ Build ds-cassandra-pr-gate/PR-1620 rejected by Butler


2 new test failure(s) in 9 builds
See build details here


Found 2 new test failures

Test Explanation Branch history Upstream history
...gLegacyIndex.test_sstableloader_with_failing_2i regression 🔴🔵🔵🔵🔵🔵🔵 🔵🔵🔵🔵🔵🔵🔵
...epairTest.testForcedNormalRepairWithOneNodeDown regression 🔴🔵🔵🔵🔵🔵🔵 🔵🔵🔵🔵🔵🔵🔵

Found 665 known test failures

@adelapena adelapena merged commit d0701b3 into main Apr 29, 2025
475 of 485 checks passed
@adelapena adelapena deleted the CNDB-13075-main branch April 29, 2025 15:13
driftx pushed a commit that referenced this pull request Jun 13, 2025
…yzed expression (#1620)

Memoize the analyzed tokens of the right operand of an analyzed expression.
Also add some refactoring and simplifications around how analysis is dealt with
in `RowFilter` and `Operator`.
driftx pushed a commit that referenced this pull request Jun 13, 2025
…yzed expression (#1620)

Memoize the analyzed tokens of the right operand of an analyzed expression.
Also add some refactoring and simplifications around how analysis is dealt with
in `RowFilter` and `Operator`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants