Skip to content

Conversation

@adelapena
Copy link

Fix RowFilter application of analyzers to queries on collections.

RowFilter.SimpleExpression#isSatisfiedBy for CONTAINS and CONTAINS_KEY operators should extract the collection values and check them with Operator.ANALYZER_MATCHES rather than with equality.

@adelapena adelapena self-assigned this Feb 19, 2025
@github-actions
Copy link

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit staring with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits

return mapType.compose(foundValue).containsValue(mapType.getValuesType().compose(value));
}
throw new AssertionError();
return foundValue != null && Operator.CONTAINS.isSatisfiedBy(type, foundValue, value, indexAnalyzer, queryAnalyzer);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is so much better!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test coverage complains we cover 3 of 4 conditions only

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we cannot do a similar logical simplification in the column.isComplex() case?

* @param indexAnalyzer an index-provided function to transform the left-side compared value before comparison,
* it shouldn't be {@code null}.
*/
public boolean isSatisfiedBy(AbstractType<?> type,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new method can be used in a future improvement patch to cache the tokens of the right, fixed part of the operation in the calling RowFilter.Expression. That way we wouldn't need to analyze that fixed part once and again for every evaluated row. I'm leaving that improvement out of this ticket so we can focus on fixing the correctness bug as soon as possible.

@adelapena
Copy link
Author

PR for CNDB: https://github.com/riptano/cndb/pull/13047

Copy link

@ekaterinadimitrova2 ekaterinadimitrova2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the splits in CI failed.
We need PR against the other branch. Also, do we need to cover in tests ALLOW FILTERING?
Otherwise the patch LGTM. We need to check why test coverage is not 100%. I will do it after our meeting.

Otherwise the code LGTM

List<ByteBuffer> rightTokens,
Index.Analyzer indexAnalyzer)
{
throw new UnsupportedOperationException();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record - Test coverage complains we don't cover this.....

{
assert column.type.isCollection();
CollectionType<?> type = (CollectionType<?>)column.type;
assert (indexAnalyzer == null) == (queryAnalyzer == null);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the assertion test coverage complains about

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something that shouldn't happen given how the expressions are built (see here). I can only reproduce this by convoluted synthetic means which probably don't give too much value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, the point of assertions is to test invariants that are not currently reachable but might become reachable in the future but that would break this code.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, the point of assertions is to test invariants that are not currently reachable but might become reachable in the future but that would break this code.

Exactly, as mentioned in Slack I marked them mostly for history.
If later someone wants to check the coverage for an old PR - they see only the percentage and they cannot get to the details in Sonar anymore.

@ekaterinadimitrova2
Copy link

Just flagged on the PR which are the conditions/linkes test coverage was complaining about

Copy link
Member

@michaeljmarshall michaeljmarshall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I wonder if we can simplify the logic here and rely more on the Operator enums, but that can be done in a follow up change.

@ekaterinadimitrova2
Copy link

ekaterinadimitrova2 commented Feb 19, 2025

LGTM, I wonder if we can simplify the logic here and rely more on the Operator enums, but that can be done in a follow up change.

In Apache trunk that was already quite well done last summer. It makes things nicely compact and easier to add on top of them.
+1 on clean CI. The marked lines where test coverage complained are either UnsupportedOperationException thrown on implementations of inherited methods that shouldn’t be called because they are cut in validation before or assertions that we don't hit because the code does not have that bug.

@sonarqubecloud
Copy link

@cassci-bot
Copy link

❌ Build ds-cassandra-pr-gate/PR-1583 rejected by Butler


1 new test failure(s) in 3 builds
See build details here


Found 1 new test failures

Test Explanation Branch history Upstream history
o.a.c.u.b.BinLogTest.testTruncationReleasesLogS... regression 🔴🔵🔵 🔵🔵🔵🔵🔵🔵🔵

Found 8 known test failures

Copy link

@ekaterinadimitrova2 ekaterinadimitrova2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI seems fine to me. @michaeljmarshall you may want to check the Generic Order By failures, but they do not seem related to what we do here. More like flakies. They also do not fail for me locally.
+1

@adelapena adelapena merged commit 780431d into main Feb 20, 2025
465 of 477 checks passed
@adelapena adelapena deleted the CNDB-13022-main branch February 20, 2025 11:29
djatnieks pushed a commit that referenced this pull request Mar 11, 2025
Fix `RowFilter` application of analyzers to queries on collections.

`RowFilter.SimpleExpression#isSatisfiedBy` for `CONTAINS` and
`CONTAINS_KEY` operators should extract the collection values and check
them with `Operator.ANALYZER_MATCHES` rather than with equality.
djatnieks pushed a commit that referenced this pull request May 18, 2025
Fix `RowFilter` application of analyzers to queries on collections.

`RowFilter.SimpleExpression#isSatisfiedBy` for `CONTAINS` and
`CONTAINS_KEY` operators should extract the collection values and check
them with `Operator.ANALYZER_MATCHES` rather than with equality.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants