@adelapena

It's not allowed to create an index with non-Lucene analysis options on a frozen collection. For example:

CREATE TABLE t(k int PRIMARY KEY, v frozen<set<text>>);
CREATE CUSTOM INDEX ON t(FULL(v)) 
   USING 'StorageAttachedIndex' 
   WITH OPTIONS = {'case_sensitive': false}; -- InvalidRequestException: CQL type frozen<set<text>> cannot be analyzed

This IMO makes sense, because such an index wouldn't support any meaningful operator. However, we don't get that rejection if we specify an index_analyzer instead. For example:

CREATE TABLE t(k int PRIMARY KEY, v frozen<set<text>>);
INSERT INTO t (k, v) VALUES (0, {'apples'});
CREATE CUSTOM INDEX ON t(FULL(v)) 
   USING 'StorageAttachedIndex' 
   WITH OPTIONS = {'index_analyzer':'STANDARD'}; -- Accepted
SELECT k FROM t WHERE v CONTAINS 'ABC'; -- Column 'v' has an index but does not support the operators specified in the query.
SELECT k FROM t WHERE v = {'apple'}; -- Accepted, but results are erratic

In this case, the entire serialized collection is treated as a single string and analyzed. That string includes the serialization metadata at the beginning, has no token separators between fields, etc., so it ends up as a bunch of non-printable characters.
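To make the problem concrete, here is a rough Python sketch of what the analyzer would be handed. It assumes the frozen set is serialized as a 4-byte big-endian element count followed by a 4-byte length and the UTF-8 bytes of each element; that layout, and the helper name serialize_frozen_set, are assumptions for illustration and not code from this PR.

```python
import struct


def serialize_frozen_set(elements):
    """Pack strings as: int32 count, then (int32 length, UTF-8 bytes) per element.

    Assumed layout, used only to illustrate the problem.
    """
    encoded = [e.encode("utf-8") for e in sorted(elements)]
    out = struct.pack(">i", len(encoded))
    for e in encoded:
        out += struct.pack(">i", len(e)) + e
    return out


raw = serialize_frozen_set({"apples"})

# Interpreting the whole buffer as text, as an analyzer on the full value would.
as_text = raw.decode("utf-8")
non_printable = sum(1 for ch in as_text if not ch.isprintable())

print(repr(as_text))
print(non_printable, "of", len(as_text), "characters are non-printable")
```

Under that assumed layout, the count and length prefixes become control characters mixed in with the element text, which is why the analyzed terms have little to do with the set's elements.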

This PR simply rejects creating indexes with analysis options on frozen collections, given that we don't have a way to correctly index them, and we don't support querying either.
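The rule can be sketched roughly as follows. This is an illustrative Python model, not the actual Java change: validate_index_options, ANALYSIS_OPTIONS and the error text are hypothetical, and the option set only lists the two options named in this description.

```python
import re

# Only the analysis options mentioned in this PR description; the real set
# is assumed to be larger.
ANALYSIS_OPTIONS = {"case_sensitive", "index_analyzer"}

# Matches type strings such as frozen<set<text>> or frozen<map<int, text>>.
FROZEN_COLLECTION = re.compile(r"^frozen<\s*(set|list|map)<.*>\s*>$")


class InvalidRequestException(Exception):
    """Stand-in for the exception reported to the client."""


def validate_index_options(column, cql_type, options):
    """Reject analysis options when the indexed column is a frozen collection."""
    used = sorted(ANALYSIS_OPTIONS.intersection(options))
    if used and FROZEN_COLLECTION.match(cql_type):
        raise InvalidRequestException(
            f"Analysis options {used} are not supported on column '{column}' "
            f"of type {cql_type}"
        )


# No analysis options: accepted.
validate_index_options("v", "frozen<set<text>>", {})

# Analysis options on a frozen collection: rejected.
try:
    validate_index_options("v", "frozen<set<text>>", {"index_analyzer": "STANDARD"})
except InvalidRequestException as e:
    print(e)
```

The check runs at index creation time, so the user gets an error up front instead of an index that accepts writes but returns erratic results.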

@adelapena adelapena self-assigned this Feb 26, 2025
@github-actions

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits

@adelapena
Author

PR for CNDB: https://github.com/riptano/cndb/pull/13152

Can we add the column name (or at least the type) to the message?

Author

Done.

@cassci-bot

❌ Build ds-cassandra-pr-gate/PR-1610 rejected by Butler


1 new test failure(s) in 3 builds
See build details here


Found 1 new test failures

Test: o.a.c.u.b.BinLogTest.testTruncationReleasesLogS...
Explanation: regression
Branch history: 🔴🔴🔴
Upstream history: 🔵🔵🔵🔵🔵🔵🔵

Found 2 known test failures

@adelapena adelapena merged commit f0ed83e into main Mar 18, 2025
474 of 479 checks passed
@adelapena adelapena deleted the CNDB-13074-main branch March 18, 2025 12:51
djatnieks pushed a commit that referenced this pull request Apr 14, 2025
Reject creating indexes with analysis options on frozen collections.
We don't have a way to correctly index them, and we don't support querying either.
djatnieks pushed a commit that referenced this pull request May 18, 2025
Reject creating indexes with analysis options on frozen collections.
We don't have a way to correctly index them, and we don't support querying either.
4 participants