CNDB-15280: Fix AbstractReadQuery.toCQLString to produce correct CQL when possible #1985

adelapena · 2025-09-04T19:22:53Z

The method AbstractReadQuery.toCQLString prints commands as CQL queries including any column values. This includes the queried values in the WHERE part of a SELECT statement or the written values on INSERT and UPDATE statement. This method is used at least by the slow query logger, printing user data into the logs.

This PR includes a number of fixes to make toCQLString produce correct CQL syntax, mostly coming from CASSANDRA-16510. These are fixes to prevent printing wrong or redundant CQL syntax such as SELECT * FROM t ORDER BY (v ANN OF [...]) or SELECT * FROM t WHERE c1=0 AND c1=0. Those issues came up while writing tests for testing redaction with various types of queries, and are needed to properly test redaction. There is also some removal of code duplication. These changes are in a separate commit in an attempt to ease review.

This PR originally included both those fixes and redaction in separate commit, but they have been separated in two PRs at reviewer request. The PR for redaction is #2038. This PR however contains reviewer feedback regarding changes that have been ported to that PR, so it might be harder to follow.

github-actions · 2025-09-04T19:23:10Z

Copilot

Pull Request Overview

The PR "CNDB-15280: Remove user data from AbstractReadQuery.toCQLString" modifies the CQL string generation methods to redact sensitive user data from query logs. The main purpose is to prevent sensitive column values from appearing in slow query logs and other monitoring outputs by replacing them with "?" placeholders.

Key changes:

Modified AbstractReadQuery.toCQLString to accept a redact parameter for controlling value redaction
Added redaction capabilities throughout the CQL string generation chain
Updated SAI query plan logging to use redacted strings

Reviewed Changes

Copilot reviewed 36 out of 36 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
`test/unit/org/apache/cassandra/index/sai/plan/PlanTest.java`	Updated test mocks to support new redaction parameter and added tests for both redacted and non-redacted plan output
`test/unit/org/apache/cassandra/db/filter/DataLimitsTest.java`	Fixed test assertions to use "LIMIT" instead of "ROWS LIMIT" for CQL compatibility
`test/unit/org/apache/cassandra/db/filter/ColumnFilterTest.java`	Updated all test assertions to test both redacted and non-redacted CQL string generation
`test/unit/org/apache/cassandra/db/SinglePartitionReadCommandCQLTest.java`	Expanded test coverage to verify redaction works for various query types and added tests for quoted identifiers
`test/unit/org/apache/cassandra/db/ReadCommandCQLTester.java`	Modified helper method to test both redacted and non-redacted CQL output
`test/unit/org/apache/cassandra/db/PartitionRangeReadCommandCQLTest.java`	Added comprehensive tests for redaction across different query patterns including index-based queries
`test/unit/org/apache/cassandra/db/MultiRangeReadCommandCQLTest.java`	New test file for multi-range read command CQL string generation with redaction support
`test/unit/org/apache/cassandra/cql3/validation/entities/SecondaryIndexTest.java`	Fixed error message templates to remove placeholder syntax
`test/distributed/org/apache/cassandra/distributed/test/SlowQueryLoggerTest.java`	New integration test verifying that slow query logger redacts sensitive data
`src/java/org/apache/cassandra/schema/ColumnMetadata.java`	Fixed CQL string generation to use proper column name quoting
`src/java/org/apache/cassandra/index/sai/plan/QueryController.java`	Updated tracing to use redacted plan strings
`src/java/org/apache/cassandra/index/sai/plan/Plan.java`	Added redaction support to plan string generation methods
`src/java/org/apache/cassandra/db/rows/AbstractRow.java`	Updated to use redacted clustering string generation
`src/java/org/apache/cassandra/db/marshal/AbstractType.java`	Added new `toCQLString` method with redaction parameter
`src/java/org/apache/cassandra/db/filter/RowFilter.java`	Enhanced filter expressions to support redacted CQL string generation
`src/java/org/apache/cassandra/db/filter/DataLimits.java`	Fixed limit clause formatting for proper CQL syntax
Multiple filter and query classes	Updated CQL string generation methods to support redaction parameter throughout the call chain

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

src/java/org/apache/cassandra/schema/ColumnMetadata.java

src/java/org/apache/cassandra/db/filter/RowFilter.java

k-rus

This PR modifies AbstractReadQuery.toCQLString so it doesn't include column values. There is a boolean flag to opt-out from redaction, since seeing the queried values can be useful while debugging.

I would like to request to have two exposed methods with different names. And give better name to the methods, which targets logs and excludes column values.

Also can you go through the PR checklist? Some items can be visited already now.

k-rus

Thank you for the PR and improving maitnainability by reducing code duplication. It's great to see such improvements.

I am requesting to separate fixes to producing correct CQL string from the implementation of redaction into a separate PR. There are many practical reasons, and splitting will help to do review and also address review comments. I have plenty comments just on fixing correctness of produced CQL string. And I also have comments to redaction implementation. I think it will be useful to have the redaction PR be based on the correctness PR.

My comments to the first part is maintly about improving code maintainability. I probably noticed few uncrutial misses too.

For the redaction part I have an issue that the decision when to restrict and when not, e.g., in impelemntations of toString, is arbitrary. It will be great to summarize where and why redaction is applied and where not in the PR discription. I also suggest to bring this summary to Slack for visibility and potential discussion.
Also I would like to see improvements to the names of methods toCQLString() (without argument) to provide semantics of redaction if they do so.

My review is currently partial as the PR is big.

src/java/org/apache/cassandra/db/filter/AbstractClusteringIndexFilter.java

k-rus · 2025-09-18T14:21:53Z

test/unit/org/apache/cassandra/index/sai/plan/PlanTest.java

+                     "                    predicate: RANGE(pred4)\n",
+                     limit.toStringRecursive(false));
+
+        assertEquals("Limit 3 (rows: 3.0, cost/row: 3895.8, cost: 44171.3..55858.7)\n" +


Is this plan equivalent to the above with the difference on the second line? This is difficult to verify, thus it might be good construct string plans from the common parts and avoid copy-paste.

src/java/org/apache/cassandra/schema/ColumnMetadata.java

k-rus · 2025-09-18T14:32:20Z

src/java/org/apache/cassandra/index/sai/plan/Plan.java

@@ -288,16 +288,37 @@ enum ControlFlow { Continue, Break }
     */
    public final String toStringRecursive()


I find it important improve the name of the method to make its new semantics clear, especially this PR changes its semantics. E.g., toRedactedStringRecursive.

Adding toRedactedStringRecursive and toUnredactedStringRecursive.

src/java/org/apache/cassandra/db/filter/RowFilter.java

k-rus · 2025-09-20T15:47:05Z

src/java/org/apache/cassandra/db/filter/RowFilter.java

+                    return String.format("ORDER BY %s %s", column.name, operator);
+                case ANN:
+                    return String.format("ORDER BY %s ANN OF %s", column.name, truncateValue(type.toCQLString(value, redact)));
+                case LIKE_PREFIX:
+                    return likeToCQLString("'%s%%'", type, redact);
+                case LIKE_SUFFIX:
+                    return likeToCQLString("'%%%s'", type, redact);
+                case LIKE_CONTAINS:
+                    return likeToCQLString("'%%%s%%'", type, redact);
+                case LIKE_MATCHES:
+                    return likeToCQLString("'%s'", type, redact);


Mixing break and return smells to me. However, I haven't got a good idea to suggest here. I thought if likeToCQLString can return formatting strings to be used in the common return, however, serializing like operators to strings can cause issues.

Is there anything actionable here?

I don't have a suggestion, thus no action to do in this PR. I am wonder what do you think about this mix?

I don't have anything to say, that's why I did it this way.

adelapena · 2025-10-01T11:56:15Z

I am requesting to separate fixes to producing correct CQL string from the implementation of redaction into a separate PR. There are many practical reasons, and splitting will help to do review and also address review comments. I have plenty comments just on fixing correctness of produced CQL string. And I also have comments to redaction implementation. I think it will be useful to have the redaction PR be based on the correctness PR.

@k-rus this PR is already split into separate commits, one for each thing, which took quite a bit of work to do because generally one realizes things about correctness when working on masking, since both things are tightly related. Should we spend more time in dividing this in two PRs? Also, there is review feedback for both things in this PR.

adelapena · 2025-10-02T12:43:01Z

@k-rus thanks for reviewing. I think I have addressed all the feedback on this partial review round. Please let me know when the rest of the review is ready.

k-rus · 2025-10-06T12:56:13Z

I am requesting to separate fixes to producing correct CQL string from the implementation of redaction into a separate PR. There are many practical reasons, and splitting will help to do review and also address review comments. I have plenty comments just on fixing correctness of produced CQL string. And I also have comments to redaction implementation. I think it will be useful to have the redaction PR be based on the correctness PR.

@k-rus this PR is already split into separate commits, one for each thing, which took quite a bit of work to do because generally one realizes things about correctness when working on masking, since both things are tightly related. Should we spend more time in dividing this in two PRs? Also, there is review feedback for both things in this PR.

I’m not sure I understand why having two PRs would be more work than keeping two clean commits.

My suggestion is to create the first PR focused solely on fixing the CQL string issues (as in the current commit) and then base the redaction PR on top of that one. I specifically don’t suggest basing the second PR on main, since that would introduce unnecessary extra work.

Splitting the work into two PRs helps reduce the review size and avoid mixing semantically different changes. It also makes it easier to review each part separately with the existing tooling.

Since code quality is a current focus, having two clear, purpose-specific PRs will definitely help with review efficiency and maintainability.

adelapena · 2025-10-07T10:30:46Z

I am requesting to separate fixes to producing correct CQL string from the implementation of redaction into a separate PR. There are many practical reasons, and splitting will help to do review and also address review comments. I have plenty comments just on fixing correctness of produced CQL string. And I also have comments to redaction implementation. I think it will be useful to have the redaction PR be based on the correctness PR.

@k-rus this PR is already split into separate commits, one for each thing, which took quite a bit of work to do because generally one realizes things about correctness when working on masking, since both things are tightly related. Should we spend more time in dividing this in two PRs? Also, there is review feedback for both things in this PR.

I’m not sure I understand why having two PRs would be more work than keeping two clean commits.

My suggestion is to create the first PR focused solely on fixing the CQL string issues (as in the current commit) and then base the redaction PR on top of that one. I specifically don’t suggest basing the second PR on main, since that would introduce unnecessary extra work.

Splitting the work into two PRs helps reduce the review size and avoid mixing semantically different changes. It also makes it easier to review each part separately with the existing tooling.

Since code quality is a current focus, having two clear, purpose-specific PRs will definitely help with review efficiency and maintainability.

I think we have separate commits as a requirement for this kind of thing. This PR already has abundant review discussion on both correctness and refactoring that will be missing in the separate PR. Maybe that feedback should have been hold until the PR was split to help with review efficiency and maintainability. Anyway, I have created that separate PR if that helps unblock this, leaving this one for correctness: #2038

k-rus

Thank you for splitting into a separate PR as it helps a lot, since the PR is still not small with many changes.
It might be helpful in future if there is a separate ticket for fixing issues with generated CQL strings. Having a separate issue will also allow to remove connection to redaction. What do you think?

The PR looks good to me. I think I found a couple of places where toCQLstring is missing. Also I am not sure if generated CQL doesn't need to be fixed for ANN queries. (if it's so and it requires work, I don't mind to postpone into a separate issue, and this PR will just introduce tests demonstrating the current issue)
Otherwise, my comments are nits to improve maintainability of the code.

k-rus · 2025-10-12T09:00:19Z

src/java/org/apache/cassandra/db/DecoratedKey.java

     * For multi-column keys: "k1 = 1 AND k2 = 2"
+     *
+     * @param metadata the table metadata
+     * @param redact whether to redact the key value, as in "k1 = ? AND k2 = ?".


It's good to move this comment about param redact to the follow up PR.

k-rus · 2025-10-12T09:15:55Z

src/java/org/apache/cassandra/db/MultiPartitionReadQuery.java

+        if (ranges.size() == 1)
+        {
+            String rangeString = ranges.get(0).toCQLString(metadata());
+            if (!rangeString.isEmpty())


Since this can only happen with single range, can it be useful to have an assert for non-empty range string in the case of many ranges? (this is because I had earlier the question about when it can be empty)

Another approach is suggested in #1985 (comment)

k-rus · 2025-10-12T09:17:16Z

src/java/org/apache/cassandra/db/MultiPartitionReadQuery.java

+        }
+        else
+        {
+            builder.append(" WHERE (");


This suggestion looks stupid, but it can help to notice the opening parentheses:

Suggested change

builder.append(" WHERE (");

builder.append(" WHERE ").append('(');

You can ignore the suggestion. And I am not sure if linter will complain about it.

I'll add it, it might help a bit with readability

k-rus · 2025-10-12T09:23:01Z

src/java/org/apache/cassandra/db/MultiPartitionReadQuery.java

+    private boolean appendRanges(CqlBuilder builder)
+    {
+        List<DataRange> ranges = ranges();
+        boolean hasRangeRestrictions = false;


I have to check if the purpose of this variable to store the result or if it is also read to guide logic. Since it's only to store the result, what do you think about replacing it with return clauses?

I think it can be replaced by returns, yes.

k-rus · 2025-10-12T09:30:50Z

src/java/org/apache/cassandra/db/MultiPartitionReadQuery.java

+    default void appendCQLWhereClause(CqlBuilder builder)
+    {
+        List<DataRange> ranges = ranges();
+        if (ranges.size() == 1 && ranges.get(0).isUnrestricted() && rowFilter().isEmpty())


Am I correct that the condition ranges.size() == 1 && ranges.get(0).isUnrestricted() means that there are no ranges?
What do you think about refactoring this class and use this condition already here instead of calling appendRanges in any case?

Does the condition rowFilter().isEmpty() correspond to the condition on line 43 if (!filter.isEmpty())?

Does the condition rowFilter().isEmpty() correspond to the condition on line 43 if (!filter.isEmpty())?

Not really, because rowFilter().isEmpty() doesn't include the clustering.

I'm simplifying this a bit, anyway.

k-rus · 2025-10-12T12:41:31Z

src/java/org/apache/cassandra/db/filter/RowFilter.java

+                    return String.format("ORDER BY %s %s", column.name, operator);
+                case ANN:
+                    return String.format("ORDER BY %s ANN OF %s", column.name, valueAsCQLString(type, value));


Should two places with column.name be column.name.toCQLString() too?

k-rus · 2025-10-12T13:19:07Z

test/unit/org/apache/cassandra/db/MultiRangeReadCommandCQLTest.java

+        createIndex("CREATE CUSTOM INDEX ON %s(v) USING 'StorageAttachedIndex'");
+        createIndex("CREATE CUSTOM INDEX ON %s(n) USING 'StorageAttachedIndex'");
+        assertToCQLString("SELECT * FROM %s ORDER BY v ANN OF [1, 2] LIMIT 10",
+                          "SELECT * FROM %s WHERE (token(k) <= -1 OR token(k) > -1) ORDER BY v ANN OF [1.0, ... LIMIT 10 ALLOW FILTERING");


What does ... mean here? Is it correct that there is no closing bracket?

This is due to the generic truncation done in RowFilter.SimpleExpression.toCQLString, which doesn't consider the data type. That might be delegated to the type in a separate ticket, if there is value on it.

k-rus · 2025-10-12T13:20:46Z

test/unit/org/apache/cassandra/db/MultiRangeReadCommandCQLTest.java

+        createTable("CREATE TABLE %s (k int, c int, n int, PRIMARY KEY (k, c))");
+        createIndex("CREATE CUSTOM INDEX ON %s(n) USING 'StorageAttachedIndex'");
+        assertToCQLString("SELECT * FROM %s ORDER BY n LIMIT 10",
+                          "SELECT * FROM %s WHERE (token(k) <= -1 OR token(k) > -1) ORDER BY n ASC LIMIT 10 ALLOW FILTERING");


Is it possible to check the correctness of the expected CQL? Is it executable or parsable?
It's difficult to check the correctness by human eyes.

It is. I have added some validation checks, although toCQLString doesn't always produces valid data. The cases where it doesn't produce valid data are:

Truncation of long expression operands

MultRangeReadCommands like the ones tested here. This commands don't have a supported CQL representation. They are the result of internally splitting other commands, and I have tried to add a kind of understandable CQL representation of them, even if it's not supported.

k-rus · 2025-10-12T13:34:40Z

test/unit/org/apache/cassandra/db/PartitionRangeReadCommandCQLTest.java

+        // test literals
+        createTable("CREATE TABLE %s (k text, c text, m map<text, text>, PRIMARY KEY (k, c))");
+        assertToCQLString("SELECT m['key'] FROM %s WHERE m['key'] = 'value' ALLOW FILTERING",
+                          "SELECT m FROM %s WHERE m['key'] = 'value' ALLOW FILTERING");


Is it correct that projection is missing the collection access, i.e., m instead of m['key']? Or is it something to fix in future?

This is what is received internally, due to the differences between fetched/queries columns, etc.

k-rus · 2025-10-12T13:42:05Z

test/unit/org/apache/cassandra/db/SinglePartitionReadCommandCQLTest.java

+                          "SELECT * FROM %s WHERE k = 0 ALLOW FILTERING");
+        assertToCQLString("SELECT * FROM %s WHERE k = 0 ORDER BY c1 DESC",
+                          "SELECT * FROM %s WHERE k = 0 ORDER BY c1 DESC, c2 DESC ALLOW FILTERING");
+        assertToCQLString("SELECT * FROM %s WHERE k = 0 ORDER BY c1 DESC, c2 DESC",


I have a curious question, which, I believe, is outside the scope of this PR. Is it allowed to have mixing ASC and DESC for c1 and c2? E.g., ORDER BY c1 DESC, c2 ASC

I think it's not supported, due to how clusterings are built.

Various fixes for toCQLString mainly coming from CASSANDRA-16510, and removal of code duplication.

…String

sonarqubecloud · 2025-10-15T16:18:04Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
92.3% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

cassci-bot · 2025-10-15T16:27:26Z

❌ Build ds-cassandra-pr-gate/PR-1985 rejected by Butler

1 regressions found
See build details here

Found 1 new test failures

Test	Explanation	Runs	Upstream
o.a.c.sensors.SensorsIndexWriteTest.BeforeFirstTest	NEW	🔴⚪	1 / 10

Found 3 known test failures

adelapena self-assigned this Sep 4, 2025

adelapena marked this pull request as draft September 4, 2025 19:23

adelapena force-pushed the CNDB-15280-main branch from ef1ea2c to 3073576 Compare September 9, 2025 12:51

adelapena marked this pull request as ready for review September 9, 2025 12:51

adelapena marked this pull request as draft September 9, 2025 18:40

adelapena force-pushed the CNDB-15280-main branch from 3073576 to 8441449 Compare September 10, 2025 11:27

adelapena marked this pull request as ready for review September 10, 2025 11:31

adelapena force-pushed the CNDB-15280-main branch 4 times, most recently from a84996a to f9d1f3d Compare September 12, 2025 12:09

adelapena requested a review from a team September 12, 2025 13:49

k-rus requested a review from Copilot September 15, 2025 05:49

Copilot AI reviewed Sep 15, 2025

View reviewed changes

src/java/org/apache/cassandra/schema/ColumnMetadata.java Outdated Show resolved Hide resolved

src/java/org/apache/cassandra/db/filter/RowFilter.java Outdated Show resolved Hide resolved

k-rus reviewed Sep 17, 2025

View reviewed changes

k-rus requested changes Sep 20, 2025

View reviewed changes

adelapena force-pushed the CNDB-15280-main branch 2 times, most recently from d7a1449 to dfdecf0 Compare October 2, 2025 10:51

adelapena force-pushed the CNDB-15280-main branch from dfdecf0 to ae827cd Compare October 7, 2025 10:08

adelapena mentioned this pull request Oct 7, 2025

CNDB-15280: Remove user data from AbstractReadQuery.toCQLString #2038

Open

adelapena changed the title ~~CNDB-15280: Remove user data from AbstractReadQuery.toCQLString~~ CNDB-15280: Remove user data from AbstractReadQuery.toCQLString (fixes and refactoring) Oct 7, 2025

k-rus self-requested a review October 7, 2025 14:11

k-rus reviewed Oct 12, 2025

View reviewed changes

adelapena added 2 commits October 15, 2025 14:49

CNDB-15280: Fix AbstractReadQuery.toCQLString

3923b6c

Various fixes for toCQLString mainly coming from CASSANDRA-16510, and removal of code duplication.

CNDB-15280: Address review feedback on fixing AbstractReadQuery.toCQL…

c38d3cf

…String

CNDB-15280: Address review feedback

c2702c7

adelapena force-pushed the CNDB-15280-main branch from ae827cd to c2702c7 Compare October 15, 2025 14:52

CNDB-15280: Add some basic validation to ReadCommandCQLTester

5852307

adelapena requested a review from k-rus October 17, 2025 10:45

adelapena changed the title ~~CNDB-15280: Remove user data from AbstractReadQuery.toCQLString (fixes and refactoring)~~ CNDB-15280: Fix AbstractReadQuery.toCQLString to produce correct CQL when possible Oct 20, 2025

		@@ -288,16 +288,37 @@ enum ControlFlow { Continue, Break }
		*/
		public final String toStringRecursive()

	builder.append(" WHERE (");
	builder.append(" WHERE ").append('(');

CNDB-15280: Fix AbstractReadQuery.toCQLString to produce correct CQL when possible #1985

Are you sure you want to change the base?

CNDB-15280: Fix AbstractReadQuery.toCQLString to produce correct CQL when possible #1985

Conversation

adelapena commented Sep 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Sep 4, 2025 • edited by adelapena Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Checklist before you submit for review

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

k-rus left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

k-rus left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

adelapena commented Oct 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

adelapena commented Oct 2, 2025

Uh oh!

k-rus commented Oct 6, 2025

Uh oh!

adelapena commented Oct 7, 2025

Uh oh!

k-rus left a comment

Choose a reason for hiding this comment

Uh oh!

k-rus Oct 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

adelapena commented Sep 4, 2025 •

edited

Loading

github-actions bot commented Sep 4, 2025 •

edited by adelapena

Loading

k-rus left a comment •

edited

Loading

adelapena commented Oct 1, 2025 •

edited

Loading

k-rus Oct 12, 2025 •

edited

Loading