Skip to content

Conversation

@adelapena
Copy link

@adelapena adelapena commented Aug 28, 2025

SAI's tracks query metrics providing info about latency, partition reads, filtered rows, etc. These metrics are for all the SAI queries for a certain table. When we see problems in queries, for example a high latency, questions that can usually come next are what query is producing this high latency? Are there vector queries? Are queries for a single partition? In that case, what is the latency of those queries? How many rows are they selecting? To answer those questions, I think it can be useful to track metrics for specific types of query.

The types of query depend on multiple characteristics, like being top-k, containing disjunctions, etc. We can make an overwhelming number of combinations of query characteristics conforming types of query. Storing so many metrics is probably not feasible, so we should choose the ones we consider more interesting.

This PR only adds a few type of queries: filtering, top-k, single-partition and range queries. If we need new types in the future, this patch should make it easy to add separate metrics for them, possibly with an one-liner.

The general query metrics remain untouched, this PR only adds new ones.

Here is how the SAI metrics look like without this patch:
metrics_old
And this is with this patch:
metrics_new

@adelapena adelapena self-assigned this Aug 28, 2025
@github-actions
Copy link

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit staring with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

@adelapena adelapena marked this pull request as draft August 28, 2025 14:43

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are all counters, this will work well on Astra and we create the metric only when needed

we have to ensure that the metrics will be unregistered in CNDB when the table is unloaded, but there is already a mechanism, we will plug into it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are also ten histograms per type of query, in TableQueryMetrics.PerQuery. I still have to add config/system properties to optionally disable them in CC, independently of CNDB.

@adelapena adelapena marked this pull request as ready for review September 1, 2025 11:59
@adelapena
Copy link
Author

I'm also adding a separate metrics category for hybrid queries, meaning queries that are both filtering and top-k at the same time.

I'm also adding system properties to separately enable/disable SAI query type metrics for table (counters) and for query (histograms). The per-table counters are enabled by default, whereas the per-query histograms are disabled by default.

Copy link

@pkolaczk pkolaczk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good to me.
Just my usual nagging about naming ;)

Comment on lines 139 to 142
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (dataRanges.size() != 1)
return false;
return dataRanges.get(0).isSinglePartition();
return dataRanges.size() == 1 && dataRanges.get(0).isSinglePartition();

Comment on lines 67 to 69
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestion: maybe name them "FilterOnly" and "TopKOnly" ?
Maybe I'm oversensitive, but I bet someone will misunderstand "Filter" as any query which does filtering (strictly adhering to the logic of usesIndexFiltering). I'm not strong on this one. Feel free to discard.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was fence between those two and I choose Filter/TopK just for brevity, but it it has make you wonder too probably it's safer to use FilterOnly/TopKOnly.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not MultiPartition ?
Aren't we overloading the term "range" query? Because in Cassandra, single partition queries can be also range, can't they?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think SinglePartitionReadCommand is always single partition, and PartitionRangeReadCommand can be both single partition and multipartition. So indeed Range is confusing; I'm changing it to MultiPartition.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need type.isEmpty() ?
Aaaah, it's the name of the group of metrics / query type. I initially thought this was a CQL type.
Maybe rename to queryType or queryKind?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed it to QueryKind, and replaced "type" by "kind" in all related usages around. Not that I have ever been very good at discerning which one to use, but I think "kind" makes more sense for these kind of vague query subcategories.

@adelapena
Copy link
Author

It seems this was breaking tracing by sending repeated messages for every kind of query, when we only want one. Just pushed a fix. I'm also removing an IMO redundant check in TraceTest.

Add separate SAI query metrics for filtering, top-k, single-partition and range queries.
The new metrics are identical to the previous ones, but they are only applied to the
relevant queries. The previous query metrics remain unaltered.
@sonarqubecloud
Copy link

@cassci-bot
Copy link

✔️ Build ds-cassandra-pr-gate/PR-1969 approved by Butler


Approved by Butler
See build details here

@adelapena adelapena merged commit 550263f into main Sep 17, 2025
493 checks passed
@adelapena adelapena deleted the CNDB-15155-main branch September 17, 2025 16:27
michaelsembwever pushed a commit that referenced this pull request Sep 25, 2025
…1969)

Add SAI query metrics per kind of query.

There are metrics for:
 * Filtering only queries.
 * Top-k only queries, such as ANN, BM25 and generic ordering.
 * Hybrid queries, combining both top-k and filtering.
 * Single-partition queries.
 * Multi-partition queries.

Each of these groups has the same records as the general query metrics.

These new metrics can be enabled/disabled with the system properties:
* cassandra.sai.metrics.query_kind.per_table.enabled
* cassandra.sai.metrics.query_kind.per_query.enabled

Rebase notes:
 * required to re-implement CASSANDRA-18940
 * will be replaced (made redundant) by CASSANDRA-20923
michaelsembwever pushed a commit that referenced this pull request Sep 25, 2025
…1969)

Add SAI query metrics per kind of query.

There are metrics for:
 * Filtering only queries.
 * Top-k only queries, such as ANN, BM25 and generic ordering.
 * Hybrid queries, combining both top-k and filtering.
 * Single-partition queries.
 * Multi-partition queries.

Each of these groups has the same records as the general query metrics.

These new metrics can be enabled/disabled with the system properties:
* cassandra.sai.metrics.query_kind.per_table.enabled
* cassandra.sai.metrics.query_kind.per_query.enabled

Rebase notes:
 * required to re-implement CASSANDRA-18940
 * will be replaced (made redundant) by CASSANDRA-20923
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants