-
Couldn't load subscription status.
- Fork 21
CNDB-15155: Segregate SAI's query metrics per query type #1969
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Checklist before you submit for review
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are all counters, this will work well on Astra and we create the metric only when needed
we have to ensure that the metrics will be unregistered in CNDB when the table is unloaded, but there is already a mechanism, we will plug into it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are also ten histograms per type of query, in TableQueryMetrics.PerQuery. I still have to add config/system properties to optionally disable them in CC, independently of CNDB.
c107c88 to
dcea2e0
Compare
|
I'm also adding a separate metrics category for hybrid queries, meaning queries that are both filtering and top-k at the same time. I'm also adding system properties to separately enable/disable SAI query type metrics for table (counters) and for query (histograms). The per-table counters are enabled by default, whereas the per-query histograms are disabled by default. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very good to me.
Just my usual nagging about naming ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (dataRanges.size() != 1) | |
| return false; | |
| return dataRanges.get(0).isSinglePartition(); | |
| return dataRanges.size() == 1 && dataRanges.get(0).isSinglePartition(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor suggestion: maybe name them "FilterOnly" and "TopKOnly" ?
Maybe I'm oversensitive, but I bet someone will misunderstand "Filter" as any query which does filtering (strictly adhering to the logic of usesIndexFiltering). I'm not strong on this one. Feel free to discard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was fence between those two and I choose Filter/TopK just for brevity, but it it has make you wonder too probably it's safer to use FilterOnly/TopKOnly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not MultiPartition ?
Aren't we overloading the term "range" query? Because in Cassandra, single partition queries can be also range, can't they?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think SinglePartitionReadCommand is always single partition, and PartitionRangeReadCommand can be both single partition and multipartition. So indeed Range is confusing; I'm changing it to MultiPartition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need type.isEmpty() ?
Aaaah, it's the name of the group of metrics / query type. I initially thought this was a CQL type.
Maybe rename to queryType or queryKind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed it to QueryKind, and replaced "type" by "kind" in all related usages around. Not that I have ever been very good at discerning which one to use, but I think "kind" makes more sense for these kind of vague query subcategories.
|
It seems this was breaking tracing by sending repeated messages for every kind of query, when we only want one. Just pushed a fix. I'm also removing an IMO redundant check in |
56fc429 to
3c8fdc0
Compare
Add separate SAI query metrics for filtering, top-k, single-partition and range queries. The new metrics are identical to the previous ones, but they are only applied to the relevant queries. The previous query metrics remain unaltered.
3c8fdc0 to
e20af50
Compare
|
✔️ Build ds-cassandra-pr-gate/PR-1969 approved by ButlerApproved by Butler |
…1969) Add SAI query metrics per kind of query. There are metrics for: * Filtering only queries. * Top-k only queries, such as ANN, BM25 and generic ordering. * Hybrid queries, combining both top-k and filtering. * Single-partition queries. * Multi-partition queries. Each of these groups has the same records as the general query metrics. These new metrics can be enabled/disabled with the system properties: * cassandra.sai.metrics.query_kind.per_table.enabled * cassandra.sai.metrics.query_kind.per_query.enabled Rebase notes: * required to re-implement CASSANDRA-18940 * will be replaced (made redundant) by CASSANDRA-20923
…1969) Add SAI query metrics per kind of query. There are metrics for: * Filtering only queries. * Top-k only queries, such as ANN, BM25 and generic ordering. * Hybrid queries, combining both top-k and filtering. * Single-partition queries. * Multi-partition queries. Each of these groups has the same records as the general query metrics. These new metrics can be enabled/disabled with the system properties: * cassandra.sai.metrics.query_kind.per_table.enabled * cassandra.sai.metrics.query_kind.per_query.enabled Rebase notes: * required to re-implement CASSANDRA-18940 * will be replaced (made redundant) by CASSANDRA-20923



SAI's tracks query metrics providing info about latency, partition reads, filtered rows, etc. These metrics are for all the SAI queries for a certain table. When we see problems in queries, for example a high latency, questions that can usually come next are what query is producing this high latency? Are there vector queries? Are queries for a single partition? In that case, what is the latency of those queries? How many rows are they selecting? To answer those questions, I think it can be useful to track metrics for specific types of query.
The types of query depend on multiple characteristics, like being top-k, containing disjunctions, etc. We can make an overwhelming number of combinations of query characteristics conforming types of query. Storing so many metrics is probably not feasible, so we should choose the ones we consider more interesting.
This PR only adds a few type of queries: filtering, top-k, single-partition and range queries. If we need new types in the future, this patch should make it easy to add separate metrics for them, possibly with an one-liner.
The general query metrics remain untouched, this PR only adds new ones.
Here is how the SAI metrics look like without this patch:


And this is with this patch: