Add support for a per-operation context #1619

pcmanus · 2025-03-04T09:54:41Z

The goal of this commit is to allow low-level code, typically at the
level of file reads (say, a custom FileChannel implementation), to
obtain context on which higher level operation they correspond to. A
typical use case may for tiered storage where FileChannel
implementations may forward reads to remote storage, and where having
context on the overall operation could help, say, take prior work done
to set timeout, or aggregate metrics per "user-level" operations (rather
than per FileChannel operation).

To do this, we reuse the existing ExecutorLocals mechanism, and simply
have the top-level operation setup the proper context as an
ExecutorLocal, allowing it to be accessed by any low-level operations
operating on behalf of that operation. This is, in many way, similar to
tracing, but instead of a TraceState that collect what happens during
the operation, it is a relatively flexible notion of OperationContext.

As of this patch, this feature is fairly barebone and mostly exists for
extensions in the following sense:

only user reads (ReadCommand execution) currently sets up such a
context. The code is written in such a way that adding support for
other operations should be easy, but this is not done.
the context set by reads by default has barely any information: it
merely has a toString() method that can roughly tell what the
operation itself is, and so could have use for debugging, but not
much more. Further, that context is not read by anything in this
patch. However, said context are created through a "factory" and
the factory class is configurable through a system property. So
extension can override the factory in order to create contexts
with more information/state, and they fetch/use those context where
they see fit.

github-actions · 2025-03-04T09:54:57Z

Checklist before you submit for review

Make sure there is a PR in the CNDB project updating the Converged Cassandra version
Use NoSpamLogger for log lines that may appear frequently in the logs
Verify test results on Butler
Test coverage for new/modified code is > 80%
Proper code formatting
Proper title for each commit staring with the project-issue number, like CNDB-1234
Each commit has a meaningful description
Each commit is not very long and contains related changes
Renames, moves and reformatting are in distinct commits

eolivelli · 2025-03-06T12:07:43Z

I think that Counter Mutations perform reads from storage but they won't be tracked with this patch.

I don't know about LWTs or other statements that require reads

pcmanus · 2025-03-06T15:12:47Z

I think that Counter Mutations perform reads from storage but they won't be tracked with this patch.

Good point. It should be easy enough to also set the context on writes to cover all the case of "writes that end up doing reads", so I'll add it.

pcmanus · 2025-03-06T16:00:11Z

Actually, looking more closely, the current patch handles every execution of a ReadCommand, and everything that require a read (counter, LWT) ultimately does so by executing a ReadCommand, so those use case are covered in that sense.

What is true is that currently there is no good way to look at the context and tell if the read is part of a counter update, or a LWT, or it's a user reads, but:

I don't think it matters for our current use cases, be it https://github.com/riptano/cndb/pull/13209 (nothing in the timeout handling really depends on why the request is done, not probably should it) or https://github.com/riptano/cndb/issues/13219 (in theory we could split the metric further by "kind of request", but cardinality is already a concern for that metric and this would make this worst, so in practice I don't think we want to do that even if we could).
there is actually no simple way to add this information in the context: when we do a read for the condition of a LWT write for instance, then the nodes that concretely execute the read are not the same as the one that coordinate the LWT (the LWT coordinator may happen to also be a read replica but it's not guarantee, and even when that's the case, that it at best 1 node out of the quorum that is requested). And there is genuinely nothing currently on the write protocol that allows a replica that receives a read to tell if that read belong to a LWT write, a counter update, or other. Long story short, we'd have to make write protocol modification to even get that information available, and I don't think we care that badly at the moment.

eolivelli · 2025-03-06T16:11:38Z

I think that we don't care much about the type of query (counters, single read....), the only important thing is that the new mechanism covers all the types (in order to completely cover the CNDB side implementation)

I would keep this in the current form, no other changes. Thanks for checking

The goal of this commit is to allow low-level code, typically at the level of file reads (say, a custom `FileChannel` implementation), to obtain context on which higher level operation they correspond to. A typical use case may for tiered storage where `FileChannel` implementations may forward reads to remote storage, and where having context on the overall operation could help, say, take prior work done to set timeout, or aggregate metrics per "user-level" operations (rather than per `FileChannel` operation). To do this, we reuse the existing `ExecutorLocals` mechanism, and simply have the top-level operation setup the proper context as an `ExecutorLocal`, allowing it to be accessed by any low-level operations operating on behalf of that operation. This is, in many way, similar to tracing, but instead of a `TraceState` that collect what happens during the operation, it is a relatively flexible notion of `OperationContext`. As of this patch, this feature is fairly barebone and mostly exists for extensions in the following sense: 1. only user reads (`ReadCommand` execution) currently sets up such a context. The code is written in such a way that adding support for other operations should be easy, but this is not done. 2. the context set by reads by default has barely any information: it merely has a `toString()` method that can roughly tell what the operation itself is, and so could have use for debugging, but not much more. Further, that context is not read by anything in this patch. However, said context are created through a "factory" and the factory class is configurable through a system property. So extension can override the factory in order to create contexts with more information/state, and they fetch/use those context where they see fit.

sonarqubecloud · 2025-03-14T08:48:03Z

Quality Gate passed

Issues
2 New issues
0 Accepted issues

Measures
0 Security Hotspots
91.4% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

cassci-bot · 2025-03-14T08:55:55Z

❌ Build ds-cassandra-pr-gate/PR-1619 rejected by Butler

1 new test failure(s) in 2 builds
See build details here

Found 1 new test failures

Test	Explanation	Branch history	Upstream history
o.a.c.u.b.BinLogTest.testTruncationReleasesLogS...	regression	🔴🔴	🔵🔵🔵🔵🔵🔵🔵

Found 2 known test failures

The goal of this commit is to allow low-level code, typically at the level of file reads (say, a custom `FileChannel` implementation), to obtain context on which higher level operation they correspond to. A typical use case may for tiered storage where `FileChannel` implementations may forward reads to remote storage, and where having context on the overall operation could help, say, take prior work done to set timeout, or aggregate metrics per "user-level" operations (rather than per `FileChannel` operation). To do this, we reuse the existing `ExecutorLocals` mechanism, and simply have the top-level operation setup the proper context as an `ExecutorLocal`, allowing it to be accessed by any low-level operations operating on behalf of that operation. This is, in many way, similar to tracing, but instead of a `TraceState` that collect what happens during the operation, it is a relatively flexible notion of `OperationContext`. As of this patch, this feature is fairly barebone and mostly exists for extensions in the following sense: 1. only user reads (`ReadCommand` execution) currently sets up such a context. The code is written in such a way that adding support for other operations should be easy, but this is not done. 2. the context set by reads by default has barely any information: it merely has a `toString()` method that can roughly tell what the operation itself is, and so could have use for debugging, but not much more. Further, that context is not read by anything in this patch. However, said context are created through a "factory" and the factory class is configurable through a system property. So extension can override the factory in order to create contexts with more information/state, and they fetch/use those context where they see fit.

pcmanus force-pushed the cndb-13000-ctx branch from 9eb0db6 to c6c0151 Compare March 5, 2025 16:53

pcmanus changed the title ~~[WIP] Add support for a per-operation context~~ Add support for a per-operation context Mar 5, 2025

eolivelli approved these changes Mar 6, 2025

View reviewed changes

pcmanus force-pushed the cndb-13000-ctx branch from c6c0151 to 2481e88 Compare March 10, 2025 15:44

pcmanus force-pushed the cndb-13000-ctx branch from 2481e88 to fe91ed3 Compare March 14, 2025 07:58

pcmanus merged commit 384ba23 into main Mar 14, 2025
474 of 480 checks passed

pcmanus deleted the cndb-13000-ctx branch March 14, 2025 14:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add support for a per-operation context #1619

Add support for a per-operation context #1619

Uh oh!

pcmanus commented Mar 4, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Mar 4, 2025

Uh oh!

eolivelli commented Mar 6, 2025

Uh oh!

pcmanus commented Mar 6, 2025

Uh oh!

pcmanus commented Mar 6, 2025

Uh oh!

eolivelli commented Mar 6, 2025

Uh oh!

sonarqubecloud bot commented Mar 14, 2025

Uh oh!

cassci-bot commented Mar 14, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Add support for a per-operation context #1619

Add support for a per-operation context #1619

Uh oh!

Conversation

pcmanus commented Mar 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Mar 4, 2025

Checklist before you submit for review

Uh oh!

eolivelli commented Mar 6, 2025

Uh oh!

pcmanus commented Mar 6, 2025

Uh oh!

pcmanus commented Mar 6, 2025

Uh oh!

eolivelli commented Mar 6, 2025

Uh oh!

sonarqubecloud bot commented Mar 14, 2025

Quality Gate passed

Uh oh!

cassci-bot commented Mar 14, 2025

❌ Build ds-cassandra-pr-gate/PR-1619 rejected by Butler

Found 1 new test failures

Found 2 known test failures

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

pcmanus commented Mar 4, 2025 •

edited

Loading