Skip to content

Conversation

driftx
Copy link

@driftx driftx commented Oct 8, 2025

What is the issue

ClientState.isSystem() only checks the isInternal field, which is false for external client states created via forExternalCalls(user). This causes the method to return false even when the authenticated user is AuthenticatedUser.SYSTEM_USER.

This leads to incorrect behavior in isOrdinaryUser(), which calls isSuper() for SYSTEM_USER, triggering expensive database queries via Roles.hasSuperuserStatus(). During system initialization when auth tables aren't fully ready, this causes NullPointerException because CassandraRoleManager.loadRoleStatement may not be initialized yet.

What does this PR fix and why was it fixed

Change isSystem() to check both the internal flag and the user identity, change isOrdinaryUser() to check system users before super users.

Copy link

github-actions bot commented Oct 8, 2025

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit staring with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

@driftx driftx force-pushed the CNDB-15617 branch 2 times, most recently from 440ec9f to a63caea Compare October 8, 2025 19:16
return !isSuperIgnoreAnonymousUser() && !isSystem();
return !isSuper() && !isSystem();
return !isSystem() && !isSuperIgnoreAnonymousUser();
return !isSystem() && !isSuper();
Copy link

@djatnieks djatnieks Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment here may be nice to raise awareness that isSystem is "cheap" while isSuper and isSuperIgnoreAnonymousUser may/will trigger possibly expensive database calls (if not cached) or be called before auth tables are fully initialized.

Though the fact that auth tables are not initialized makes me wonder - is that a situation arising only with CNDB or also with C*, regardless if the auth user is SYSTEM_USER or not? Does it imply any precondition has not been satisfied?

This leads to ask how this affects HCD or even OSS C*?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment about how system users should bypass all guardrails and permissions, which was the real issue here (though avoiding the possibly more expensive query is nice to have.) In 5.0, ENABLE_GUARDRAILS_FOR_ANONYMOUS_USER defaults to true, so external calls by system users would get caught there a) if auth wasn't ready and b) because isSystem wouldn't recognize them.

So it requires external calls by system users to be an issue, which is a situation CNDB satisfies but I can't think of others off the top of my head. I don't think it impacts HCD or OSS until that situation occurs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auth tables are not initialized

I don't know without testing it again but I think accessing DatabaseDescriptor.getAuthenticator().requireAuthentication() in the isSuper call was problematic for CNDB specifically at that point. It may have been fixable some other way there, but we'd still have to fix isSystem.

public boolean isSystem()
{
return isInternal;
return isInternal || (user != null && user.isSystem());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add an explanatory comment please? I tend to find the code in this class simple and terse which can be confusing because the code paths that use these methods are not always to clear to understand. I don't know if others feel the same though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added javadoc saying this returns true for both internal and external calls, does that help?

Copy link

@djatnieks djatnieks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good.

Suggested addition of some code comments to help document/explain things.

Also left a question if the auth tables being 'not ready' is of any concern and if there is any impact for HCD/OSS C* in addition to CNDB.

Unless there is reason to believe CNDB is not starting C* correctly somehow (e.g. waiting for auth tables to be init'd) I don't think there is a reason to hold the changes here. This seems like a good defensive change even if that is the case.

@driftx driftx merged commit 43dbe78 into main-5.0 Oct 10, 2025
214 of 521 checks passed
@driftx driftx deleted the CNDB-15617 branch October 10, 2025 18:39
Copy link

@cassci-bot
Copy link

❌ Build ds-cassandra-pr-gate/PR-2039 rejected by Butler


2 regressions found
See build details here


Found 2 new test failures

Test Explanation Runs Upstream
o.a.c.cql3.validation.operations.AggregationQueriesTest.testAggregationQueryShouldNotTimeoutWhenItExceedesReadTimeout (compression) REGRESSION 🔴🔴 2 / 10
o.a.c.distributed.test.RepairErrorsTest.testNoSuchRepairSessionAnticompaction REGRESSION 🔴🔵 0 / 10

Found 5 known test failures

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants