michaelsembwever

https://github.com/riptano/cndb/issues/15498

Port into main-5.0 commit 5ea1221

CNDB-15141: add LogTransaction#validate API to perform validation before txn prepare (https://github.com/datastax/cassandra/pull/2008)
### What is the issue
https://github.com/riptano/cndb/issues/15141

### What does this PR fix and why was it fixed
Adds a compaction task verification step that runs before the transaction
commits. It validates that every source sstable's min/max keys are either
present in the output sstables or, if missing from them, expired.

This is mainly to detect data loss caused by a compaction strategy
skipping some subranges of the source sstables (HCD-130).

It works as follows:
1. Extract all boundary keys (min/max) from the source sstables
2. Check whether the boundary keys exist in the output sstables
   a. If all are present, validation passes
   b. If any keys are missing, continue validation
3. For the missing keys, read the partition data from the source sstables
4. Apply tombstone purging using `gc_grace_seconds=0`
5. Check the purged content
   a. If there is no live data, validation passes: the keys were properly
      obsoleted
   b. If there is live data, validation fails and, if configured to abort,
      throws to abort the compaction
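
The validation steps listed here can be sketched roughly as follows. This is a simplified illustration, not the code from the PR: `SketchSSTable`, `CompactionValidationSketch`, `findLostKeys` and `validate` are hypothetical names, an sstable is modelled as a sorted map from partition key to a flag saying whether the partition still has live data after purging with `gc_grace_seconds=0`, and the WARN behaviour (log and continue) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for an sstable: maps each partition key to whether the
// partition still holds live data after purging with gc_grace_seconds=0.
final class SketchSSTable {
    final TreeMap<String, Boolean> partitions = new TreeMap<>();

    SketchSSTable put(String key, boolean liveAfterPurge) {
        partitions.put(key, liveAfterPurge);
        return this;
    }
}

final class CompactionValidationSketch {
    enum Mode { NONE, WARN, ABORT }

    // Returns the boundary keys that are missing from the outputs and still live.
    static List<String> findLostKeys(List<SketchSSTable> sources, List<SketchSSTable> outputs) {
        // Step 1: collect the min/max key of every source sstable.
        TreeSet<String> boundaries = new TreeSet<>();
        for (SketchSSTable source : sources) {
            if (source.partitions.isEmpty()) continue;
            boundaries.add(source.partitions.firstKey());
            boundaries.add(source.partitions.lastKey());
        }

        List<String> lost = new ArrayList<>();
        for (String key : boundaries) {
            // Step 2: a boundary key found in any output sstable is fine.
            boolean inOutput = outputs.stream().anyMatch(o -> o.partitions.containsKey(key));
            if (inOutput) continue;

            // Steps 3-5: re-read the key from the sources; it is only lost if
            // some source still has live data for it after purging.
            boolean live = sources.stream()
                                  .anyMatch(s -> Boolean.TRUE.equals(s.partitions.get(key)));
            if (live) lost.add(key);
        }
        return lost;
    }

    // Returns true when validation passes; throws in ABORT mode on failure.
    static boolean validate(List<SketchSSTable> sources, List<SketchSSTable> outputs, Mode mode) {
        if (mode == Mode.NONE) return true; // validation disabled
        List<String> lost = findLostKeys(sources, outputs);
        if (lost.isEmpty()) return true;
        String message = "Compaction output is missing live boundary keys: " + lost;
        if (mode == Mode.ABORT) throw new IllegalStateException(message);
        System.err.println("WARN " + message); // assumed WARN behaviour: log only
        return false;
    }
}
```

Only boundary keys are checked, so the sketch (like the description) detects skipped subranges at the edges of a source sstable rather than proving that every partition was carried over.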


Configs:
* `cassandra.compaction_validation_mode=NONE`, available options: NONE/WARN/ABORT
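
As an illustration of how the mode could be resolved, here is a minimal sketch. It assumes the option is read as a JVM system property and that an unset value falls back to `NONE`; `ValidationModeSketch`, `parse` and `fromSystemProperty` are hypothetical names, and the case-insensitive parsing is an assumption rather than something the PR states.

```java
import java.util.Locale;

// Hypothetical enum mirroring the three documented options.
enum ValidationModeSketch {
    NONE, WARN, ABORT;

    // Assumed property name, taken from the config line in the description.
    static final String PROPERTY = "cassandra.compaction_validation_mode";

    // Parses a raw value; null or blank falls back to NONE (assumption).
    static ValidationModeSketch parse(String raw) {
        if (raw == null || raw.trim().isEmpty()) return NONE;
        // valueOf throws IllegalArgumentException for unknown values.
        return valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }

    static ValidationModeSketch fromSystemProperty() {
        return parse(System.getProperty(PROPERTY));
    }
}
```

Under these assumptions, starting the JVM with `-Dcassandra.compaction_validation_mode=ABORT` would make `ValidationModeSketch.fromSystemProperty()` return `ABORT`.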


github-actions bot commented Oct 9, 2025

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

…lidation before txn prepare (#2008)

riptano/cndb#15141

@michaelsembwever michaelsembwever merged commit e815ad9 into main-5.0 Oct 15, 2025
6 of 354 checks passed
@michaelsembwever michaelsembwever deleted the mck-cndb-15498-main-5.0 branch October 15, 2025 10:01
@cassci-bot

❌ Build ds-cassandra-pr-gate/PR-2056 rejected by Butler


3 regressions found
See build details here


Found 3 new test failures

| Test | Explanation | Runs | Upstream |
|---|---|---|---|
| junit.framework.TestSuite.org.apache.cassandra.distributed.test.sai.datamodels.QueryWriteLifecycleTest-_jdk11 | REGRESSION | 🔵🔴 | 0 / 11 |
| o.a.c.cql3.validation.operations.AggregationQueriesTest.testAggregationQueryShouldNotTimeoutWhenItExceedesReadTimeout (compression) | REGRESSION | 🔴🔴 | 2 / 11 |
| o.a.c.distributed.test.repair.ForceRepairTest.forceWithDifference () | NEW | 🔴 | 6 / 11 |

Found 4 known test failures
