Conversation

@ZuebeyirEser
Contributor

Purpose

Linked issue: close #2526

This PR fixes the issue where rebalancing a primary-key table leads to a KvSnapshotNotExistException. When a bucket replica was moved during rebalancing, the tablet server being stopped incorrectly deleted shared remote snapshots from the filesystem, even though the new leader still needed them to restore its kv state.
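
To make the failure mode concrete, here is a minimal, self-contained sketch of the sequence. The class name, the map standing in for remote storage, and the path layout are illustrative only, not the actual Fluss code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of the bug; "remote storage" is modeled as a shared map.
public class RebalanceSnapshotBugSketch {
    // Shared remote filesystem: snapshot path -> snapshot bytes.
    static final Map<String, byte[]> REMOTE_FS = new HashMap<>();

    public static void main(String[] args) {
        String snapshotPath = "remote/kv-snapshots/table-1/bucket-0/snap-42";
        REMOTE_FS.put(snapshotPath, new byte[] {1, 2, 3}); // snapshot taken by the old leader

        // Rebalance moves bucket-0 off the old server. Before the fix,
        // stopReplica with the delete flag also removed the *shared* remote snapshot:
        boolean deleteLocalReplica = true;
        if (deleteLocalReplica) {
            REMOTE_FS.remove(snapshotPath); // the bug: global state deleted during local cleanup
        }

        // The new leader now tries to restore from the remote snapshot:
        byte[] snapshot = REMOTE_FS.get(snapshotPath);
        if (snapshot == null) {
            // In Fluss this surfaces as KvSnapshotNotExistException.
            throw new IllegalStateException("KvSnapshotNotExist: " + snapshotPath);
        }
    }
}
```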

Brief change log

  • Fixed ReplicaManager#stopReplica by removing the logic that triggered kvManager.deleteRemoteKvSnapshot (see the sketch after this list).
  • Previously, a node would perform a global deletion of remote snapshots during local replica cleanup whenever the delete flag was set.
  • Rebalancing or moving a replica now affects only local disk state; remote snapshots are treated as part of the table's global lifecycle rather than being managed by individual nodes during migration.
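
The following sketch shows the shape of the change. The types, fields, and method names here are assumptions for illustration, not the actual ReplicaManager API; the point is that stopReplica no longer touches remote snapshots:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative-only sketch of the intended behavior after the fix.
class ReplicaManagerSketch {
    static class TableBucket {}

    static class Replica {
        void deleteLocalLogAndKv() { /* remove on-disk log and kv state only */ }
    }

    private final Map<TableBucket, Replica> onlineReplicas = new ConcurrentHashMap<>();

    void stopReplica(TableBucket bucket, boolean delete) {
        Replica replica = onlineReplicas.remove(bucket);
        if (replica != null && delete) {
            // Only local disk state is cleaned up here. The old code also ran
            // the equivalent of kvManager.deleteRemoteKvSnapshot(...) at this
            // point, wiping snapshots that other servers still needed.
            replica.deleteLocalLogAndKv();
        }
    }
}
```

Deleting the remote snapshot remains the responsibility of the table-drop path, where it is genuinely global, rather than any single server's replica shutdown.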

Tests

  • Added KvSnapshotDeletionBugReplicationTest to fluss-server.
  • The test confirms that when a replica is dropped locally (simulating rebalance behavior), the remote snapshot directory and its metadata remain untouched (a stand-alone approximation is sketched below).
  • Verified that this prevents the KvSnapshotNotExistException leaders previously hit after a rebalance.
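
For readers without the fluss-server test fixtures, here is a stand-alone approximation of what the test checks; the directory names and steps are illustrative, not the real fixture:

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical, self-contained analogue of KvSnapshotDeletionBugReplicationTest.
public class SnapshotSurvivesRebalanceSketch {
    public static void main(String[] args) throws Exception {
        Path localReplicaDir = Files.createTempDirectory("local-replica");
        Path remoteSnapshotDir = Files.createTempDirectory("remote-kv-snapshot");
        Path snapshotFile = Files.createFile(remoteSnapshotDir.resolve("snap-1"));

        // Simulate stopReplica with delete=true: only local state may be removed.
        Files.deleteIfExists(localReplicaDir); // empty dir, removed in one call

        // The remote snapshot must survive the local replica drop.
        if (!Files.exists(snapshotFile)) {
            throw new AssertionError("remote snapshot must survive a local replica drop");
        }
        System.out.println("remote snapshot intact: " + snapshotFile);
    }
}
```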

API and Format

no

Documentation

no

@ZuebeyirEser
Contributor Author

@luoyuxia I've also opened a PR for this specific flaky test; please see #2524, currently under review by @swuferhong.
FYI
