-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Add restore validation feature: restores to special keyspace allowing validating backup/restore in single cluster (space willing) #12573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
Result of foundationdb-pr on Linux RHEL 9
|
dcca4b7 to
77618b6
Compare
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
cafb643 to
ac07a89
Compare
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
fdbserver/storageserver.actor.cpp
Outdated
| } else { | ||
| // Extra key in restored data (skip it, as per design doc: one-directional comparison) | ||
| // Log the extra key for visibility without failing the validation | ||
| TraceEvent(SevInfo, "SSAuditRestoreExtraKey", thisServerID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add it ass 'TraceEvent(SevInfo, "SSAuditRestoreExtraKey", just to be in-sync with other failures.
Let also push it in errors.push_back(error);
We want the user to know that there are some keys and it's not valid data. Lets treat this error as other errors found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem. Done.
fdbserver/storageserver.actor.cpp
Outdated
| } | ||
|
|
||
| // Check for any remaining source keys that are missing from restored data | ||
| if (errors.empty() && sourceIdx < sourceReply.data.size() && !sourceReply.more) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this !sourceReply.more ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are checking end of the src and dest... Want to see if one side has more than the other and error if so (But above should be better.... I added && !restoredReply.more and only then error)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did no understand, why it it not an error if we have sourceReply.more is true ? Is it still be an error?
* If extra key in restored data, report as error rather than as info. * Tighten up check on end where one side -- src or dest -- might have more data... if so, error.
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
fdbserver/storageserver.actor.cpp
Outdated
| // Extra key in restored data | ||
| std::string error = | ||
| format("Extra key in restored data: %s", Traceable<StringRef>::toString(restoredKV.key).c_str()); | ||
| TraceEvent(SevError, "SSAuditRestoreExtraKey", thisServerID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it intentional to use SSAuditRestoreExtraKey instead of SSAuditRestoreError
fdbserver/storageserver.actor.cpp
Outdated
| // Check for any remaining restored keys that don't have matching source keys | ||
| if (errors.empty() && restoredIdx < restoredReply.data.size() && !restoredReply.more) { | ||
| // Log the extra keys found in restored data | ||
| TraceEvent(SevInfo, "SSAuditRestoreExtraRestoredKeys", thisServerID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we change this also to SevError and push them into errors?
SSAuditRestoreExtraRestoredKeys->SSAuditRestoreError
fdbserver/storageserver.actor.cpp
Outdated
| tr.setVersion(version); | ||
| tr.setOption(FDBTransactionOptions::ACCESS_SYSTEM_KEYS); | ||
| tr.setOption(FDBTransactionOptions::LOCK_AWARE); | ||
| state RangeResult restoredData = wait(tr.getRange(restoredRange, limit, Snapshot::False, Reverse::False)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the scenario of limitBytes for reply reaches, does the getRange(restoreRange) will give less keys than getRange(rangeToRead) because of bigger key size in restoreRange?
If yes, how is it handled in the code?
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
This is an implementation of a Neethu design (design is included in the PR).
Here is running the new simulation included here 100k times:
20251121-184138-stack-1458b890ad727389 compressed=True data_size=55635346 duration=3717292 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=0:44:53 sanity=False started=100000 stopped=20251121-192631 submitted=20251121-184138 timeout=5400 username=stackI ran all tests 100k times and looks like hangs on the end at 99975 or so. Looking to see if related.
Also verified the feature manually.