@saintstack
Contributor

Adds support for using BulkDump to create backup snapshots and
BulkLoad to restore them, as an alternative to traditional range file backups.

Backup (BulkDump):
- Add BulkDumpTaskFunc that submits a BulkDump job to DD and monitors completion
- Write keyspace snapshot file with bulkDumpJobId in metadata after job completes
- Set latestSnapshotEndVersion and firstSnapshotEndVersion for backup state
- Derive transport method (CP vs BLOBSTORE) from backup URL
- Store originalBulkDumpMode in BackupConfig for crash recovery
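As a rough illustration of "derive transport method from backup URL", the sketch below maps a URL scheme to a transport. It is not the PR's code: the `TransportMethod` enum, the function name, and the assumption that `file://` URLs map to CP and `blobstore://` URLs map to BLOBSTORE are all hypothetical.

```python
from enum import Enum
from urllib.parse import urlsplit


class TransportMethod(Enum):
    # Hypothetical names mirroring the "CP vs BLOBSTORE" wording in the description.
    CP = "cp"
    BLOBSTORE = "blobstore"


def transport_for_backup_url(url: str) -> TransportMethod:
    """Pick a transport from the URL scheme (assumed mapping, see lead-in)."""
    scheme = urlsplit(url).scheme.lower()
    if scheme == "blobstore":
        return TransportMethod.BLOBSTORE
    if scheme == "file":
        return TransportMethod.CP
    # Fail loudly rather than guess a transport for an unknown scheme.
    raise ValueError(f"unsupported backup URL scheme: {scheme!r}")
```

Deriving the transport from the URL means callers do not need a separate transport option that could disagree with the destination they already specified.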

Restore (BulkLoad):
- Add BulkLoadRestoreTaskFunc that submits a BulkLoad job using the bulkDumpJobId
- Register range lock owner before submitting job
- Chain with RestoreDispatchTaskFunc for log application after BulkLoad completes
- Set firstConsistentVersion and update progress counters
- Store originalBulkLoadMode in RestoreConfig for crash recovery
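One plausible reading of "store originalBulkLoadMode in RestoreConfig for crash recovery" is a save-then-restore pattern: record the cluster's mode before changing it, so that a task retried after a crash can still put the original value back. The sketch below models that idea with a plain dict and a stand-in cluster object; `FakeCluster`, `begin_restore`, and `finish_restore` are hypothetical and say nothing about how the PR actually implements it.

```python
class FakeCluster:
    """Stand-in for the cluster setting; not a FoundationDB API."""

    def __init__(self, bulk_load_mode: int = 0):
        self.bulk_load_mode = bulk_load_mode


def begin_restore(cluster: FakeCluster, config: dict) -> None:
    # Record the original mode only once. A retry after a crash would otherwise
    # overwrite it with the value this task itself had already set.
    if "originalBulkLoadMode" not in config:
        config["originalBulkLoadMode"] = cluster.bulk_load_mode
    cluster.bulk_load_mode = 1  # assumed "enabled" value for the sketch


def finish_restore(cluster: FakeCluster, config: dict) -> None:
    # Put back whatever was recorded, then clear the record.
    original = config.pop("originalBulkLoadMode", None)
    if original is not None:
        cluster.bulk_load_mode = original


cluster, config = FakeCluster(bulk_load_mode=0), {}
begin_restore(cluster, config)   # first attempt
begin_restore(cluster, config)   # retry after a simulated crash
finish_restore(cluster, config)  # cluster.bulk_load_mode is 0 again
```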

Configuration:
- Add snapshotMode parameter (0=RANGEFILE, 1=BULKDUMP) to control backup type
- Add useRangeFileRestore parameter to control restore method
- Add BULKDUMP_JOB_TIMEOUT and BULKLOAD_JOB_TIMEOUT client knobs
- Add getBulkLoadMode() to ManagementAPI
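The two parameters above can be pictured as a small enum plus a boolean switch. This is a minimal sketch, assuming only the numeric values given in the description (0=RANGEFILE, 1=BULKDUMP); the Python names and the string labels returned by `restore_method` are hypothetical.

```python
from enum import IntEnum


class SnapshotMode(IntEnum):
    # Values taken from the description: 0=RANGEFILE, 1=BULKDUMP.
    RANGEFILE = 0
    BULKDUMP = 1


def parse_snapshot_mode(value: int) -> SnapshotMode:
    """Validate a raw snapshotMode value; unknown values raise ValueError."""
    return SnapshotMode(value)


def restore_method(use_range_file_restore: bool) -> str:
    # Assumed semantics: True selects the traditional range-file restore,
    # False selects the BulkLoad path.
    return "rangefile" if use_range_file_restore else "bulkload"
```

Rejecting unknown integers at parse time keeps a mistyped mode from silently falling back to one of the two snapshot types.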

Testing:
- Add BackupS3BlobBulkLoadRestore.toml test with assertions verifying both
  BulkDump and BulkLoad task completion

Documentation:
- Add design/bulkload-restore-integration.md with detailed design document

@saintstack requested review from akankshamahajan15, Copilot, jzhou77, kakaiu and neethuhaneesha, and removed request for Copilot, on December 24, 2025 20:44
@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: b47782e
  • Duration 0:08:53
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: b47782e
  • Duration 0:34:47
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: b47782e
  • Duration 0:43:01
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: b47782e
  • Duration 0:49:42
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: b47782e
  • Duration 1:06:04
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: b47782e
  • Duration 1:08:15
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@saintstack
Contributor Author

20251224-202551-stack_all-198ecd48a4a5577a compressed=True data_size=35409138 duration=5395736 ended=100000 fail=2 fail_fast=10 max_runs=100000 pass=99998 priority=100 remaining=0 runtime=1:07:05 sanity=False started=100000 stopped=20251224-213256 submitted=20251224-202551 timeout=5400 username=stack_all

Failures are in AtomicBackupToDBCorrectness.toml and VersionStampSwitchover.toml. They don't seem related to this change, but I will dig more.
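For readers comparing runs, the Joshua summary line above is a run identifier followed by space-separated `key=value` pairs, so it can be split mechanically. The helper below is a hypothetical convenience, not part of Joshua; it assumes the first token is the run identifier and every later token contains an `=`.

```python
def parse_summary(line: str) -> dict:
    """Split a summary line into its run id and key=value fields."""
    run_id, *pairs = line.split()
    fields = dict(pair.split("=", 1) for pair in pairs)
    fields["run_id"] = run_id  # hypothetical key name for the leading token
    return fields


summary = parse_summary(
    "20251224-202551-stack_all-198ecd48a4a5577a compressed=True "
    "ended=100000 fail=2 pass=99998 runtime=1:07:05"
)
# Fraction of runs that failed: fail / ended.
failure_rate = int(summary["fail"]) / int(summary["ended"])
```

The example string is abbreviated from the full summary line; the same function handles the complete line, since the remaining fields follow the same shape.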

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: b47782e
  • Duration 1:51:09
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

This commit adds the ability to use BulkDump for creating backup snapshots
and BulkLoad for restoring them, providing faster backup/restore operations
for large databases.

Key changes:
- Add BulkDumpTaskFunc to create SST file snapshots during backup
- Add BulkLoadRestoreTaskFunc to restore from BulkDump snapshots
- Store bulkDumpJobId in snapshot metadata for restore coordination
- Add snapshotMode parameter (0=RANGEFILE, 1=BULKDUMP) to control backup type
- Add useRangeFileRestore parameter to control restore method
- Add CLIENT_KNOBS for configurable job timeouts
- Add test assertions to verify BulkDump/BulkLoad execution
- Check for existing running jobs to avoid conflicts when multiple agents run
- Properly scope state variables for error handling in Flow actors

New test: tests/slow/BackupS3BlobBulkLoadRestore.toml

@saintstack force-pushed the bulkload_snapshot_clean branch from b47782e to 4a7a00f on December 24, 2025 23:32
@saintstack
Contributor Author

Fix compile failure.

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 4a7a00f
  • Duration 0:22:18
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 4a7a00f
  • Duration 0:35:28
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 4a7a00f
  • Duration 0:44:37
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 4a7a00f
  • Duration 0:48:35
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 4a7a00f
  • Duration 0:51:19
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 4a7a00f
  • Duration 0:51:26
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@saintstack
Contributor Author

Joshua for this latest PR:

20251224-233509-stack_all-9c3cc03bf245f137 compressed=True data_size=35409063 duration=5617518 ended=100000 fail=1 fail_fast=10 max_runs=100000 pass=99999 priority=100 remaining=0 runtime=0:54:59 sanity=False started=100000 stopped=20251225-003008 submitted=20251224-233509 timeout=5400 username=stack_all

The one failure is below (no changes to DD in the patch...):

RandomSeed="3488465039" SourceVersion="4a7a00f4c1c7b1734ef2048da4e4745c6aafce6c" Time="1766620030" BuggifyEnabled="1" DeterminismCheck="0" FaultInjectionEnabled="1" TestFile="tests/slow/DDBalanceAndRemove.toml"

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 4a7a00f
  • Duration 1:53:53
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)
