@martintomazic martintomazic commented Oct 1, 2025

Closes #6321

Motivation:

When aggressive pruning is enabled, the node may fall behind. Although unlikely, the consequence may be that the node breaks its commitment, which could result in lost rewards, the node registering its availability when it is actually not ready, or, in the worst case, penalties (though I don't think this would actually happen).

Solution:

To prevent this, we should offer validators a maintenance command to be run before starting the node if pruning is enabled later on (on an already synced node). This way, a freshly started node is guaranteed to be healthy.

As with the compaction command, I have started with the consensus databases only and suggest adding runtime databases in a follow-up. Moreover, I quickly tested that, when runtimes are configured, this command does not prune past the minimum reindexed height.

Automatically prune on node startup?

For now this is not the case:

  • Just pruning without compacting is not particularly useful. Unless we also automate compaction as part of the startup, the operator is forced to stop the node once the pruning finishes.
    • Bad UX, and possibly a wait if the node has already registered.
  • I don't want to automatically enable forced compaction on startup, as this is inefficient if the operator restarts twice in a row (e.g. to fix the config), possibly triggering the corner case observed.
  • Given that this command is experimental and a bit hackish, I would prefer that we do not automate pruning on node startup, at least for a few months, until we know whether anything unexpected could happen.

How to test:

Have a node with valid state; it doesn't have to be synced. Configure pruning to retain less data and run:

oasis-node storage prune-experimental --config etc/config.yml

Following that, trigger compaction:

oasis-node storage compact-experimental --config etc/config.yml

Ensure the node is able to sync correctly and that disk space has been correctly reclaimed. You can also configure a runtime and test that this command does not prune past the minimum reindexed height.
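
The two steps above could be wrapped in a small helper so that compaction only runs if pruning succeeded, and so that the size of the data directory is printed before and after. This is only a sketch: the function names, the `OASIS_NODE` variable and the data directory argument are assumptions made for illustration. Only the two subcommands and the `--config` flag come from the steps above, and the node is assumed to be stopped while this runs.

```shell
#!/bin/sh
# Sketch: offline prune followed by compaction, run while the node is stopped.
# OASIS_NODE, dir_size_kb and prune_and_compact are illustrative names only.

# Binary to invoke; can be overridden from the environment.
OASIS_NODE="${OASIS_NODE:-oasis-node}"

# dir_size_kb DIR: print the disk usage of DIR in KiB.
dir_size_kb() {
    du -sk "$1" | awk '{print $1}'
}

# prune_and_compact CONFIG DATADIR
# Runs pruning, then compaction, and returns non-zero at the first failure.
# DATADIR is only used to report disk usage; it is not passed to the node.
prune_and_compact() {
    config="$1"
    datadir="$2"

    # Refuse to continue if the directory to measure does not exist.
    [ -d "$datadir" ] || { echo "no such directory: $datadir" >&2; return 1; }

    before=$(dir_size_kb "$datadir")

    "$OASIS_NODE" storage prune-experimental --config "$config" || return 1
    "$OASIS_NODE" storage compact-experimental --config "$config" || return 1

    after=$(dir_size_kb "$datadir")
    echo "data dir size: ${before} KiB -> ${after} KiB"
}
```

After sourcing the file, a call might look like `prune_and_compact etc/config.yml /path/to/datadir`, where the second argument is whatever data directory the config points at. Comparing the two printed sizes gives a quick first check that space was reclaimed; it does not replace verifying that the node syncs correctly afterwards.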

Follow-up:

@martintomazic martintomazic force-pushed the martin/feature/cli/compact-db-instances branch from 9a12d7f to c07cc2b Compare October 2, 2025 08:54
@martintomazic martintomazic force-pushed the martin/feature/cli/prune-offline branch from 29b5874 to a38c165 Compare October 2, 2025 08:59
@martintomazic martintomazic force-pushed the martin/feature/cli/compact-db-instances branch 3 times, most recently from c4616bb to 9c2b25b Compare October 10, 2025 13:15
@martintomazic martintomazic force-pushed the martin/feature/cli/prune-offline branch from a38c165 to f662ffc Compare October 10, 2025 13:16
@martintomazic martintomazic force-pushed the martin/feature/cli/compact-db-instances branch from 9c2b25b to ac4c227 Compare October 20, 2025 08:33
@martintomazic martintomazic force-pushed the martin/feature/cli/prune-offline branch from f662ffc to 8ed2677 Compare October 20, 2025 08:46
Base automatically changed from martin/feature/cli/compact-db-instances to master October 20, 2025 23:09
@martintomazic martintomazic force-pushed the martin/feature/cli/prune-offline branch from 8ed2677 to f74deec Compare October 22, 2025 22:25
netlify bot commented Oct 22, 2025

Deploy Preview for oasisprotocol-oasis-core canceled.

Name Link
🔨 Latest commit ce38558
🔍 Latest deploy log https://app.netlify.com/projects/oasisprotocol-oasis-core/deploys/692977d5c9c0020008899ded

@martintomazic martintomazic force-pushed the martin/feature/cli/prune-offline branch from f74deec to 08e8f6c Compare October 23, 2025 11:37
@martintomazic martintomazic changed the title from "go/oasis-node/cmd/storage: Add command for offline pruning (POC)" to "go/oasis-node/cmd/storage: Add command for offline pruning" Oct 23, 2025
@martintomazic martintomazic force-pushed the martin/feature/cli/prune-offline branch 2 times, most recently from 503b59c to b0318c0 Compare October 31, 2025 10:18
@martintomazic martintomazic force-pushed the martin/feature/cli/prune-offline branch from b0318c0 to a2c2244 Compare October 31, 2025 11:11
@martintomazic martintomazic marked this pull request as ready for review October 31, 2025 11:39
@martintomazic martintomazic force-pushed the martin/feature/cli/prune-offline branch from a2c2244 to 5842af8 Compare November 1, 2025 21:23
codecov bot commented Nov 1, 2025

Codecov Report

❌ Patch coverage is 10.75949% with 141 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.06%. Comparing base (d2cc5c8) to head (ce38558).
⚠️ Report is 7 commits behind head on master.

Files with missing lines                 Patch %   Lines
go/oasis-node/cmd/storage/storage.go     0.78%     127 Missing ⚠️
go/oasis-node/cmd/common/common.go       0.00%     7 Missing ⚠️
go/consensus/cometbft/db/init.go         64.28%    3 Missing and 2 partials ⚠️
go/consensus/cometbft/full/archive.go    66.66%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6341      +/-   ##
==========================================
- Coverage   64.92%   64.06%   -0.86%     
==========================================
  Files         698      698              
  Lines       68065    68207     +142     
==========================================
- Hits        44190    43700     -490     
- Misses      18872    19479     +607     
- Partials     5003     5028      +25     

@martintomazic martintomazic force-pushed the martin/feature/cli/prune-offline branch from 5842af8 to 7e3cb2a Compare November 14, 2025 09:47
@martintomazic
Contributor Author

Rebased (no changes).

To see how this would fit into the big picture, check the new (pending) pruning section: oasisprotocol/docs#1526

@martintomazic martintomazic force-pushed the martin/feature/cli/prune-offline branch 2 times, most recently from 492bb68 to c9e9bd7 Compare November 21, 2025 10:32
@martintomazic martintomazic force-pushed the martin/feature/cli/prune-offline branch from c9e9bd7 to fe6d85e Compare November 28, 2025 10:11
When enabling aggressive pruning on a previously synced node and
restarting it immediately, the node may start lagging behind (minutes
to hours) and still believe its status is ready.

We should offer validators a maintenance command that can be called
offline, when increasing pruning or enabling it for the first time,
to ensure only healthy nodes join the network.
@martintomazic martintomazic force-pushed the martin/feature/cli/prune-offline branch from fe6d85e to ce38558 Compare November 28, 2025 10:22
@martintomazic martintomazic merged commit 7576ee2 into master Nov 28, 2025
5 of 7 checks passed
@martintomazic martintomazic deleted the martin/feature/cli/prune-offline branch November 28, 2025 12:01
Development

Successfully merging this pull request may close these issues.

go/oasis-node: Previously synced node starts lagging after enabling aggressive pruning
