Description
Context
There is an alternative way to make machine snapshots: running machines directly from disk-mapped files on COW (copy-on-write) filesystems. This can be useful both for reader nodes (to quickly take snapshots of a state drive) and for validator nodes (to save a rolling state efficiently, or to perform rollbacks). It would also be an alternative way of rolling back machines without `fork()` calls, so rollbacks could be performed on local machines (and even without a Linux host, although slowly).
A potential nice side effect of the solution would be making the machine work well with sparse files, saving host disk space.
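To illustrate the sparse-file side effect, here is a minimal sketch in Python (not the emulator's code) of creating a large file whose apparent size is much larger than the disk space it occupies. The helper name `create_sparse_file` and the file name `ram.bin` are hypothetical; whether blocks are really left unallocated depends on the host filesystem.

```python
import os
import tempfile


def create_sparse_file(path: str, size: int) -> None:
    """Create (or resize) a file to `size` bytes without writing any data.

    On filesystems that support holes, no data blocks are allocated
    until something is actually written.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        os.ftruncate(fd, size)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ram.bin")  # hypothetical file name
        create_sparse_file(path, 64 * 1024 * 1024)
        st = os.stat(path)
        print("apparent size:", st.st_size)
        # st_blocks is counted in 512-byte units on POSIX systems
        print("allocated bytes:", st.st_blocks * 512)
```

On a filesystem with hole support, the allocated size reported here should stay far below the apparent size until the machine dirties pages.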
Possible solutions
I've been thinking of adding a `backing_storage` runtime option to use with the `cm_load`/`cm_create` APIs, so a machine can run directly from disk while committing its state to disk. The `cm_store` API would then just do a reflink file copy on COW filesystems when the machine was loaded with `backing_storage` enabled. If the filesystem has no support for reflink copies, it would fall back to a slow normal copy.
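The "reflink if possible, otherwise a normal copy" behaviour can be sketched as follows. This is an illustration in Python under stated assumptions, not the proposed `cm_store` implementation: the helper name `reflink_or_copy` is hypothetical, and the `FICLONE` request number is the value from `<linux/fs.h>` on common Linux architectures such as x86-64 and arm64 (it may differ elsewhere).

```python
import fcntl
import shutil

# FICLONE ioctl request number from <linux/fs.h>: _IOW(0x94, 9, int).
# Assumed value for x86-64/arm64 Linux; other architectures may differ.
FICLONE = 0x40049409


def reflink_or_copy(src: str, dst: str) -> bool:
    """Copy src to dst; return True if the data was shared via reflink."""
    try:
        with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
            # Ask the filesystem to make dst share src's extents (COW).
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
        return True
    except OSError:
        # No reflink support on this filesystem or OS: slow normal copy.
        shutil.copyfile(src, dst)
        return False
```

On btrfs or XFS (with reflink enabled) the first branch should succeed almost instantly regardless of file size; on filesystems such as ext4 or tmpfs the call falls through to the byte-for-byte copy.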
To perform advances while taking snapshots, you just need to create the machine with the `backing_storage` option and store each checkpoint to a directory using `cm_store`.
Overall there would be very minimal changes to the API (just the new `backing_storage` and `copy_reflink` runtime options), and the feature would be orthogonal to remote JSON-RPC fork.
Here is a small example of how it would look in the Lua API:
do
  -- copy PMAs from base snapshot using cp --reflink
  local machine <close> = cartesi.machine('btrfs/base', {backing_storage="btrfs/snap", copy_reflink=true})
  -- advance machine multiple times while taking multiple snapshots
  for i=1,10 do
    machine:run(i)
    -- use cp --reflink on COW filesystems to perform a quick snapshot
    machine:store('btrfs/snap'..i)
  end
  -- when the machine is destroyed, the backing storage is also removed
end
Later we can generalize this and create a `fork()`-like method on top of COW filesystems. To avoid overloading the name, I will call it `clone()`. Its implementation would look like:
local function clone_machine(machine, dir)
  machine:store(dir) -- PMAs are copied with cp --reflink
  return cartesi.machine(dir, {backing_storage=dir})
  -- when the machine is destroyed, the backing storage is also removed
end
Notice the case of using the same `dir` for both load and backing storage: this is a special case, supported to allow an existing snapshot to advance its state while skipping file copies.
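The special case could be detected along these lines. This is only a sketch in Python of the decision logic, assuming the snapshot is a flat directory of files; the helper name `prepare_backing_storage` is hypothetical, and a real implementation would use a reflink copy rather than `shutil.copyfile`.

```python
import os
import shutil


def prepare_backing_storage(load_dir: str, backing_dir: str) -> int:
    """Populate backing_dir from load_dir; return the number of files copied."""
    if os.path.isdir(backing_dir) and os.path.samefile(load_dir, backing_dir):
        # Special case: the snapshot itself is the backing storage,
        # so the machine advances in place and no copy is needed.
        return 0
    os.makedirs(backing_dir, exist_ok=True)
    copied = 0
    for entry in os.scandir(load_dir):
        if entry.is_file():
            # Placeholder for a reflink copy on COW filesystems.
            shutil.copyfile(entry.path, os.path.join(backing_dir, entry.name))
            copied += 1
    return copied
```

Comparing with `os.path.samefile` rather than comparing path strings means that differently spelled paths to the same directory (relative paths, symlinks) are still recognized as the in-place case.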
Additional ideas
- We could use `flock` on files inside the storage directory while the machine that owns the backing storage has not been destroyed yet; attempting to load a snapshot of a machine that is still running would then fail.
- We could use `ftruncate` to create sparse backing storage.
- If we introduce read-only flash drives, the file copy could use hard links (referring to the same inode) and be faster than COW.
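The `flock` idea can be sketched as follows, in Python and under stated assumptions: the helper name `try_lock_storage` is hypothetical, and the lock is advisory, so it only protects against other loaders that also take it. The owner keeps the returned descriptor open for the lifetime of the machine; a second loader gets `None` and can refuse to load.

```python
import fcntl
import os


def try_lock_storage(path: str):
    """Return an open fd holding an exclusive lock, or None if already locked."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Non-blocking exclusive lock: fails at once if someone else holds it.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd
```

Closing the descriptor (or the owning process exiting) releases the lock, which matches the "backing storage is released when the machine is destroyed" lifetime described above.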
Subtasks
To really make this efficient by default we have to optimize Merkle tree loading/saving, so it would be nice to fix #258 afterwards. But this feature can be done independently of that, by using the runtime option to skip root hash calculation.