brrrrrr
Note
this database is a research experiment in architecting and specializing a database from first principles, inspired by a few techniques from *Database Internals: A Deep Dive Into How Distributed Data Systems Work*
most privacy protocols reuse popular, production-ready databases such as rocksdb or postgres; these may not be suitable for high-performance use cases, which is also why layerzero created qmdb to achieve high throughput. their design is different from rotortree's: here we focus on append-only merkle trees, and we do not support updating any leaves in-place.
Warning
this approach makes MANY tradeoffs, and is not suitable for production AT ALL!!!
the tree design itself is heavily inspired by lean-imt, based on the great work by cedoor & vivian @ PSE. this design was chosen so that it can have functional equivalents in zk DSLs and solidity. the main deviation is that here we implement an n-ary leanimt :) the intuition is to reduce the depth while keeping the same total number of leaves, which also lets us make efficient use of on-disk storage blocks by grouping leaves together.
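a quick back-of-the-envelope on what the branching factor buys you (plain arithmetic only, matching the capacity figures quoted further down):

```rust
// an n-ary tree of depth d holds n^d leaves, so widening the tree shrinks
// the depth (and proof length) needed for a given capacity.
fn capacity(n: u64, depth: u32) -> u128 {
    (n as u128).pow(depth)
}

fn main() {
    assert_eq!(capacity(4, 16), 4_294_967_296); // N=4, MAX_DEPTH=16 -> ~4.3B leaves
    assert_eq!(capacity(8, 10), 1_073_741_824); // N=8, MAX_DEPTH=10 -> ~1B leaves
    assert_eq!(capacity(2, 32), 4_294_967_296); // a binary tree needs depth 32 for the same ~4.3B
}
```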
- you should have a k-v database in tandem with this to ensure you don't insert the same nullifier twice (see the sketch after this list)
- you should constrain node values to the finite field you're using before insertion
- generic hasher, blake3 default
- batteries included for playing with different branching factors and max depths
- wal for persistence and recovery, with checkpointing to prevent unbounded wal growth
- wincode & regular serde for compatibility with the ecosystem (e.g. the jolt prover needs serde)
- no_std by default, persistence requires std
- benchmarks driven and configured by divan + crabtime
- by default your tree lives in memory, but with the `storage` feature you can tier cold levels to mmap'd data files via `TieringConfig::pin_above_level`
- with N=4, MAX_DEPTH=16, you can fit ~4.3B nullifiers in 41 GiB
- with N=8, MAX_DEPTH=10, you can fit ~1B nullifiers in 37 GiB
- these capacities are quite feasible, but expensive. just use a new tree per generation and encode the generation into your nullifiers pls
- in most cases you just need the tree in memory without crash persistence (as long as there is a bootstrapping sync mechanism). just use the single-threaded variant; it's MUCH better if you have a low number of insertions
- writes go to the wal + memory, reads are always from memory or mmap. one cannot do an apples to apples comparison with other merkle tree dbs that read from disk on every query
- assumes that leaves are prehashed; partial rfc 6962 support, applied only to internal nodes
- few dependencies ~ 65 (active + transitive, excluding dev deps), zero deps for just the algorithm
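as mentioned in the first bullet above, deduplication is out of scope for the tree itself. a minimal sketch of gating insertions behind a set: a std `HashSet` stands in here for whatever k-v store you already run, and the `LeanIMT` calls match the usage example below.

```rust
// minimal sketch: rotortree does not dedupe nullifiers for you, so gate
// insertions behind a set before they ever reach the tree.
use std::collections::HashSet;

use rotortree::{Blake3Hasher, LeanIMT};

fn insert_unique(
    seen: &mut HashSet<[u8; 32]>,
    tree: &mut LeanIMT<Blake3Hasher, 4, 20>,
    nullifier: [u8; 32],
) -> bool {
    // HashSet::insert returns false if the nullifier was already present
    if !seen.insert(nullifier) {
        return false; // duplicate: never reaches the tree
    }
    tree.insert(nullifier).unwrap();
    true
}

fn main() {
    let mut seen = HashSet::new();
    let mut tree = LeanIMT::<Blake3Hasher, 4, 20>::new(Blake3Hasher);
    assert!(insert_unique(&mut seen, &mut tree, [42u8; 32]));
    assert!(!insert_unique(&mut seen, &mut tree, [42u8; 32])); // rejected
}
```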
in-memory usage with LeanIMT:
use rotortree::{LeanIMT, Blake3Hasher, TreeHasher};
// N=4 branching factor, MAX_DEPTH=20
let mut tree = LeanIMT::<Blake3Hasher, 4, 20>::new(Blake3Hasher);
// single insert
let leaf = [1u8; 32];
let root = tree.insert(leaf).unwrap();
// batch insert
let leaves: Vec<[u8; 32]> = (0..1000u32)
.map(|i| {
let mut h = [0u8; 32];
h[..4].copy_from_slice(&i.to_le_bytes());
h
})
.collect();
let root = tree.insert_many(&leaves).unwrap();
// proof generation & verification
let snap = tree.snapshot();
let proof = snap.generate_proof(0).unwrap();
let th = TreeHasher::new(Blake3Hasher);
assert!(proof.verify(&th).unwrap());
optional feature flags for the in-memory mode:
- `concurrent`: switches to `&self` methods with internal `RwLock` (via `parking_lot`)
- `parallel`: enables rayon-parallelized `insert_many` for large batches (this works really well)
- `wincode`: adds wincode serde derives to proof types
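a sketch of what the `concurrent` feature enables: with `&self` methods and the internal `RwLock`, the tree can be shared across threads behind an `Arc`. the signatures are assumed to match the in-memory example above, just with shared receivers.

```rust
use std::sync::Arc;
use std::thread;

use rotortree::{Blake3Hasher, LeanIMT};

fn main() {
    let tree = Arc::new(LeanIMT::<Blake3Hasher, 4, 20>::new(Blake3Hasher));

    // writer thread inserting leaves through a shared reference
    let writer = {
        let tree = Arc::clone(&tree);
        thread::spawn(move || {
            for i in 0..1_000u32 {
                let mut leaf = [0u8; 32];
                leaf[..4].copy_from_slice(&i.to_le_bytes());
                tree.insert(leaf).unwrap();
            }
        })
    };

    writer.join().unwrap();

    // snapshots for proof generation work the same way as in the base example
    let snap = tree.snapshot();
    let _proof = snap.generate_proof(0).unwrap();
}
```

persistent, wal-backed usage with `RotorTree`: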
use rotortree::{
Blake3Hasher, RotorTree, RotorTreeConfig, TreeHasher,
FlushPolicy, CheckpointPolicy, TieringConfig,
};
use std::path::PathBuf;
let config = RotorTreeConfig {
path: PathBuf::from("/tmp/my-tree"),
flush_policy: FlushPolicy::default(), // fsync every 10ms
checkpoint_policy: CheckpointPolicy::default(), // manual
tiering: TieringConfig::default(), // all in memory
verify_checkpoint: true, // recompute root on recovery
};
// opens existing WAL or creates a new one
let tree = RotorTree::<Blake3Hasher, 4, 20>::open(Blake3Hasher, config).unwrap();
// insert: returns root + a durability token
let (root, token) = tree.insert([42u8; 32]).unwrap();
// token.wait() blocks until the entry is fsynced
// or insert + wait for fsync in one call
let root = tree.insert_durable([43u8; 32]).unwrap();
// batch insert
let leaves = vec![[1u8; 32]; 500];
let (root, token) = tree.insert_many(&leaves).unwrap();
// lock-free snapshot for proof generation (same as in-memory)
let snap = tree.snapshot();
let proof = snap.generate_proof(0).unwrap();
let th = TreeHasher::new(Blake3Hasher);
assert!(proof.verify(&th).unwrap());
// explicit flush & close
tree.flush().unwrap();
tree.close().unwrap();
// (also flushes + releases file lock on drop)
FlushPolicy options:
- `Interval(Duration)`: background thread fsyncs periodically
- `Manual`: caller controls flushing via `tree.flush()` (works well if you're following a blockchain as the canonical source of state transitions)
CheckpointPolicy options (materializes wal state to data files, allowing wal truncation):
- `Manual`: caller calls `checkpoint()` explicitly
- `EveryNEntries(n)`: auto-checkpoint after every `n` wal entries
- `MemoryThreshold(bytes)`: auto-checkpoint when uncheckpointed memory exceeds the threshold
- `OnClose`: checkpoint only on graceful close
TieringConfig controls which levels stay in memory vs get mmap'd after checkpoint:
- `pin_above_level`: levels below this value have their committed chunks mmap'd from data files after checkpoint. set to `usize::MAX` to mmap everything (default), `0` to keep everything in memory
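a sketch of how these knobs combine, using the policy variants listed above; `TieringConfig` is left at its default here (mmap everything after checkpoint), and the config struct is built exactly as in the persistence example:

```rust
use std::path::PathBuf;
use std::time::Duration;

use rotortree::{
    Blake3Hasher, CheckpointPolicy, FlushPolicy, RotorTree, RotorTreeConfig, TieringConfig,
};

fn main() {
    // fsync every 50ms, checkpoint automatically every 1M wal entries
    let config = RotorTreeConfig {
        path: PathBuf::from("/tmp/my-tree"),
        flush_policy: FlushPolicy::Interval(Duration::from_millis(50)),
        checkpoint_policy: CheckpointPolicy::EveryNEntries(1_000_000),
        tiering: TieringConfig::default(),
        verify_checkpoint: true,
    };

    let tree = RotorTree::<Blake3Hasher, 4, 20>::open(Blake3Hasher, config).unwrap();
    let (_root, _token) = tree.insert([7u8; 32]).unwrap();
    tree.close().unwrap(); // flush + file lock release also happen on drop
}
```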
of course, you'll have to benchmark your unique workload to see if this database suits your use case and requirements. here are some constants and env vars you can play around with to alter behaviour:
compile-time constants (change in source and recompile):
- `CHUNK_SIZE` (default `128`): hashes per chunk for structural sharing. 128 × 32 bytes = 4 KB per chunk. affects snapshot copy cost and arc granularity
- `CHUNKS_PER_SEGMENT` (default `256`): chunks per immutable segment. controls how many chunks are frozen into a single arc slab
- `PAR_CHUNK_SIZE` (default `64`, `parallel` feature): parents per rayon work unit. smaller values = more parallelism but more scheduling overhead
- `MAX_FRAME_PAYLOAD` (default `128 MB`): maximum wal/checkpoint frame payload size. change this if you insert_many more than 128 MB worth of leaves
runtime env vars:
- `ROTORTREE_PARALLEL_THRESHOLD` (default `1024`, `parallel` feature): minimum parent count before rayon parallelism kicks in
- cargo-hack: to test all combinations of feature flags
- cargo-nextest: rust test runner
cargo hack check --feature-powerset
cargo hack clippy --feature-powerset
cargo +nightly fmt
😉 if you know where i grabbed the .rustfmt.toml from
cargo hack nextest run --feature-powerset
cargo bench -- --list
there are some feature flagged benchmarks, refer to the Cargo.toml entry for more details
Head over to https://rymnc.github.io/rotortree/benchmarks which has the latest benchmark results (~380 benchmarks)
the pure in-memory benchmark (tree_bench_parallel) exhibits less variance and achieves peak throughput of up to ~190M leaves/sec; why would anyone need this much? i do not know myself. the single-threaded variant has by far the best variance characteristics though, which is useful to keep in mind if that is a constraint; you trade off performance for predictability under load.
Note
There are more realistic benchmarks that simulate performance under load, i.e. concurrent reads / proof generation + insertions
test bench:
- m4 pro, 14 cores, 48 GB RAM
- 1B leaves inserted in 1M chunks
- manual checkpoint after each chunk insertion
we can see here that verification is almost constant, with proof generation varying based on snapshot acquisition cost
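the insertion pattern from that run looks roughly like the sketch below: batch inserts followed by a manual checkpoint. `checkpoint()` is the call named under `CheckpointPolicy::Manual`; its `Result` return type is assumed here (like `flush()`), and the chunk sizes are shrunk so the example runs quickly.

```rust
use std::path::PathBuf;

use rotortree::{
    Blake3Hasher, CheckpointPolicy, FlushPolicy, RotorTree, RotorTreeConfig, TieringConfig,
};

fn main() {
    let config = RotorTreeConfig {
        path: PathBuf::from("/tmp/bench-tree"),
        flush_policy: FlushPolicy::Manual,
        checkpoint_policy: CheckpointPolicy::Manual,
        tiering: TieringConfig::default(),
        verify_checkpoint: false,
    };
    // N=8, MAX_DEPTH=10: the ~1B-leaf configuration mentioned earlier
    let tree = RotorTree::<Blake3Hasher, 8, 10>::open(Blake3Hasher, config).unwrap();

    for chunk_idx in 0..10u32 {
        let chunk: Vec<[u8; 32]> = (0..1_000u32)
            .map(|i| {
                let mut leaf = [0u8; 32];
                leaf[..4].copy_from_slice(&(chunk_idx * 1_000 + i).to_le_bytes());
                leaf
            })
            .collect();
        let (_root, _token) = tree.insert_many(&chunk).unwrap();
        // manual checkpoint after every chunk, as in the bench setup;
        // return type assumed to be a Result, like flush()
        tree.checkpoint().unwrap();
    }

    tree.close().unwrap();
}
```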
- optimize `ceil_log_n` by precomputing the table
- run benchmarks in an isolated environment for better estimations

