perf(trie): optimize get_block_trie_updates to avoid duplicate database work #19076

@yongkangc

Description

Problem

During block persistence, get_block_trie_updates performs expensive database operations that duplicate work already done:

  1. Calls trie_reverts(block_number + 1), which walks through all trie changesets from block_number + 1 onwards
  2. Creates new cursors for account and storage trie tables
  3. Re-reads trie node data that may have been read previously during block execution

This is called in the critical path during save_blocks (tree/mod.rs:1752), contributing to the 5-second persistence latency reported for blocks with ~5,000 transactions.

Current Flow

The current implementation at provider.rs:2284:

  • Calls trie_reverts(), which performs an expensive walk through the changesets
  • Creates new InMemoryTrieCursorFactory with new database cursors
  • Walks through the account and storage trie changesets again to look up node values (see the sketch below)
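
A paraphrase of that flow in Rust, based only on the bullets above; collect_trie_updates_for_block is a placeholder name, not the actual function:

// Paraphrased sketch of the current per-block call, not the actual reth code.
fn get_block_trie_updates(&self, block_number: BlockNumber) -> ProviderResult<TrieUpdatesSorted> {
    // Expensive: walks every trie changeset from block_number + 1 onwards.
    let reverts = self.trie_reverts(block_number + 1)?;

    // Fresh database cursors are created on every call.
    let cursor_factory = InMemoryTrieCursorFactory::new(
        DatabaseTrieCursorFactory::new(self.tx_ref()),
        &reverts,
    );

    // Walks the account and storage trie changesets again to resolve node values.
    self.collect_trie_updates_for_block(block_number, cursor_factory) // placeholder helper
}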

Proposed Implementation

1. Pass Reusable Resources to get_block_trie_updates

Modify the signature to accept a pre-created trie cursor factory and, optionally, cached trie reverts:

fn get_block_trie_updates_with_cursors(
    &self,
    block_number: BlockNumber,
    cached_reverts: Option<&TrieUpdatesSorted>,
    cursor_factory: &impl TrieCursorFactory,
) -> ProviderResult<TrieUpdatesSorted>
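
A minimal sketch of how the body could use these parameters; collect_block_trie_updates is a hypothetical helper standing in for the existing lookup logic:

fn get_block_trie_updates_with_cursors(
    &self,
    block_number: BlockNumber,
    cached_reverts: Option<&TrieUpdatesSorted>,
    cursor_factory: &impl TrieCursorFactory,
) -> ProviderResult<TrieUpdatesSorted> {
    // Reuse the caller's reverts when available; otherwise fall back to the
    // existing walk over the changesets from block_number + 1 onwards.
    let computed;
    let reverts = match cached_reverts {
        Some(reverts) => reverts,
        None => {
            computed = self.trie_reverts(block_number + 1)?;
            &computed
        }
    };

    // Resolve the block's node values through the shared cursor factory
    // instead of opening fresh database cursors on every call.
    self.collect_block_trie_updates(block_number, reverts, cursor_factory)
}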

2. Batch Processing for Multiple Blocks

When persisting multiple blocks in PersistenceService::on_save_blocks, create the cursors once and reuse them for the whole batch:

// Create the cursor factory and changeset cursors once for the batch; the
// changeset cursors would be threaded into the helper alongside the factory.
let tx = provider_rw.tx_ref();
let db_cursor_factory = DatabaseTrieCursorFactory::new(tx);
let mut accounts_trie_cursor = tx.cursor_dup_read::<tables::AccountsTrieChangeSets>()?;
let mut storages_trie_cursor = tx.cursor_dup_read::<tables::StoragesTrieChangeSets>()?;

// Process all blocks with the shared cursors
for block in blocks {
    let trie_updates = provider_rw.get_block_trie_updates_with_cursors(
        block.number(),
        None,
        &db_cursor_factory,
    )?;
    // ... save block with trie_updates
}
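
Because the factory and both changeset cursors borrow the same provider_rw transaction, they stay valid across the loop; the helper then only seeks on them per block instead of opening fresh cursors for every block.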

3. Cache Trie Reverts for Sequential Blocks

When processing blocks N, N+1, N+2, ...:

  • Compute trie_reverts(N+1) once
  • Incrementally update it for subsequent blocks instead of recomputing from scratch
  • Store it in a reusable buffer to avoid repeated allocations (see the sketch below)
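
A minimal sketch of this caching pattern, reusing provider_rw and db_cursor_factory from step 2; first_block_number and advance_trie_reverts are hypothetical names for the batch's starting block and the in-place update step:

// Compute the reverts once for the first block in the batch.
let mut cached_reverts = provider_rw.trie_reverts(first_block_number + 1)?;

for block in blocks {
    let trie_updates = provider_rw.get_block_trie_updates_with_cursors(
        block.number(),
        Some(&cached_reverts),
        &db_cursor_factory,
    )?;

    // Hypothetical incremental step: advance the buffer so it represents the
    // reverts from the next block onwards, instead of rebuilding it from scratch.
    advance_trie_reverts(&mut cached_reverts, block.number() + 1)?;

    // ... save block with trie_updates
}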

Alternative: Pre-compute During Execution

Instead of calling get_block_trie_updates during persistence, consider computing trie updates during block execution and storing them in ExecutedBlock:

pub struct ExecutedBlock<N: NodePrimitives = EthPrimitives> {
    pub block: BlockWithSenders<N::Block>,
    pub senders: Vec<Address>,
    pub execution_output: ExecutionOutcome<N::Receipt>,
    pub hashed_state: HashedPostState,
    pub trie: TrieUpdates,
    pub trie_updates: TrieUpdatesSorted,  // ← Add this
}

This entirely eliminates the need to re-read trie data from the database during persistence.
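
Persistence would then consume the precomputed field directly; a sketch, where write_trie_updates_sorted is a hypothetical writer call:

for block in blocks {
    // No changeset walk and no extra cursors: the updates were computed
    // during execution and carried along in ExecutedBlock.
    let trie_updates = &block.trie_updates;
    provider_rw.write_trie_updates_sorted(trie_updates)?; // hypothetical writer
    // ... persist the rest of the block
}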

Impact

  • Reduce cursor creation overhead from O(num_blocks) to O(1) per batch
  • Eliminate redundant changeset walks when processing sequential blocks
  • Reduce memory allocations by reusing buffers
  • Lower overall block persistence latency, particularly for transaction-heavy blocks

Metadata

Labels

A-db: Related to the database
C-perf: A change motivated by improving speed, memory usage or disk footprint
