Conversation

@rjl493456442
Member

This pull request is based on #32523 and implements the structure of the trienode history.

@rjl493456442 force-pushed the trie-archive-p3 branch 2 times, most recently from d4b1023 to ca6e68c (September 17, 2025 06:31)
@MariusVanDerWijden self-assigned this Sep 17, 2025
@rjl493456442 force-pushed the trie-archive-p3 branch 2 times, most recently from 4129367 to 60dbb64 (September 22, 2025 06:19)
value := h.nodes[owner][path]

// key section
n := binary.PutUvarint(buf[0:], uint64(prefixLen)) // key length shared (varint)
Member

I don't really get why this is needed

Member Author

Within each restart section, the following rules apply:

  • the first entry is always encoded with its full key
  • for all subsequent entries, the key is encoded in a "compressed" format,
    in which only the difference between the key and the preceding one is stored.
    Since we store trie nodes here, the key is essentially the node path, so
    storing only the diff compresses the entry key effectively.

Therefore, a few additional pieces of metadata are tracked in the key section:

  • shared key length
  • unshared key length
  • value length

This information lets us recover the key from the byte stream.
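
To make the encoding rules above concrete, here is a minimal sketch of a prefix-compressed key section and its decoder. It is not the code from this PR: `restartLen`, `encodeKeySection`, and `decodeKeySection` are hypothetical names, the restart interval of 4 is an arbitrary choice for illustration, and the field order (shared length, unshared length, value length, then the unshared key bytes) is an assumption drawn from the list of metadata above.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// restartLen is a hypothetical restart interval chosen for illustration.
// The PR uses trienodeDataBlockRestartLen, whose value is not shown here.
const restartLen = 4

// sharedPrefixLen returns the length of the common prefix of a and b.
func sharedPrefixLen(a, b []byte) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// encodeKeySection writes, for each key, three uvarints (shared key length,
// unshared key length, value length) followed by the unshared key bytes.
// The first key of every restart is stored in full (shared length 0).
func encodeKeySection(keys [][]byte, valLens []uint64) []byte {
	var (
		out  []byte
		prev []byte
		buf  = make([]byte, 3*binary.MaxVarintLen64)
	)
	for i, key := range keys {
		shared := 0
		if i%restartLen != 0 {
			shared = sharedPrefixLen(prev, key)
		}
		n := binary.PutUvarint(buf[0:], uint64(shared))
		n += binary.PutUvarint(buf[n:], uint64(len(key)-shared))
		n += binary.PutUvarint(buf[n:], valLens[i])
		out = append(out, buf[:n]...)
		out = append(out, key[shared:]...)
		prev = key
	}
	return out
}

// decodeKeySection reverses encodeKeySection, rebuilding each full key from
// the previous key's shared prefix plus the stored unshared suffix.
func decodeKeySection(data []byte) ([][]byte, []uint64, error) {
	var (
		keys    [][]byte
		valLens []uint64
		prev    []byte
	)
	for len(data) > 0 {
		var fields [3]uint64 // shared, unshared, value length
		for i := range fields {
			v, n := binary.Uvarint(data)
			if n <= 0 {
				return nil, nil, errors.New("malformed varint")
			}
			fields[i], data = v, data[n:]
		}
		shared, unshared := fields[0], fields[1]
		if shared > uint64(len(prev)) || unshared > uint64(len(data)) {
			return nil, nil, errors.New("corrupted entry")
		}
		key := append(append([]byte{}, prev[:shared]...), data[:unshared]...)
		data = data[unshared:]
		keys = append(keys, key)
		valLens = append(valLens, fields[2])
		prev = key
	}
	return keys, valLens, nil
}
```

In this sketch, sibling paths such as `abc` and `abd` cost one key byte each after the first, which is the effect the comment describes. Values themselves are assumed to live in a separate value section and are not modelled here.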

)
for i, path := range h.nodeList[owner] {
	key := []byte(path)
	if i%trienodeDataBlockRestartLen == 0 {
Member

Same with this: I don't understand why we need the block restarts, or why the sharedPrefix will be 0 if we chunk.

Comment on lines +350 to +355
if len(keySection) < int(8*nRestarts)+4 {
	return nil, fmt.Errorf("key section too short, restarts: %d, size: %d", nRestarts, len(keySection))
}
for i := 0; i < int(nRestarts); i++ {
	o := len(keySection) - 4 - (int(nRestarts)-i)*8
	keyOffset := binary.BigEndian.Uint32(keySection[o : o+4])
	if i != 0 && keyOffset <= keyOffsets[i-1] {
		return nil, fmt.Errorf("key offset is out of order, prev: %v, cur: %v", keyOffsets[i-1], keyOffset)
Member

There are so many magic numbers here... it's hard to comprehend

Member Author

The nodes from different tries are aggregated and concatenated within the key and value sections.
The offsets of the keys and values belonging to each trie are recorded in the header section.

For each trie, a list of internal chunks, called restarts, is maintained.
At the end of the key section corresponding to a given trie, the offsets of these restarts are recorded.
This code resolves the offsets of these restarts.
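
As a rough illustration of how the trailer described here could be read with the magic numbers named, the sketch below parses the restart key offsets from the end of a key section. The constant and function names are hypothetical. It also rests on two assumptions that the quoted excerpt does not confirm: that the trailing 4 bytes hold the restart count as a big-endian uint32, and that each 8-byte restart entry is a 4-byte key offset followed by a 4-byte value offset.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	offsetSize       = 4              // one big-endian uint32 offset
	restartEntrySize = 2 * offsetSize // key offset + value offset (assumed layout)
	restartCountSize = 4              // trailing restart counter (assumed uint32)
)

// restartKeyOffsets reads the key offset of every restart from the trailer
// at the end of keySection. The trailer is assumed to look like:
//
//	[entry 0][entry 1]...[entry n-1][n]
//
// where each entry is restartEntrySize bytes and n is restartCountSize bytes.
func restartKeyOffsets(keySection []byte) ([]uint32, error) {
	if len(keySection) < restartCountSize {
		return nil, fmt.Errorf("key section too short, size: %d", len(keySection))
	}
	nRestarts := binary.BigEndian.Uint32(keySection[len(keySection)-restartCountSize:])

	// Compute the trailer size in uint64 to avoid overflow on 32-bit platforms.
	need := uint64(nRestarts)*restartEntrySize + restartCountSize
	if uint64(len(keySection)) < need {
		return nil, fmt.Errorf("key section too short, restarts: %d, size: %d", nRestarts, len(keySection))
	}
	base := len(keySection) - int(need) // start of the first restart entry

	offsets := make([]uint32, 0, nRestarts)
	for i := 0; i < int(nRestarts); i++ {
		o := base + i*restartEntrySize
		keyOffset := binary.BigEndian.Uint32(keySection[o : o+offsetSize])
		// Restart offsets must be strictly increasing.
		if i != 0 && keyOffset <= offsets[i-1] {
			return nil, fmt.Errorf("key offset is out of order, prev: %v, cur: %v", offsets[i-1], keyOffset)
		}
		offsets = append(offsets, keyOffset)
	}
	return offsets, nil
}
```

Here `base + i*restartEntrySize` is the same position as `len(keySection) - 4 - (nRestarts-i)*8` in the reviewed excerpt, written with the constants spelled out.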

Member Author

There are two main reasons for maintaining these restarts:

(1) compress the entry key
The key length of an entry (trie node) is not negligible. Storing only the difference from the preceding key (usually the parent node's path) compresses the key effectively; usually a 1-byte diff is sufficient.

(2) enhance the lookup efficiency
The first entry in each restart is always stored with its full key (no part shared with the preceding one). Therefore, a binary search can be performed at the restart boundaries.
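
The lookup benefit in (2) can be sketched as follows. This is an illustration, not the PR's code: `findRestart` is a hypothetical helper, and it assumes the full first key of every restart has already been read and that keys are sorted in byte-wise lexicographic order.

```go
package main

import (
	"bytes"
	"sort"
)

// findRestart returns the index of the only restart that can contain target,
// or -1 if target sorts before the first key of the first restart.
// firstKeys[i] is the full (uncompressed) first key of restart i.
func findRestart(firstKeys [][]byte, target []byte) int {
	// Find the first restart whose first key is strictly greater than target.
	// The restart just before it is the candidate.
	i := sort.Search(len(firstKeys), func(i int) bool {
		return bytes.Compare(firstKeys[i], target) > 0
	})
	return i - 1
}
```

After the candidate restart is located, only the entries inside it need to be decoded sequentially, so a lookup costs one binary search over the restarts plus a scan of at most one restart's worth of entries.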

Member

@MariusVanDerWijden left a comment

Generally LGTM, small nits and a few questions

Member

@MariusVanDerWijden left a comment

SGTM

@rjl493456442 added this to the 1.16.5 milestone Oct 10, 2025
@rjl493456442 merged commit de24450 into ethereum:master Oct 10, 2025
7 of 9 checks passed
@ethereumorg092-arch mentioned this pull request Oct 10, 2025
Sahil-4555 pushed a commit to Sahil-4555/go-ethereum that referenced this pull request Oct 12, 2025
atkinsonholly pushed a commit to atkinsonholly/ephemery-geth that referenced this pull request Nov 24, 2025
prestoalvarez pushed a commit to prestoalvarez/go-ethereum that referenced this pull request Nov 27, 2025
2 participants