core/rawdb, triedb/pathdb: introduce trienode history #32596
d4b1023 to ca6e68c
4129367 to 60dbb64
    value := h.nodes[owner][path]

    // key section
    n := binary.PutUvarint(buf[0:], uint64(prefixLen)) // key length shared (varint)
I don't really get why this is needed
Within a restart, the following rules apply:
- the first entry is always encoded with its full key
- for all subsequent entries, the key is encoded in a "compressed" format,
in which only the difference between the key and the preceding one is stored. Since
we store trie nodes here, the key is essentially the node path, so storing
only the diff compresses the entry key effectively.
Therefore, a few additional pieces of metadata are tracked in the key section:
- shared key length
- unshared key length
- value length
This information lets us recover the key from the byte stream.
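To make the rules above concrete, here is a minimal sketch of the per-entry key encoding and its inverse. It is not the code from this PR: `restartLen`, `encodeKeys`, and `decodeKeys` are hypothetical names, the restart interval of 4 is chosen only for illustration, and the exact field order (shared length, unshared length, value length, then the unshared key bytes) is an assumption based on the description above.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// restartLen is a hypothetical restart interval picked for this sketch; it
// stands in for trienodeDataBlockRestartLen, whose value is not shown here.
const restartLen = 4

// sharedPrefixLen returns the number of leading bytes a and b have in common.
func sharedPrefixLen(a, b []byte) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// encodeKeys emits, for every key, three uvarints (shared length, unshared
// length, value length) followed by the unshared key bytes.
func encodeKeys(keys [][]byte, valueLens []int) []byte {
	var (
		out  []byte
		prev []byte
		buf  = make([]byte, 3*binary.MaxVarintLen64)
	)
	for i, key := range keys {
		shared := 0 // the first entry of a restart carries its full key
		if i%restartLen != 0 {
			shared = sharedPrefixLen(prev, key)
		}
		n := binary.PutUvarint(buf, uint64(shared))
		n += binary.PutUvarint(buf[n:], uint64(len(key)-shared))
		n += binary.PutUvarint(buf[n:], uint64(valueLens[i]))
		out = append(out, buf[:n]...)
		out = append(out, key[shared:]...)
		prev = key
	}
	return out
}

// decodeKeys reverses encodeKeys: each key is rebuilt from the shared prefix
// of the previously decoded key plus the unshared bytes in the stream.
func decodeKeys(data []byte) ([][]byte, []int, error) {
	var (
		keys      [][]byte
		valueLens []int
		prev      []byte
	)
	for len(data) > 0 {
		var meta [3]uint64 // shared length, unshared length, value length
		for j := range meta {
			v, n := binary.Uvarint(data)
			if n <= 0 {
				return nil, nil, errors.New("malformed varint")
			}
			meta[j], data = v, data[n:]
		}
		shared, unshared := meta[0], meta[1]
		if shared > uint64(len(prev)) || unshared > uint64(len(data)) {
			return nil, nil, errors.New("key lengths out of range")
		}
		key := make([]byte, 0, shared+unshared)
		key = append(key, prev[:shared]...)
		key = append(key, data[:unshared]...)
		data = data[unshared:]
		keys = append(keys, key)
		valueLens = append(valueLens, int(meta[2]))
		prev = key
	}
	return keys, valueLens, nil
}

func main() {
	// Node paths where each key extends or is a sibling of the previous one.
	keys := [][]byte{{1}, {1, 2}, {1, 2, 3}, {1, 2, 4}, {1, 2, 4, 5}}
	enc := encodeKeys(keys, []int{10, 20, 30, 40, 50})
	dec, lens, err := decodeKeys(enc)
	fmt.Println(len(enc), dec, lens, err)
}
```

In this example the fifth key starts a new restart, so it is written in full even though it shares three bytes with its predecessor. That is the answer to the "why is the shared prefix 0" question: a restart boundary deliberately resets the compression.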
    )
    for i, path := range h.nodeList[owner] {
        key := []byte(path)
        if i%trienodeDataBlockRestartLen == 0 {
Same with this, I don't understand why we need the block restarts and why the sharedPrefix will be 0 if we chunk
    if len(keySection) < int(8*nRestarts)+4 {
        return nil, fmt.Errorf("key section too short, restarts: %d, size: %d", nRestarts, len(keySection))
    }
    for i := 0; i < int(nRestarts); i++ {
        o := len(keySection) - 4 - (int(nRestarts)-i)*8
        keyOffset := binary.BigEndian.Uint32(keySection[o : o+4])
        if i != 0 && keyOffset <= keyOffsets[i-1] {
            return nil, fmt.Errorf("key offset is out of order, prev: %v, cur: %v", keyOffsets[i-1], keyOffset)
There are so many magic numbers here... it's hard to comprehend
The nodes from different tries are aggregated and concatenated within the key and value sections.
The offsets of the keys and values belonging to each trie are recorded in the header section.
For each trie, a list of internal chunks, called restarts, is maintained.
At the end of the key section corresponding to a given trie, the offsets of these restarts are recorded.
This code resolves the offsets of these restarts.
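One way to make the trailer parsing easier to follow is to name the sizes involved. The sketch below is an illustration rather than the code in this PR. It assumes, based on the excerpt above, that each restart entry is 8 bytes, with a big-endian `uint32` key offset in the first 4. The layout of the remaining 4 bytes (taken here to be a value offset) and of the trailing 4 bytes (taken here to be the restart count) is a guess. The constants and the `appendTrailer` and `parseRestarts` helpers are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	offsetSize       = 4              // width of one big-endian uint32 offset
	restartEntrySize = 2 * offsetSize // key offset plus (assumed) value offset
	restartCountSize = 4              // trailing uint32: number of restarts (assumed)
)

// appendTrailer appends one 8-byte entry per restart, then the restart count.
func appendTrailer(keySection []byte, keyOffsets, valOffsets []uint32) []byte {
	for i := range keyOffsets {
		keySection = binary.BigEndian.AppendUint32(keySection, keyOffsets[i])
		keySection = binary.BigEndian.AppendUint32(keySection, valOffsets[i])
	}
	return binary.BigEndian.AppendUint32(keySection, uint32(len(keyOffsets)))
}

// parseRestarts reads the trailer back and checks that the key offsets are
// strictly increasing, mirroring the ordering check in the excerpt.
func parseRestarts(keySection []byte) (keyOffsets, valOffsets []uint32, err error) {
	if len(keySection) < restartCountSize {
		return nil, nil, fmt.Errorf("key section too short, size: %d", len(keySection))
	}
	countPos := len(keySection) - restartCountSize
	count := binary.BigEndian.Uint32(keySection[countPos:])
	if uint64(count) > uint64(countPos/restartEntrySize) {
		return nil, nil, fmt.Errorf("key section too short, restarts: %d, size: %d", count, len(keySection))
	}
	nRestarts := int(count)
	trailerPos := countPos - nRestarts*restartEntrySize
	for i := 0; i < nRestarts; i++ {
		o := trailerPos + i*restartEntrySize
		keyOffset := binary.BigEndian.Uint32(keySection[o : o+offsetSize])
		valOffset := binary.BigEndian.Uint32(keySection[o+offsetSize : o+restartEntrySize])
		if i != 0 && keyOffset <= keyOffsets[i-1] {
			return nil, nil, fmt.Errorf("key offset is out of order, prev: %v, cur: %v", keyOffsets[i-1], keyOffset)
		}
		keyOffsets = append(keyOffsets, keyOffset)
		valOffsets = append(valOffsets, valOffset)
	}
	return keyOffsets, valOffsets, nil
}

func main() {
	// "payload" stands in for the encoded entries that precede the trailer.
	section := appendTrailer([]byte("payload"), []uint32{0, 7, 15}, []uint32{0, 100, 250})
	fmt.Println(parseRestarts(section))
}
```

Here `trailerPos + i*restartEntrySize` is the same position as `len(keySection) - 4 - (nRestarts-i)*8` in the excerpt, only spelled with named constants.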
There are two main reasons for maintaining these restarts:
(1) Compress the entry key
The key length of an entry (trie node) is not negligible. Storing only the difference from the preceding key (usually the parent node's path) compresses the key effectively; usually a 1-byte diff is sufficient.
(2) Improve lookup efficiency
The first entry in a restart is always stored with its full key (no part shared with the preceding one). Therefore, a binary search can be performed at the restart boundaries.
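The lookup benefit in (2) can be sketched as follows: binary-search the full keys at the restart boundaries, then scan linearly inside the single restart that may contain the target. This is an illustration under stated assumptions, not the PR's implementation. Keys are assumed to be stored in ascending byte order, the input is assumed to be well formed (no bounds checks), value lengths are left out for brevity, and `restartLen`, `encode`, `readEntry`, and `lookup` are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// restartLen is a hypothetical restart interval used only in this sketch.
const restartLen = 2

// encode prefix-compresses sorted keys and records the byte offset at which
// each restart begins.
func encode(keys [][]byte) (out []byte, restarts []int) {
	var prev []byte
	for i, key := range keys {
		shared := 0
		if i%restartLen == 0 {
			restarts = append(restarts, len(out)) // full key stored here
		} else {
			for shared < len(prev) && shared < len(key) && prev[shared] == key[shared] {
				shared++
			}
		}
		out = binary.AppendUvarint(out, uint64(shared))
		out = binary.AppendUvarint(out, uint64(len(key)-shared))
		out = append(out, key[shared:]...)
		prev = key
	}
	return out, restarts
}

// readEntry decodes the entry at off and returns its key and the next offset.
// It trusts the input to be well formed.
func readEntry(data []byte, off int, prev []byte) ([]byte, int) {
	shared, n1 := binary.Uvarint(data[off:])
	unshared, n2 := binary.Uvarint(data[off+n1:])
	start := off + n1 + n2
	key := append(append([]byte{}, prev[:shared]...), data[start:start+int(unshared)]...)
	return key, start + int(unshared)
}

// lookup reports whether target is present in the encoded stream.
func lookup(data []byte, restarts []int, target []byte) bool {
	// Find the first restart whose full key is greater than target; the
	// target, if present, lives in the restart just before it.
	idx := sort.Search(len(restarts), func(i int) bool {
		key, _ := readEntry(data, restarts[i], nil)
		return bytes.Compare(key, target) > 0
	})
	if idx == 0 {
		return false // target sorts before every stored key
	}
	end := len(data)
	if idx < len(restarts) {
		end = restarts[idx]
	}
	var prev []byte
	for off := restarts[idx-1]; off < end; {
		var key []byte
		key, off = readEntry(data, off, prev)
		switch c := bytes.Compare(key, target); {
		case c == 0:
			return true
		case c > 0:
			return false // passed the position where target would be
		}
		prev = key
	}
	return false
}

func main() {
	keys := [][]byte{{1}, {1, 2}, {1, 2, 3}, {1, 3}, {2}}
	data, restarts := encode(keys)
	fmt.Println(lookup(data, restarts, []byte{1, 3}), lookup(data, restarts, []byte{1, 2, 4}))
}
```

Without restarts, every lookup would have to decode the stream from the first entry, because a compressed key can only be rebuilt from its predecessor. The restart interval trades a little compression for a bounded linear scan.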
MariusVanDerWijden
left a comment
Generally LGTM, small nits and a few questions
60dbb64 to 6ab502d
MariusVanDerWijden
left a comment
SGTM
This pull request is based on ethereum#32523 and implements the structure of the trienode history.