Commit f6aa147

blob sidecar era/erb proposal
1 parent 13a70e9 commit f6aa147

1 file changed, +38 -19 lines changed

docs/e2store.md

Lines changed: 38 additions & 19 deletions
@@ -116,6 +116,14 @@ data: snappyFramed(ssz(BeaconState))
 
 The fork and thus the exact format of the `BeaconState` should be derived from the `slot`.
 
+## BlobCompressedSidecars
+```
+type: [0x02, 0x00]
+data: ssz(List[snappyFramed(BlobSidecar), MAX_BLOBS_PER_BLOCK])
+```
+
+`BlobCompressedSidecars` contain a list of `BlobSidecar` objects encoded using `SSZ` then compressed using the snappy [framing format](https://github.com/google/snappy/blob/master/framing_format.txt).
+
 ## Empty
 
 ```
@@ -169,7 +177,9 @@ def read_slot_index(f):
 return (start_slot, record_start, slot_offsets)
 ```
 
-# Era files
+# Erb files
+
+Stand-in: like .era files, but blobs instead of blocks.
 
 `.era` files are special instances of `.e2s` files that follow a more strict content format optimised for reading and long-term storage and distribution.
 
@@ -183,11 +193,10 @@ Each era is identified by when it ends. Thus, the genesis era is era `0`, follow
 
 `.era` file names follow a simple convention: `<config-name>-<era-number>-<era-count>-<short-historical-root>.era`:
 
-* `config-name` is the `CONFIG_NAME` field of the runtime configuration (`mainnet`, `prater`, `sepolia`, `holesky`, etc)
+* `config-name` is the `CONFIG_NAME` field of the runtime configuration (`mainnet`, `sepolia`, `holesky`, etc)
 * `era-number` is the number of the _first_ era stored in the file - for example, the genesis era file has number 0 - as a 5-digit 0-filled decimal integer
 * `short-era-root` is the first 4 bytes of the last historical root in the _last_ state in the era file, lower-case hex-encoded (8 characters), except the genesis era which instead uses the `genesis_validators_root` field from the genesis state.
-  * The root is available as `state.historical_roots[era - 1]` except for genesis, which is `state.genesis_validators_root`
-  * Post-Capella, the root must be computed from `state.historical_summaries[era - state.historical_roots.len - 1]`
+  * The root is available as `state.historical_summaries[era - state.historical_roots.len - 1]`
 
 Era files with multiple eras use the era number of the lowest era stored in the file, and the root of the highest era.
 
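As a reading aid for the naming convention in this hunk, here is a minimal Python sketch that assembles a file name from its parts. The helper name `era_file_name`, the 5-digit zero-filling of `era-count`, and the sample root bytes are assumptions made for illustration. The hunk itself only states the width of `era-number` and the 4-byte, lower-case hex form of the short root.

```python
def era_file_name(config_name: str, era_number: int, era_count: int, root: bytes) -> str:
    """Build `<config-name>-<era-number>-<era-count>-<short-root>.era` (hypothetical helper)."""
    if len(root) != 32:
        raise ValueError("expected a 32-byte root")
    short_root = root[:4].hex()  # bytes.hex() is lower-case, giving 8 characters
    # era-number is a 5-digit 0-filled decimal; using the same width for
    # era-count is an assumption of this sketch, not something the hunk states
    return f"{config_name}-{era_number:05d}-{era_count:05d}-{short_root}.era"


# Made-up root, not taken from any real network
example_root = bytes.fromhex("deadbeef") + bytes(28)
print(era_file_name("mainnet", 1, 1, example_root))
```

Whether `.erb` files would reuse this convention with a different extension is not settled by the commit, so the sketch keeps the `.era` suffix from the text.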
@@ -199,9 +208,8 @@ An `.era` file is structured in the following way:
 
 ```
 era := group+
-group := Version | block* | era-state | other-entries* | slot-index(block)? | slot-index(state)
-block := CompressedSignedBeaconBlock
-era-state := CompressedBeaconState
+group := Version | blobs* | other-entries* | slot-index(block)?
+blobs := BlobCompressedSidecars
 ```
 
 The `block` entries of a group include all blocks leading up to the era transition in slot order. For example, the group representing era `1` contains blocks from slot `0` up to and including block `8191`. Empty slots are skipped.
@@ -228,7 +236,7 @@ def read_era_file(name):
 # Print contents of an era file, backwards
 with open(name, "rb") as f:
 
-# Seek to end of file to figure out the indices of the state and blocks
+# Seek to end of file to figure out the indices of the blobs
 f.seek(0, 2)
 
 groups = 0
@@ -252,16 +260,16 @@ def read_era_file(name):
 (block_slot, block_index_start, block_slot_offsets) = read_slot_index(f)
 
 print(
-"Block start slot:", block_slot,
-"block index start:", block_index_start,
+"Blob start slot:", block_slot,
+"blob index start:", block_index_start,
 "offsets", len(block_slot_offsets))
 
 if any((x for x in block_slot_offsets if x != 0)):
 # This can underflow! Python should complain when seeking - ymmv
 prev_group = block_index_start + [x for x in block_slot_offsets if x != 0][0] - 8
 
 print("Previous group starts at:", prev_group)
-# The beginning of the first block (or the state, if there are no blocks)
+# The beginning of the first blob list # TODO or the state, if there are no blobs
 # is the end of the previous group
 f.seek(prev_group) # Skip header
 
@@ -273,24 +281,19 @@ def read_era_file(name):
 
 To verify the internal consistency of an era file, the following checks should be made to verify that an era file is valid for a given network:
 
-* each group follows the given structure of era files with regards to blocks, states and their indices
+* each group follows the given structure of era files with regards to blobs and their indices
 * offsets within indices must point to entries of the correct kind that can be decompressed and deserialized
 * era file readers must be prepared to handle malicious inputs, including out-of-range offsets, invalid length prefixes and other trivial errors
 * unknown record types should be ignored, but it is recommended that verifiers report their size and tag
+* all blobs are consistent with regard to the blocks to which they point
 * the state is loadable and consistent with the given runtime configuration
-* the root of each block in the era file matches that of `state.block_roots` - if a slot is empty according to the block index, this should be confirmed by verifying that
-  `state.get_block_root_at_slot(empty_slot - 1) == state.get_block_root_at_slot(empty_slot)` except for the first slot of the era which, if possible, should be verified against `era - 1`
-* the genesis era file does not have any blocks
-* the signature of each block can be verified by the keys in the given state (or any newer state).
+* TODO need the block era file here; in general, blobs can only be verified to a limited extent standalone
 
 Extended verification consists of verifying a list of era files against a particular history anchored in a checkpoint or a head block. Verification starts from a well-known finalized checkpoint for a slot within the era, using `anchor_state_root = checkpoint_state.state_roots[0]` as anchor and walking the era files as a linked list.
 
 For each era file:
 
-* verify that `hash_tree_root(state) == anchor_state_root`
-  * this anchors the era in a particular history, starting from the given state root - the state root is available from any state within the anchor era.
 * verify the internal consistency of the era, as above
-* set `anchor_state_root == state.state_roots[0]`
 
 # FAQ
 
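The checklist in this hunk asks readers to tolerate out-of-range offsets and invalid length prefixes. The following Python sketch shows one way such bounds checks could look when reading a single entry from an in-memory buffer. It assumes the 8-byte e2store header layout (2-byte type, 4-byte little-endian length, 2 reserved bytes). The name `read_entry` and the choice to reject non-zero reserved bytes are assumptions of this sketch, not part of the commit.

```python
import struct

HEADER_SIZE = 8  # type (2 bytes) + length (4 bytes, little-endian) + reserved (2 bytes)


def read_entry(buf: bytes, offset: int) -> tuple[bytes, bytes, int]:
    """Return (type, data, next_offset) for the entry at `offset`, or raise ValueError."""
    if offset < 0 or offset + HEADER_SIZE > len(buf):
        raise ValueError(f"offset {offset} out of range")
    typ = buf[offset:offset + 2]
    length, reserved = struct.unpack_from("<IH", buf, offset + 2)
    if reserved != 0:
        # Assumption of this sketch: treat non-zero reserved bytes as an error
        raise ValueError("reserved bytes must be zero")
    end = offset + HEADER_SIZE + length
    if end > len(buf):
        raise ValueError("length prefix runs past end of input")
    return typ, buf[offset + HEADER_SIZE:end], end


# A single hand-built entry: type [0x02, 0x00] followed by 3 data bytes
sample = bytes([0x02, 0x00]) + struct.pack("<IH", 3, 0) + b"abc"
print(read_entry(sample, 0))
```

Raising on every malformed field keeps the failure mode explicit. A verifier that prefers to skip unknown record types can still do so by inspecting the returned type and continuing from `next_offset`.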
@@ -351,3 +354,19 @@ Each era file contains a full `BeaconState` object whose `block_roots` field cor
 Offsets in `SSZ` are `uint32` thus from a practical point of view, any one SSZ object may generally not exceed that size.
 
 A future entry type can introduce chunking should larger entries be needed, or spill the remaining size bytes into `reserved`, effectively turning the encoding of the length into a fictive `uint48` type.
+
+## Why are entire BlobSidecar sets per blob stored together?
+
+SlotIndex only allows one index per slot, and this also allows exact one-to-one correspondence with block-based .era files, while avoiding adding special cases for blocks without blobs.
+
+## Why use a single SSZ structure for this BlobSidecar set per blob?
+
+It similarly creates a mirrored e2s structure between Era and Erb, while reusing existing SSZ parsing and loading code.
+
+## Why use lists of compressed blob sidecars rather than either compressed lists of blob sidecars or uncompressed lists of uncompressed blob sidecars?
+
+This enables req/resp copying directly, which operates on a per-blob-sidecar basis rather than fetching all blob sidecars at once, without additional Snappy decompression.
+
+## Why separate BlobSidecar from block storage?
+
+Blob sidecars aren't, properly, part of the consensus record. It is reasonable for an archival node to archive only blocks, not blob sidecars. This isn't unique to Era/Erb files, but occurs, e.g., while syncing, where blob verification only must occur within the blob retention window.
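The FAQ entry on lists of compressed blob sidecars relies on each sidecar being individually addressable inside the SSZ list. A minimal Python sketch of that idea follows, assuming each list element is treated as an opaque variable-length byte string, so the list serializes as a table of little-endian `uint32` offsets followed by the element bytes. The name `split_ssz_byte_lists` and the limit of 6 items are illustrative assumptions, and the payloads are placeholders rather than real snappy-framed data.

```python
import struct


def split_ssz_byte_lists(data: bytes, max_items: int) -> list[bytes]:
    """Split ssz(List[ByteList, max_items]) into its raw items without decoding them."""
    if not data:
        return []  # an empty list serializes to zero bytes
    if len(data) < 4:
        raise ValueError("truncated offset table")
    first = struct.unpack_from("<I", data, 0)[0]
    if first == 0 or first % 4 != 0 or first > len(data):
        raise ValueError("invalid first offset")
    count = first // 4  # the first offset points just past the offset table
    if count > max_items:
        raise ValueError("too many items")
    offsets = list(struct.unpack_from(f"<{count}I", data, 0)) + [len(data)]
    items = []
    for start, end in zip(offsets, offsets[1:]):
        if start > end or end > len(data):
            raise ValueError("offsets must be non-decreasing and in range")
        items.append(data[start:end])
    return items


# Two opaque payloads standing in for snappy-framed BlobSidecar bytes
payloads = [b"first", b"second!"]
table = struct.pack("<2I", 8, 8 + len(payloads[0]))
encoded = table + b"".join(payloads)
print(split_ssz_byte_lists(encoded, max_items=6))
```

Each returned item could then be forwarded as-is or passed to a snappy framing decoder, which is the property the FAQ answer appeals to.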