
Commit 79fc094

feat: rs2-chunked
1 parent 2dfa852

41 files changed: +4194 additions, -183 deletions


README.md

Lines changed: 119 additions & 0 deletions
@@ -144,3 +144,122 @@ yourself with our [contributing workflow](./CONTRIBUTING.md).

This project is licensed under the Apache License, Version 2.0 ([LICENSE](LICENSE) or
<https://www.apache.org/licenses/LICENSE-2.0>).

## RS2Chunked Blob Structure

For large blobs that exceed memory limits, Walrus uses a chunked encoding scheme (RS2Chunked) with a two-level Merkle tree structure:

```mermaid
graph TD
    subgraph "Blob Level"
        BlobID["Blob ID<br/>(Root of blob-level Merkle tree)"]
        BlobMerkle["Blob-level Merkle Tree"]
        BlobID --> BlobMerkle
    end

    subgraph "Sliver Pair Level (Blob)"
        BlobMerkle --> SPM0["Sliver Pair 0<br/>Metadata"]
        BlobMerkle --> SPM1["Sliver Pair 1<br/>Metadata"]
        BlobMerkle --> SPMDots["..."]
        BlobMerkle --> SPMN["Sliver Pair N<br/>Metadata"]
    end

    subgraph "Each Sliver Pair Metadata"
        SPM0 --> SPM0Root["Merkle Root over<br/>Chunk Hashes"]
    end

    subgraph "Chunk Level (for Sliver Pair 0)"
        SPM0Root --> C0H["Chunk 0 Hash"]
        SPM0Root --> C1H["Chunk 1 Hash"]
        SPM0Root --> CDots["..."]
        SPM0Root --> CMH["Chunk M Hash"]
    end

    subgraph "Chunk 0 Structure"
        C0H --> C0Primary["Primary Sliver<br/>(Merkle Root)"]
        C0H --> C0Secondary["Secondary Sliver<br/>(Merkle Root)"]
    end

    subgraph "Storage Indexing"
        Storage["Storage Node Indexing"]
        Storage --> Key1["(blob_id, chunk_0, sliver_pair_0)"]
        Storage --> Key2["(blob_id, chunk_0, sliver_pair_1)"]
        Storage --> Key3["(blob_id, chunk_1, sliver_pair_0)"]
        Storage --> KeyDots["..."]
    end

    style BlobID fill:#e1f5ff
    style BlobMerkle fill:#ffe1e1
    style SPM0Root fill:#fff4e1
    style Storage fill:#e1ffe1
```
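
To make the two-level structure concrete, here is a minimal, self-contained Rust sketch of the
hashing shown in the diagram. It is illustrative only: it uses SHA-256 and invented helper names
(`chunk_hash`, `blob_id`), whereas the actual hash scheme and metadata types live in `walrus-core`
and differ in detail.

```rust
// Toy illustration of the two-level Merkle structure above (not the real Walrus hashing).
// Assumes the `sha2` crate; the helper names are made up for this sketch.
use sha2::{Digest, Sha256};

type Hash = [u8; 32];

fn hash_pair(a: Hash, b: Hash) -> Hash {
    let mut h = Sha256::new();
    h.update(a);
    h.update(b);
    h.finalize().into()
}

/// Root of a simple binary Merkle tree (an odd node is paired with itself).
fn merkle_root(mut level: Vec<Hash>) -> Hash {
    assert!(!level.is_empty());
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| hash_pair(pair[0], *pair.get(1).unwrap_or(&pair[0])))
            .collect();
    }
    level[0]
}

/// Chunk hash for one sliver pair: combines the primary and secondary sliver roots.
fn chunk_hash(primary_sliver_root: Hash, secondary_sliver_root: Hash) -> Hash {
    hash_pair(primary_sliver_root, secondary_sliver_root)
}

/// Blob ID: blob-level Merkle root over per-sliver-pair roots, where each sliver-pair
/// root is itself a Merkle root over that pair's chunk hashes.
fn blob_id(chunk_hashes_per_sliver_pair: &[Vec<Hash>]) -> Hash {
    let sliver_pair_roots: Vec<Hash> = chunk_hashes_per_sliver_pair
        .iter()
        .map(|chunk_hashes| merkle_root(chunk_hashes.clone()))
        .collect();
    merkle_root(sliver_pair_roots)
}

fn main() {
    // Two sliver pairs, each with two chunks of toy data.
    let leaf = |t: &str| -> Hash { Sha256::digest(t.as_bytes()).into() };
    let pairs = vec![
        vec![
            chunk_hash(leaf("sp0-c0-primary"), leaf("sp0-c0-secondary")),
            chunk_hash(leaf("sp0-c1-primary"), leaf("sp0-c1-secondary")),
        ],
        vec![
            chunk_hash(leaf("sp1-c0-primary"), leaf("sp1-c0-secondary")),
            chunk_hash(leaf("sp1-c1-primary"), leaf("sp1-c1-secondary")),
        ],
    ];
    println!("toy blob id: {:02x?}", blob_id(&pairs));
}
```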

### Smart Defaults - Automatic Chunk Size Selection

Automatic chunk size selection works as follows.

#### 1. When Chunking Kicks In

Chunking is used automatically when
`blob_size > max_blob_size_for_n_shards(n_shards, encoding_type)`, where:

- `max_blob_size_for_n_shards = source_symbols_per_blob × max_symbol_size`
- `max_symbol_size` = 65,534 bytes (`u16::MAX - 1`) for RS2 encoding
- `source_symbols_per_blob = n_primary × n_secondary` (depends on the shard count)

Example for 1000 shards:

- Primary source symbols: 334
- Secondary source symbols: 667
- Total source symbols: 334 × 667 = 222,778
- Maximum single-chunk blob size: 222,778 × 65,534 bytes ≈ 14.6 GB (about 13.6 GiB)

So for a typical network with 1000 shards, chunking automatically kicks in for blobs larger than
roughly 13.6 GiB.
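
The following Rust sketch reproduces this threshold arithmetic. The function names and the
derivation of the symbol counts from the shard count are assumptions made for the sketch, not the
actual `walrus-core` API.

```rust
// Back-of-the-envelope check for when chunking is needed (illustrative, not walrus-core code).
const MAX_SYMBOL_SIZE: u64 = u16::MAX as u64 - 1; // 65,534 bytes for RS2

/// Source symbols per blob, assuming f = floor((n_shards - 1) / 3),
/// n_primary = n_shards - 2f and n_secondary = n_shards - f (334 × 667 for 1000 shards).
fn source_symbols_per_blob(n_shards: u64) -> u64 {
    let f = (n_shards - 1) / 3;
    (n_shards - 2 * f) * (n_shards - f)
}

fn max_single_chunk_blob_size(n_shards: u64) -> u64 {
    source_symbols_per_blob(n_shards) * MAX_SYMBOL_SIZE
}

fn needs_chunking(blob_size: u64, n_shards: u64) -> bool {
    blob_size > max_single_chunk_blob_size(n_shards)
}

fn main() {
    let n_shards = 1000u64;
    // ≈ 14.6 GB threshold for 1000 shards.
    println!("threshold: {} bytes", max_single_chunk_blob_size(n_shards));
    // A 50 GiB blob exceeds the threshold, so chunking kicks in.
    println!("50 GiB blob needs chunking: {}", needs_chunking(50u64 << 30, n_shards));
}
```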

#### 2. Default Chunk Size

When chunking is needed, the system uses:
`pub const DEFAULT_CHUNK_SIZE: u64 = 10 * 1024 * 1024; // 10 MB`

This default was chosen based on several factors documented in the code (the overhead arithmetic is
sketched after this list):

- Memory efficiency: 10 MB chunks keep memory usage reasonable during encoding and decoding.
- Metadata overhead: at 10 MB per chunk with 1000 shards, metadata is only 0.64% overhead
  (64 KB of metadata per 10 MB chunk).
- Streaming performance: smaller chunks enable faster initial data delivery.
- Storage granularity: a reasonable balance between network round-trips and overhead.
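
A quick sketch of the overhead arithmetic behind these numbers; the 64 KB-per-chunk metadata figure
is taken from the text above and treated as a given here.

```rust
// Overhead arithmetic for the default chunk size (constants mirror the text above).
const DEFAULT_CHUNK_SIZE: u64 = 10 * 1024 * 1024; // 10 MB
const METADATA_PER_CHUNK: u64 = 64 * 1024; // ~64 KB per chunk with 1000 shards (from the text)

fn chunk_count(blob_size: u64, chunk_size: u64) -> u64 {
    blob_size.div_ceil(chunk_size)
}

fn metadata_overhead_percent(chunk_size: u64) -> f64 {
    100.0 * METADATA_PER_CHUNK as f64 / chunk_size as f64
}

fn main() {
    let blob_size = 50u64 << 30; // a 50 GiB blob
    println!("chunks: {}", chunk_count(blob_size, DEFAULT_CHUNK_SIZE)); // 5120
    println!("overhead: {:.3}%", metadata_overhead_percent(DEFAULT_CHUNK_SIZE)); // ≈0.6%
}
```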

#### 3. Constraints

The system enforces the following limits (restated as a check in the sketch below):

- Minimum chunk size: 10 MB (prevents excessive metadata overhead).
- Maximum chunks per blob: 1000 (bounds total metadata size to ~64 MB).
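
The sketch below simply encodes these two constraints as a validation; the constant and function
names, and the error handling, are invented for illustration.

```rust
// The two constraints above, expressed as a check (names are hypothetical).
const MIN_CHUNK_SIZE: u64 = 10 * 1024 * 1024; // 10 MB minimum chunk size
const MAX_CHUNKS_PER_BLOB: u64 = 1000; // bounds total metadata size to ~64 MB

fn validate_chunk_size(blob_size: u64, chunk_size: u64) -> Result<(), String> {
    if chunk_size < MIN_CHUNK_SIZE {
        return Err(format!("chunk size {chunk_size} is below the 10 MB minimum"));
    }
    let chunks = blob_size.div_ceil(chunk_size);
    if chunks > MAX_CHUNKS_PER_BLOB {
        return Err(format!("{chunks} chunks exceed the maximum of {MAX_CHUNKS_PER_BLOB} per blob"));
    }
    Ok(())
}
```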

#### 4. Practical Examples

Small blob (< ~13.6 GiB with 1000 shards): `walrus store --epochs 5 small_file.bin` (a 1 GB file)

- Uses standard RS2 encoding (single chunk)
- No chunking needed

Large blob (> ~13.6 GiB with 1000 shards): `walrus store --epochs 5 large_file.bin` (a 50 GB file)

- Automatically uses RS2Chunked encoding
- Chunk size: 10 MB (`DEFAULT_CHUNK_SIZE`)
- Number of chunks: 5120 (50 GB / 10 MB)

Manual override: `walrus store --epochs 5 --chunk-size 20971520 large_file.bin` (50 GB with 20 MB chunks)

- Forces RS2Chunked encoding
- Chunk size: 20 MB (user specified)
- Number of chunks: 2560 (50 GB / 20 MB)
- Useful for systems with more memory available

#### 5. Why Manual Override is Useful

- Memory-constrained environments: use smaller chunks (e.g., 5 MB) to reduce peak memory usage.
- Performance tuning: larger chunks (e.g., 20-50 MB) may improve throughput when memory is abundant.
- Testing: validate chunking behavior with smaller test files by forcing chunked mode.

The smart defaults ensure that most users never need to think about chunking: it "just works" when
blobs exceed the single-chunk limit, while still giving advanced users control when needed.

contracts/subsidies/sources/subsidies.move

Lines changed: 2 additions & 0 deletions
@@ -330,6 +330,7 @@ public fun register_blob(
     root_hash: u256,
     size: u64,
     encoding_type: u8,
+    chunk_size: u64,
     deletable: bool,
     write_payment: &mut Coin<WAL>,
     ctx: &mut TxContext,
@@ -341,6 +342,7 @@ public fun register_blob(
     root_hash,
     size,
     encoding_type,
+    chunk_size,
     deletable,
     write_payment,
     ctx,

contracts/subsidies/tests/subsidies_tests.move

Lines changed: 1 addition & 0 deletions
@@ -636,6 +636,7 @@ fun register_default_blob(
     ROOT_HASH,
     UNENCODED_SIZE,
     RS2,
+    0, // chunk_size: 0 for RS2 encoding (non-chunked)
     deletable,
     &mut fake_coin,
     ctx,

contracts/walrus/sources/system.move

Lines changed: 5 additions & 0 deletions
@@ -81,6 +81,7 @@ public fun certify_event_blob(
     root_hash: u256,
     size: u64,
     encoding_type: u8,
+    chunk_size: u64,
     ending_checkpoint_sequence_num: u64,
     epoch: u32,
     ctx: &mut TxContext,
@@ -93,6 +94,7 @@ public fun certify_event_blob(
     root_hash,
     size,
     encoding_type,
+    chunk_size,
     ending_checkpoint_sequence_num,
     epoch,
     ctx,
@@ -129,13 +131,15 @@ public fun reserve_space_for_epochs(
 /// Registers a new blob in the system.
 /// `size` is the size of the unencoded blob. The reserved space in `storage` must be at
 /// least the size of the encoded blob.
+/// For RS2_CHUNKED encoding, `chunk_size` specifies the chunk size. For RS2, pass 0.
 public fun register_blob(
     self: &mut System,
     storage: Storage,
     blob_id: u256,
     root_hash: u256,
     size: u64,
     encoding_type: u8,
+    chunk_size: u64,
     deletable: bool,
     write_payment: &mut Coin<WAL>,
     ctx: &mut TxContext,
@@ -148,6 +152,7 @@ public fun register_blob(
     root_hash,
     size,
     encoding_type,
+    chunk_size,
     deletable,
     write_payment,
     ctx,

contracts/walrus/sources/system/blob.move

Lines changed: 11 additions & 0 deletions
@@ -55,6 +55,8 @@ public struct Blob has key, store {
     storage: Storage,
     // Marks if this blob can be deleted.
     deletable: bool,
+    // Chunk size for RS2_CHUNKED encoding. For RS2, this is 0.
+    chunk_size: u64,
 }

 // === Accessors ===
@@ -91,11 +93,16 @@ public fun is_deletable(self: &Blob): bool {
     self.deletable
 }

+public fun chunk_size(self: &Blob): u64 {
+    self.chunk_size
+}
+
 public fun encoded_size(self: &Blob, n_shards: u16): u64 {
     encoding::encoded_blob_length(
         self.size,
         self.encoding_type,
         n_shards,
+        self.chunk_size,
     )
 }

@@ -140,12 +147,14 @@ public fun derive_blob_id(root_hash: u256, encoding_type: u8, size: u64): u256 {
 /// Creates a new blob in `registered_epoch`.
 /// `size` is the size of the unencoded blob. The reserved space in `storage` must be at
 /// least the size of the encoded blob.
+/// For RS2_CHUNKED encoding, `chunk_size` specifies the chunk size. For RS2, pass 0.
 public(package) fun new(
     storage: Storage,
     blob_id: u256,
     root_hash: u256,
     size: u64,
     encoding_type: u8,
+    chunk_size: u64,
     deletable: bool,
     registered_epoch: u32,
     n_shards: u16,
@@ -162,6 +171,7 @@ public(package) fun new(
         size,
         encoding_type,
         n_shards,
+        chunk_size,
     );
     assert!(encoded_size <= storage.size(), EResourceSize);

@@ -189,6 +199,7 @@ public(package) fun new(
         certified_epoch: option::none(),
         storage,
         deletable,
+        chunk_size,
     }
 }

contracts/walrus/sources/system/encoding.move

Lines changed: 13 additions & 4 deletions
@@ -8,6 +8,8 @@ use walrus::redstuff;
 // Supported Encoding Types
 // RedStuff with Reed-Solomon
 const RS2: u8 = 1;
+// RedStuff with Reed-Solomon supporting chunked encoding
+const RS2_CHUNKED: u8 = 2;

 // Error codes
 // Error types in `walrus-sui/types/move_errors.rs` are auto-generated from the Move error codes.
@@ -16,8 +18,15 @@ const EInvalidEncodingType: u64 = 0;

 /// Computes the encoded length of a blob given its unencoded length, encoding type
 /// and number of shards `n_shards`.
-public fun encoded_blob_length(unencoded_length: u64, encoding_type: u8, n_shards: u16): u64 {
-    // Currently only supports the two RedStuff variants.
-    assert!(encoding_type == RS2, EInvalidEncodingType);
-    redstuff::encoded_blob_length(unencoded_length, n_shards)
+/// For RS2_CHUNKED, `chunk_size` specifies the size of each chunk. For RS2, it's ignored.
+public fun encoded_blob_length(
+    unencoded_length: u64,
+    encoding_type: u8,
+    n_shards: u16,
+    chunk_size: u64,
+): u64 {
+    // Both RS2 and RS2_CHUNKED use RedStuff Reed-Solomon encoding.
+    // RS2_CHUNKED adds additional metadata for chunk-level hashes.
+    assert!(encoding_type == RS2 || encoding_type == RS2_CHUNKED, EInvalidEncodingType);
+    redstuff::encoded_blob_length(unencoded_length, n_shards, encoding_type, chunk_size)
 }
