yourself with our [contributing workflow](./CONTRIBUTING.md).

This project is licensed under the Apache License, Version 2.0 ([LICENSE](LICENSE) or
<https://www.apache.org/licenses/LICENSE-2.0>).

## RS2Chunked Blob Structure

For large blobs that exceed memory limits, Walrus uses a chunked encoding scheme (RS2Chunked) with a two-level Merkle tree structure:

```mermaid
graph TD
    subgraph "Blob Level"
        BlobID["Blob ID<br/>(Root of blob-level Merkle tree)"]
        BlobMerkle["Blob-level Merkle Tree"]
        BlobID --> BlobMerkle
    end

    subgraph "Sliver Pair Level (Blob)"
        BlobMerkle --> SPM0["Sliver Pair 0<br/>Metadata"]
        BlobMerkle --> SPM1["Sliver Pair 1<br/>Metadata"]
        BlobMerkle --> SPMDots["..."]
        BlobMerkle --> SPMN["Sliver Pair N<br/>Metadata"]
    end

    subgraph "Each Sliver Pair Metadata"
        SPM0 --> SPM0Root["Merkle Root over<br/>Chunk Hashes"]
    end

    subgraph "Chunk Level (for Sliver Pair 0)"
        SPM0Root --> C0H["Chunk 0 Hash"]
        SPM0Root --> C1H["Chunk 1 Hash"]
        SPM0Root --> CDots["..."]
        SPM0Root --> CMH["Chunk M Hash"]
    end

    subgraph "Chunk 0 Structure"
        C0H --> C0Primary["Primary Sliver<br/>(Merkle Root)"]
        C0H --> C0Secondary["Secondary Sliver<br/>(Merkle Root)"]
    end

    subgraph "Storage Indexing"
        Storage["Storage Node Indexing"]
        Storage --> Key1["(blob_id, chunk_0, sliver_pair_0)"]
        Storage --> Key2["(blob_id, chunk_0, sliver_pair_1)"]
        Storage --> Key3["(blob_id, chunk_1, sliver_pair_0)"]
        Storage --> KeyDots["..."]
    end

    style BlobID fill:#e1f5ff
    style BlobMerkle fill:#ffe1e1
    style SPM0Root fill:#fff4e1
    style Storage fill:#e1ffe1
```
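
In Rust terms, the metadata shape sketched in the diagram might look like the following. Every type and field name here is an illustrative stand-in, not the actual `walrus-core` API:

```rust
// Illustrative sketch of the two-level metadata layout shown in the
// diagram above; all names here are hypothetical, not the real types.

/// A 32-byte Merkle root or hash.
type Hash = [u8; 32];

/// Hash of one chunk within a sliver pair: one Merkle root per sliver.
struct ChunkHash {
    primary_sliver_root: Hash,   // Merkle root over the primary sliver
    secondary_sliver_root: Hash, // Merkle root over the secondary sliver
}

/// Metadata for one sliver pair: a Merkle root over its chunk hashes.
struct SliverPairMetadata {
    chunk_root: Hash,       // root of the Merkle tree over `chunks`
    chunks: Vec<ChunkHash>, // chunk 0 ..= chunk M
}

/// Blob-level view: the blob ID is the root of a Merkle tree whose
/// leaves are the per-sliver-pair metadata entries (pair 0 ..= pair N).
struct ChunkedBlobMetadata {
    blob_id: Hash,
    sliver_pairs: Vec<SliverPairMetadata>,
}

/// Storage nodes index each stored sliver by
/// (blob_id, chunk index, sliver-pair index).
type StorageKey = (Hash, u64, u16);
```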

### Smart Defaults - Automatic Chunk Size Selection

Automatic chunk size selection follows these rules:

#### 1. When Chunking Kicks In

Chunking is used automatically when
`blob_size > max_blob_size_for_n_shards(n_shards, encoding_type)`, where:

- `max_blob_size_for_n_shards = source_symbols_per_blob × max_symbol_size`
- `max_symbol_size` = 65,534 bytes (`u16::MAX - 1`) for RS2 encoding
- `source_symbols_per_blob = n_primary × n_secondary` (which depends on the shard count)

Example for 1000 shards:

- Primary source symbols: 334
- Secondary source symbols: 667
- Total source symbols: 334 × 667 = 222,778
- Max single-chunk size: 222,778 × 65,534 bytes ≈ 13.6 GiB

So for a typical network with 1000 shards, chunking automatically kicks in for blobs larger than
about 13.6 GiB, as the sketch below illustrates.
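
The following self-contained Rust sketch makes the arithmetic concrete. `max_blob_size_for_n_shards` is named in the prose above, but this implementation, including the committee-size derivation `f = (n - 1) / 3`, is an illustration rather than the actual library code:

```rust
// Minimal sketch of the single-chunk limit check. The constants follow
// the prose above; the function bodies are illustrative, not the real API.

const MAX_SYMBOL_SIZE: u64 = u16::MAX as u64 - 1; // 65,534 bytes for RS2

/// (primary, secondary) source-symbol counts for `n_shards` shards,
/// assuming the usual Byzantine bound 3f + 1 <= n_shards.
fn source_symbols(n_shards: u64) -> (u64, u64) {
    let f = (n_shards - 1) / 3; // 333 for 1000 shards
    (n_shards - 2 * f, n_shards - f) // (334, 667) for 1000 shards
}

fn max_blob_size_for_n_shards(n_shards: u64) -> u64 {
    let (primary, secondary) = source_symbols(n_shards);
    primary * secondary * MAX_SYMBOL_SIZE
}

fn main() {
    // 334 × 667 × 65,534 = 14,599,533,452 bytes ≈ 13.6 GiB
    let max = max_blob_size_for_n_shards(1000);
    println!("single-chunk limit: {max} bytes");

    let blob_size: u64 = 50 * 1024 * 1024 * 1024; // a 50 GiB blob
    println!("needs chunking: {}", blob_size > max); // true
}
```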

#### 2. Default Chunk Size

When chunking is needed, the system uses
`pub const DEFAULT_CHUNK_SIZE: u64 = 10 * 1024 * 1024; // 10 MB`.

This default was chosen based on several factors documented in the code:

- Memory efficiency: 10 MB chunks keep memory usage reasonable during encoding and decoding
- Metadata overhead: at 10 MB per chunk with 1000 shards, metadata adds only 0.64% overhead (64 KB of metadata per 10 MB chunk)
- Streaming performance: smaller chunks enable faster initial data delivery
- Storage granularity: a reasonable balance between network round trips and overhead
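
As a quick sanity check on the overhead figure, a hypothetical calculation; the ~64 KB-per-chunk metadata size is taken from the text above, read here in decimal units:

```rust
fn main() {
    // ~64 KB of sliver-pair metadata per chunk at 1000 shards,
    // amortized over a 10 MB chunk (decimal units).
    let metadata_per_chunk = 64_000.0_f64;
    let chunk_size = 10_000_000.0_f64;
    let overhead_pct = metadata_per_chunk / chunk_size * 100.0;
    println!("metadata overhead: {overhead_pct:.2}%"); // 0.64%
}
```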

#### 3. Constraints

The system enforces:

- Minimum chunk size: 10 MB (prevents excessive metadata overhead)
- Maximum chunks per blob: 1000 (bounds total metadata size to ~64 MB)
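
Taken together, these two constraints pin down the usable chunk sizes. Here is a minimal sketch of how a client might clamp a requested size; the constant and function names are illustrative, not the actual API:

```rust
// Illustrative clamp combining both constraints above.
const MIN_CHUNK_SIZE: u64 = 10 * 1024 * 1024; // 10 MB minimum
const MAX_CHUNKS_PER_BLOB: u64 = 1000; // caps metadata at ~64 MB

/// Smallest chunk size that satisfies both the 10 MB floor and the
/// 1000-chunks-per-blob cap for a blob of `blob_size` bytes.
fn effective_chunk_size(blob_size: u64, requested: u64) -> u64 {
    let floor_for_cap = blob_size.div_ceil(MAX_CHUNKS_PER_BLOB);
    requested.max(MIN_CHUNK_SIZE).max(floor_for_cap)
}

fn main() {
    let five_gib = 5u64 * 1024 * 1024 * 1024;
    // The 10 MB floor governs here: 5 GiB / 1000 chunks ≈ 5.4 MB < 10 MB.
    assert_eq!(
        effective_chunk_size(five_gib, 10 * 1024 * 1024),
        10 * 1024 * 1024
    );
}
```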

#### 4. Practical Examples

Small blob (below the ~13.6 GiB single-chunk limit with 1000 shards): `walrus store --epochs 5 small_file.bin` (a 1 GB file)

- Uses standard RS2 encoding (a single chunk); no chunking needed

Large blob (above ~13.6 GiB with 1000 shards): `walrus store --epochs 5 large_file.bin` (a 50 GB file)

- Automatically uses RS2Chunked encoding
- Chunk size: 10 MB (`DEFAULT_CHUNK_SIZE`)
- Number of chunks: 5120 (50 GB / 10 MB)

Manual override: `walrus store --epochs 5 --chunk-size 20971520 large_file.bin` (50 GB with 20 MB chunks)

- Forces RS2Chunked encoding
- Chunk size: 20 MB (user specified)
- Number of chunks: 2560 (50 GB / 20 MB)
- Useful on systems with more memory available
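
The chunk counts above are a ceiling division of blob size by chunk size (reading the sizes as binary units, which is what reproduces the figures in the examples):

```rust
/// Number of chunks needed for a blob: ceiling division.
fn num_chunks(blob_size: u64, chunk_size: u64) -> u64 {
    blob_size.div_ceil(chunk_size)
}

fn main() {
    let fifty_gib = 50u64 * 1024 * 1024 * 1024;
    assert_eq!(num_chunks(fifty_gib, 10 * 1024 * 1024), 5120); // default 10 MB chunks
    assert_eq!(num_chunks(fifty_gib, 20_971_520), 2560); // --chunk-size 20 MB
    println!("ok");
}
```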

#### 5. Why Manual Override Is Useful

- Memory-constrained environments: use chunks at or near the 10 MB minimum to reduce peak memory usage
- Performance tuning: larger chunks (e.g., 20-50 MB) may improve throughput when memory is abundant
- Testing: validate chunking behavior with smaller test files by forcing chunked mode

The smart defaults ensure that most users never need to think about chunking: it "just works" when
blobs exceed the single-chunk limit, while still giving advanced users control when needed.