yourself with our [contributing workflow](./CONTRIBUTING.md).

This project is licensed under the Apache License, Version 2.0 ([LICENSE](LICENSE) or
<https://www.apache.org/licenses/LICENSE-2.0>).

## RS2Chunked Blob Structure

For large blobs that exceed memory limits, Walrus uses a chunked encoding scheme (RS2Chunked) with a two-level Merkle tree structure:

```mermaid
graph TD
    subgraph "Blob Level"
        BlobID["Blob ID<br/>(Root of blob-level Merkle tree)"]
        BlobMerkle["Blob-level Merkle Tree"]
        BlobID --> BlobMerkle
    end

    subgraph "Sliver Pair Level (Blob)"
        BlobMerkle --> SPM0["Sliver Pair 0<br/>Metadata"]
        BlobMerkle --> SPM1["Sliver Pair 1<br/>Metadata"]
        BlobMerkle --> SPMDots["..."]
        BlobMerkle --> SPMN["Sliver Pair N<br/>Metadata"]
    end

    subgraph "Each Sliver Pair Metadata"
        SPM0 --> SPM0Root["Merkle Root over<br/>Chunk Hashes"]
    end

    subgraph "Chunk Level (for Sliver Pair 0)"
        SPM0Root --> C0H["Chunk 0 Hash"]
        SPM0Root --> C1H["Chunk 1 Hash"]
        SPM0Root --> CDots["..."]
        SPM0Root --> CMH["Chunk M Hash"]
    end

    subgraph "Chunk 0 Structure"
        C0H --> C0Primary["Primary Sliver<br/>(Merkle Root)"]
        C0H --> C0Secondary["Secondary Sliver<br/>(Merkle Root)"]
    end

    subgraph "Storage Indexing"
        Storage["Storage Node Indexing"]
        Storage --> Key1["(blob_id, chunk_0, sliver_pair_0)"]
        Storage --> Key2["(blob_id, chunk_0, sliver_pair_1)"]
        Storage --> Key3["(blob_id, chunk_1, sliver_pair_0)"]
        Storage --> KeyDots["..."]
    end

    style BlobID fill:#e1f5ff
    style BlobMerkle fill:#ffe1e1
    style SPM0Root fill:#fff4e1
    style Storage fill:#e1ffe1
```
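
In Rust terms, the metadata shape sketched in the diagram might look like the following. Every type and field name here is an illustrative stand-in, not the actual `walrus-core` API:

```rust
// Illustrative sketch of the two-level metadata layout shown in the
// diagram above; all names here are hypothetical, not the real types.

/// A 32-byte Merkle root or hash.
type Hash = [u8; 32];

/// Hash of one chunk within a sliver pair: one Merkle root per sliver.
struct ChunkHash {
    primary_sliver_root: Hash,   // Merkle root over the primary sliver
    secondary_sliver_root: Hash, // Merkle root over the secondary sliver
}

/// Metadata for one sliver pair: a Merkle root over its chunk hashes.
struct SliverPairMetadata {
    chunk_root: Hash,       // root of the Merkle tree over `chunks`
    chunks: Vec<ChunkHash>, // chunk 0 ..= chunk M
}

/// Blob-level view: the blob ID is the root of a Merkle tree whose
/// leaves are the per-sliver-pair metadata entries (pair 0 ..= pair N).
struct ChunkedBlobMetadata {
    blob_id: Hash,
    sliver_pairs: Vec<SliverPairMetadata>,
}

/// Storage nodes index each stored sliver by
/// (blob_id, chunk index, sliver-pair index).
type StorageKey = (Hash, u64, u16);
```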

### Smart Defaults - Automatic Chunk Size Selection

Automatic chunk size selection follows these rules:

#### 1. When Chunking Kicks In

Chunking is used automatically when
`blob_size > max_blob_size_for_n_shards(n_shards, encoding_type)`, where:

- `max_blob_size_for_n_shards = source_symbols_per_blob × max_symbol_size`
- `max_symbol_size` = 65,534 bytes (`u16::MAX - 1`) for RS2 encoding
- `source_symbols_per_blob = n_primary × n_secondary` (which depends on the shard count)

Example for 1000 shards:

- Primary source symbols: 334
- Secondary source symbols: 667
- Total source symbols: 334 × 667 = 222,778
- Max single-chunk size: 222,778 × 65,534 bytes ≈ 13.6 GiB

So for a typical network with 1000 shards, chunking automatically kicks in for blobs larger than
about 13.6 GiB, as the sketch below illustrates.
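
The following self-contained Rust sketch makes the arithmetic concrete. `max_blob_size_for_n_shards` is named in the prose above, but this implementation, including the committee-size derivation `f = (n - 1) / 3`, is an illustration rather than the actual library code:

```rust
// Minimal sketch of the single-chunk limit check. The constants follow
// the prose above; the function bodies are illustrative, not the real API.

const MAX_SYMBOL_SIZE: u64 = u16::MAX as u64 - 1; // 65,534 bytes for RS2

/// (primary, secondary) source-symbol counts for `n_shards` shards,
/// assuming the usual Byzantine bound 3f + 1 <= n_shards.
fn source_symbols(n_shards: u64) -> (u64, u64) {
    let f = (n_shards - 1) / 3; // 333 for 1000 shards
    (n_shards - 2 * f, n_shards - f) // (334, 667) for 1000 shards
}

fn max_blob_size_for_n_shards(n_shards: u64) -> u64 {
    let (primary, secondary) = source_symbols(n_shards);
    primary * secondary * MAX_SYMBOL_SIZE
}

fn main() {
    // 334 × 667 × 65,534 = 14,599,533,452 bytes ≈ 13.6 GiB
    let max = max_blob_size_for_n_shards(1000);
    println!("single-chunk limit: {max} bytes");

    let blob_size: u64 = 50 * 1024 * 1024 * 1024; // a 50 GiB blob
    println!("needs chunking: {}", blob_size > max); // true
}
```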

#### 2. Default Chunk Size

When chunking is needed, the system uses
`pub const DEFAULT_CHUNK_SIZE: u64 = 10 * 1024 * 1024; // 10 MB`.

This default was chosen based on several factors documented in the code:

- Memory efficiency: 10 MB chunks keep memory usage reasonable during encoding and decoding
- Metadata overhead: at 10 MB per chunk with 1000 shards, metadata adds only 0.64% overhead (64 KB of metadata per 10 MB chunk)
- Streaming performance: smaller chunks enable faster initial data delivery
- Storage granularity: a reasonable balance between network round trips and overhead
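
As a quick sanity check on the overhead figure, a hypothetical calculation; the ~64 KB-per-chunk metadata size is taken from the text above, read here in decimal units:

```rust
fn main() {
    // ~64 KB of sliver-pair metadata per chunk at 1000 shards,
    // amortized over a 10 MB chunk (decimal units).
    let metadata_per_chunk = 64_000.0_f64;
    let chunk_size = 10_000_000.0_f64;
    let overhead_pct = metadata_per_chunk / chunk_size * 100.0;
    println!("metadata overhead: {overhead_pct:.2}%"); // 0.64%
}
```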

#### 3. Constraints

The system enforces:

- Minimum chunk size: 10 MB (prevents excessive metadata overhead)
- Maximum chunks per blob: 1000 (bounds total metadata size to ~64 MB)
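
Taken together, these two constraints pin down the usable chunk sizes. Here is a minimal sketch of how a client might clamp a requested size; the constant and function names are illustrative, not the actual API:

```rust
// Illustrative clamp combining both constraints above.
const MIN_CHUNK_SIZE: u64 = 10 * 1024 * 1024; // 10 MB minimum
const MAX_CHUNKS_PER_BLOB: u64 = 1000; // caps metadata at ~64 MB

/// Smallest chunk size that satisfies both the 10 MB floor and the
/// 1000-chunks-per-blob cap for a blob of `blob_size` bytes.
fn effective_chunk_size(blob_size: u64, requested: u64) -> u64 {
    let floor_for_cap = blob_size.div_ceil(MAX_CHUNKS_PER_BLOB);
    requested.max(MIN_CHUNK_SIZE).max(floor_for_cap)
}

fn main() {
    let five_gib = 5u64 * 1024 * 1024 * 1024;
    // The 10 MB floor governs here: 5 GiB / 1000 chunks ≈ 5.4 MB < 10 MB.
    assert_eq!(
        effective_chunk_size(five_gib, 10 * 1024 * 1024),
        10 * 1024 * 1024
    );
}
```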

#### 4. Practical Examples

Small blob (below the ~13.6 GiB single-chunk limit with 1000 shards): `walrus store --epochs 5 small_file.bin` (a 1 GB file)

- Uses standard RS2 encoding (a single chunk); no chunking needed

Large blob (above ~13.6 GiB with 1000 shards): `walrus store --epochs 5 large_file.bin` (a 50 GB file)

- Automatically uses RS2Chunked encoding
- Chunk size: 10 MB (`DEFAULT_CHUNK_SIZE`)
- Number of chunks: 5120 (50 GB / 10 MB)

Manual override: `walrus store --epochs 5 --chunk-size 20971520 large_file.bin` (50 GB with 20 MB chunks)

- Forces RS2Chunked encoding
- Chunk size: 20 MB (user specified)
- Number of chunks: 2560 (50 GB / 20 MB)
- Useful on systems with more memory available
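
The chunk counts above are a ceiling division of blob size by chunk size (reading the sizes as binary units, which is what reproduces the figures in the examples):

```rust
/// Number of chunks needed for a blob: ceiling division.
fn num_chunks(blob_size: u64, chunk_size: u64) -> u64 {
    blob_size.div_ceil(chunk_size)
}

fn main() {
    let fifty_gib = 50u64 * 1024 * 1024 * 1024;
    assert_eq!(num_chunks(fifty_gib, 10 * 1024 * 1024), 5120); // default 10 MB chunks
    assert_eq!(num_chunks(fifty_gib, 20_971_520), 2560); // --chunk-size 20 MB
    println!("ok");
}
```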

#### 5. Why Manual Override Is Useful

- Memory-constrained environments: use chunks at or near the 10 MB minimum to reduce peak memory usage
- Performance tuning: larger chunks (e.g., 20-50 MB) may improve throughput when memory is abundant
- Testing: validate chunking behavior with smaller test files by forcing chunked mode

The smart defaults ensure that most users never need to think about chunking: it "just works" when
blobs exceed the single-chunk limit, while still giving advanced users control when needed.