@sluongng (Collaborator) commented Jul 8, 2025

Based on #282

Introduce ChunkingFunction, an enum of known chunking algorithms that
the server can recommend to the client.

Provide FastCDC_2020 as the first explicit chunking algorithm.

The server advertises these through a new chunking_configuration field in
the CacheCapabilities message. There, the server may set the chunking
functions that it supports, as well as the relevant configuration
parameters for each chunking algorithm.
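
As a rough illustration of how a client might consume this capability, the sketch below picks a chunking function from what the server advertises. It is not the API in this PR: the Python types, the `pick_chunking_function` helper, the numeric enum values, and the `avg_chunk_size_bytes` parameter name are hypothetical stand-ins. Only `ChunkingFunction`, `FastCDC_2020`, and `chunking_configuration` come from the change itself.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List, Set


class ChunkingFunction(IntEnum):
    """Hypothetical mirror of the proposed ChunkingFunction enum."""
    UNKNOWN = 0        # proto3 zero value: no function specified
    FASTCDC_2020 = 1   # numeric value assumed for illustration


@dataclass
class ChunkingConfiguration:
    """Hypothetical stand-in for CacheCapabilities.chunking_configuration."""
    supported_functions: List[ChunkingFunction] = field(default_factory=list)
    # Per-function parameters; the key names are illustrative only.
    params: Dict[ChunkingFunction, Dict[str, int]] = field(default_factory=dict)


def pick_chunking_function(
    server: ChunkingConfiguration,
    client_supported: Set[ChunkingFunction],
) -> ChunkingFunction:
    """Return the first server-advertised function the client implements."""
    for fn in server.supported_functions:
        if fn != ChunkingFunction.UNKNOWN and fn in client_supported:
            return fn
    # No overlap: the client falls back to "no preference".
    return ChunkingFunction.UNKNOWN


server_cfg = ChunkingConfiguration(
    supported_functions=[ChunkingFunction.FASTCDC_2020],
    params={ChunkingFunction.FASTCDC_2020: {"avg_chunk_size_bytes": 512 * 1024}},
)
chosen = pick_chunking_function(server_cfg, {ChunkingFunction.FASTCDC_2020})
```

When the server advertises nothing the client implements, the helper returns `UNKNOWN`, which leaves the choice of function to the server.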


I recommend reading https://joshleeb.com/posts/fastcdc.html to understand more about the available FastCDC configuration parameters.

@sluongng sluongng force-pushed the sluongng/chunking-algo branch from 456e902 to b25c8e4 Compare July 8, 2025 14:33
@sluongng sluongng force-pushed the sluongng/chunking-algo branch from b25c8e4 to 27b0d6c Compare July 15, 2025 13:38
@sluongng (Collaborator, Author) commented:

@mostynb I think most of your comments are meant for #282, which is what this PR is based on. Since #282 is merged, I have rebased this PR on top of the latest changes.

I would recommend creating a separate PR with the suggestions above.

@sluongng sluongng force-pushed the sluongng/chunking-algo branch from 27b0d6c to 39bbe03 Compare July 15, 2025 13:55
Comment on lines +1980 to +1984
// The chunking function that the client prefers to use.
//
// The server MAY use a different chunking function. The client MUST check
// the chunking function used in the response.
ChunkingFunction.Value chunking_function = 4;
Collaborator

  1. Is this field intended to be mandatory or optional (with the latter giving the server complete leeway in choosing a function)? If optional, can we document it?
  2. When the field is present, should it be required to match one of the functions declared in the server capabilities?

Collaborator

Ah, it defaults to UNKNOWN if unset, because this is proto3. (But perhaps it's still worth spelling it out - up to you.)
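
A small sketch of the semantics under discussion, assuming the server treats the zero value as "no preference" and otherwise honors a preference only when it supports that function. The `SplitRequest` dataclass, the `resolve_function` helper, and the numeric enum values are hypothetical; the real messages are the protobuf definitions in this PR.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Set


class ChunkingFunction(IntEnum):
    UNKNOWN = 0        # what an unset proto3 enum field reads as
    FASTCDC_2020 = 1   # numeric value assumed for illustration


@dataclass
class SplitRequest:
    """Hypothetical request; the default mimics an unset proto3 enum field."""
    chunking_function: ChunkingFunction = ChunkingFunction.UNKNOWN


def resolve_function(
    request: SplitRequest,
    server_supported: Set[ChunkingFunction],
    server_default: ChunkingFunction,
) -> ChunkingFunction:
    """Honor the client's preference if supported, else use the server default."""
    preferred = request.chunking_function
    if preferred != ChunkingFunction.UNKNOWN and preferred in server_supported:
        return preferred
    return server_default


supported = {ChunkingFunction.FASTCDC_2020}
# Field left unset: the server is free to choose.
used_when_unset = resolve_function(SplitRequest(), supported, ChunkingFunction.FASTCDC_2020)
# Preference the server does not support: the server still chooses.
used_when_unsupported = resolve_function(
    SplitRequest(ChunkingFunction.FASTCDC_2020), set(), ChunkingFunction.UNKNOWN
)
```

In both cases the client cannot assume its preference was used, which is why the proto comment requires it to check the function reported in the response.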

Comment on lines +2036 to +2037
// The chunking function that the client used to split the blob.
ChunkingFunction.Value chunking_function = 5;
Collaborator

Why is this necessary? Isn't the result of a splice completely independent from the function originally used to do the splitting?

It also imposes a requirement that the chunks must have necessarily originated from a split operation, which seems counter to the spirit of the original proposal (that split and splice are independent optimizations).
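
To make the independence argument concrete, here is a minimal sketch that assumes splice amounts to concatenating the chunks and verifying the result against the expected blob digest. The helper names are hypothetical and SHA-256 is chosen only for illustration. Two unrelated ways of cutting the same blob splice back to identical bytes:

```python
import hashlib
from typing import Iterable, List


def split_fixed(blob: bytes, size: int) -> List[bytes]:
    """Cut the blob into fixed-size chunks (the last one may be shorter)."""
    return [blob[i:i + size] for i in range(0, len(blob), size)]


def split_at(blob: bytes, boundaries: Iterable[int]) -> List[bytes]:
    """Cut at arbitrary offsets, standing in for any other chunker."""
    cuts = [0, *sorted(boundaries), len(blob)]
    return [blob[a:b] for a, b in zip(cuts, cuts[1:])]


def splice(chunks: Iterable[bytes], expected_sha256: str) -> bytes:
    """Concatenate chunks and verify the result against the expected digest."""
    blob = b"".join(chunks)
    if hashlib.sha256(blob).hexdigest() != expected_sha256:
        raise ValueError("spliced blob does not match expected digest")
    return blob


blob = bytes(range(256)) * 16  # 4096 bytes of sample data
digest = hashlib.sha256(blob).hexdigest()

from_fixed = splice(split_fixed(blob, 1000), digest)
from_arbitrary = splice(split_at(blob, [7, 300, 2049]), digest)
```

Neither call needs to know which function produced the chunks; only the chunk order and the expected digest matter.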

3 participants