Chunking Algorithms #336
Introduce ChunkingFunction, an enum of known chunking algorithms that the server can recommend to the client. Provide FastCDC_2020 as the first explicit chunking algorithm. The server advertises these through a new chunking_configuration field in the CacheCapabilities message. There, the server may set the chunking functions that it supports as well as the relevant configuration parameters for each chunking algorithm.
// The chunking function that the client prefers to use.
//
// The server MAY use a different chunking function. The client MUST check
// the chunking function used in the response.
ChunkingFunction.Value chunking_function = 4;
- Is this field intended to be mandatory or optional (with the latter giving the server complete leeway in choosing a function)? If optional, can we document it?
- When the field is present, should it be required to match one of the functions declared in the server capabilities?
Ah, it defaults to UNKNOWN if unset, because this is proto3. (But perhaps it's still worth spelling it out - up to you.)
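To make the unset/UNKNOWN behavior concrete, here is a rough Python sketch of how a client might treat the field. Everything in it is an assumption for illustration: `SplitRequest`, `pick_preference`, and `usable` are hypothetical names, and the numeric value of FASTCDC_2020 is made up. Only UNKNOWN being the zero/default value follows from the proto3 behavior mentioned above.

```python
from dataclasses import dataclass
from enum import Enum


class ChunkingFunction(Enum):
    # UNKNOWN = 0 mirrors the proto3 default for an unset enum field.
    UNKNOWN = 0
    # Numeric value assumed for illustration only.
    FASTCDC_2020 = 1


@dataclass
class SplitRequest:
    """Hypothetical stand-in for the request message in the diff."""
    chunking_function: ChunkingFunction = ChunkingFunction.UNKNOWN


def pick_preference(advertised, understood):
    """Return the first advertised function the client also understands."""
    for fn in advertised:
        if fn is not ChunkingFunction.UNKNOWN and fn in understood:
            return fn
    # No overlap: leave the field unset so the server chooses.
    return ChunkingFunction.UNKNOWN


def usable(response_fn, understood):
    """The server MAY use a different function, so check what came back."""
    return response_fn in understood
```

With this shape, leaving the field unset and "no preference" are the same thing, which is probably what the comment on the field should say explicitly.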
// The chunking function that the client used to split the blob.
ChunkingFunction.Value chunking_function = 5;
Why is this necessary? Isn't the result of a splice completely independent from the function originally used to do the splitting?
It also imposes a requirement that the chunks must necessarily have originated from a split operation, which seems counter to the spirit of the original proposal (that split and splice are independent optimizations).
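A small Python sketch of the point being made here: the spliced blob, and therefore its digest, depends only on the chunk contents and their order, not on how the boundaries were chosen. The `split_fixed` and `splice` helpers are hypothetical and use fixed-size splitting purely for illustration. SHA-256 is likewise just an example digest function.

```python
import hashlib


def split_fixed(blob: bytes, size: int) -> list:
    """Split a blob into fixed-size chunks (the last one may be shorter)."""
    return [blob[i:i + size] for i in range(0, len(blob), size)]


def splice(chunks) -> bytes:
    """Concatenate chunks in order to rebuild the blob."""
    return b"".join(chunks)


def digest(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()


blob = bytes(range(256)) * 40

# Two unrelated ways of cutting the same blob.
chunks_a = split_fixed(blob, 7)
chunks_b = split_fixed(blob, 1000)

# Both splice back to the same bytes, so the digests match.
same = digest(splice(chunks_a)) == digest(splice(chunks_b)) == digest(blob)
```

Here `same` is true even though `chunks_a` and `chunks_b` share no boundaries beyond the start and end of the blob.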
Based on #282
Introduce ChunkingFunction, an enum of known chunking algorithms
that the server can recommend to the client.
Provide FastCDC_2020 as the first explicit chunking algorithm.
The server advertises these through a new chunking_configuration field in
the CacheCapabilities message. There, the server may set the chunking
functions that it supports as well as the relevant configuration
parameters for each chunking algorithm.
I recommend reading https://joshleeb.com/posts/fastcdc.html to understand more about the available FastCDC configuration parameters.
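For readers who want a feel for what such parameters control, below is a simplified content-defined chunking sketch in Python. It is not the FastCDC_2020 algorithm from this PR and will not produce the same boundaries. It only borrows the general ideas of a gear rolling hash, skipping the first `min_size` bytes, and using a stricter mask before the average size and a looser one after. The parameter names (`min_size`, `avg_size`, `max_size`), the gear table seed, and the mask widths are all assumptions chosen for illustration.

```python
import random

_rng = random.Random(0)  # fixed seed so boundaries are reproducible
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def cdc_chunks(data: bytes, min_size=2048, avg_size=8192, max_size=65536):
    """Split data at content-defined boundaries (simplified sketch).

    avg_size is assumed to be a power of two, at least 8.
    """
    bits = avg_size.bit_length() - 1
    mask_strict = (1 << (bits + 2)) - 1  # harder to match before avg_size
    mask_loose = (1 << (bits - 2)) - 1   # easier to match after avg_size
    chunks = []
    start = 0
    n = len(data)
    while start < n:
        end = min(start + max_size, n)  # never exceed max_size
        cut = end
        fp = 0
        # Skip the first min_size bytes so no chunk is cut too early.
        for i in range(start + min_size, end):
            fp = ((fp << 1) + GEAR[data[i]]) & MASK64
            mask = mask_strict if i - start < avg_size else mask_loose
            if fp & mask == 0:
                cut = i + 1
                break
        chunks.append(data[start:cut])
        start = cut
    return chunks
```

Every chunk except possibly the last is between `min_size` and `max_size` bytes long, and the same input always yields the same boundaries, which is the property a client and server would need to agree on when sharing a configuration.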