First of all, I think your current design, focusing on Blake3 and its tree hashing for verified streaming of large objects, is very good. On technical grounds, this is unquestionably the way forward.
However, I also think that there is a lot of existing git-content-addressed data out there, and that content-addressing works best and most simply when the same content-addressing format works end-to-end. I am pretty convinced that the best way for IPFS-family stuff to get adoption is to work with this data and its current addressing scheme.
Concretely, we can also think of any "linear" (Merkle-Damgård) hash function as a tree hash, just one that uses really shitty, maximally unbalanced binary trees: each intermediate chaining state is a node whose two children are the previous chaining state and the next message block. So the same techniques that let us view Blake3's intermediate steps as a Merkle DAG work for SHA-1's too.
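To make the unbalanced-tree view concrete, here is a minimal Python sketch of a Merkle-Damgård-style chain and of verifying it one block at a time. It rests on several assumptions: `compress` is a hypothetical toy compression function built from SHA-256, not SHA-1's real one; the 64-byte block size, all-zero IV, and missing length padding are simplifications; and `chain_states` and `verify_backwards` are names made up for illustration. Python's `hashlib` does not expose SHA-1's internal chaining state, so a real implementation would need its own SHA-1 core.

```python
import hashlib

BLOCK = 64      # toy block size in bytes (SHA-1 also processes 64-byte blocks)
IV = bytes(32)  # hypothetical all-zero initial chaining value


def compress(state: bytes, block: bytes) -> bytes:
    """Toy stand-in for a compression function; NOT SHA-1's real one."""
    return hashlib.sha256(state + block).digest()


def chain_states(data: bytes) -> list[bytes]:
    """Return every chaining value: states[0] is the IV, states[-1] the final hash.

    states[i] is the state *before* block i, so each states[i + 1] is a
    "tree node" whose children are states[i] and block i.
    """
    states = [IV]
    for i in range(0, len(data), BLOCK):
        states.append(compress(states[-1], data[i:i + BLOCK]))
    return states


def verify_backwards(root: bytes, pieces: list[tuple[bytes, bytes]]) -> bytes:
    """Verify (prev_state, block) pieces, received last block first, against root.

    Each piece is checked against an already-trusted chaining value, so the
    receiver never holds more than one unverified block at a time.
    """
    trusted = root
    blocks = []
    for prev_state, block in pieces:
        if compress(prev_state, block) != trusted:
            raise ValueError("block does not match trusted chaining value")
        blocks.append(block)
        trusted = prev_state  # the previous state is now authenticated
    if trusted != IV:
        raise ValueError("chain did not end at the IV")
    return b"".join(reversed(blocks))  # restore original block order


if __name__ == "__main__":
    data = b"x" * 200
    states = chain_states(data)
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    # The sender ships pieces last-first, each carrying its parent-side state.
    pieces = [(states[i], blocks[i]) for i in reversed(range(len(blocks)))]
    assert verify_backwards(states[-1], pieces) == data
```

Because only the final hash is trusted up front, verification in this sketch runs from the last block toward the first, with each piece carrying the previous chaining value as its "child pointer". That is the degenerate-tree analogue of Blake3 checking a subtree against its parent, at the cost of one hash-sized chaining value of overhead per block.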
(For some background, I have worked on https://www.softwareheritage.org/2022/02/10/building-bridge-to-the-software-heritage-archive/ and https://github.com/ipfs/devgrants/blob/master/open-grants/open-proposal-nix-ipfs.md. The former is completely done; the latter was also completed, but only more recently has it been getting upstreamed, see https://github.com/NixOS/rfcs/blob/master/rfcs/0133-git-hashing.md and NixOS/nix#8919. The stumbling blocks have always been (1) pipeline latency with vanilla Bitswap and (2) the MTU imposing a maximum object size. (2) is the more fundamental issue. I have brought up what I am proposing here before in protocol/beyond-bitswap#30, but I lack the ability to make it happen on my own. I have also co-mentored the GSoC project for https://github.com/theupdateframework/tap19-ipfs-poc.)
I get that what I am asking for might sound like "hi, I see you support IPv6, can you also please support IPv4?", but I maintain it is not that bad, because SHA-1 is already broken and cannot zombie onward in perpetuity the way IPv4 can. Likewise, I am deliberately not asking for SHA-256, precisely because SHA-256, being much healthier than SHA-1, does have that "zombie onward" potential.
If you are willing to do this, as a token of my gratitude I would gladly do what I can to help convince Nix, Software Heritage, The Update Framework, and even Git to support Blake3 hashing for content-addressing source code. Again, I totally believe that proper balanced tree hashing is the right way forward on technical grounds. I just think people need to see how nice end-to-end content-addressing is before they will be willing to work through all the technical debt standing between us and that goal.