
@BenWestgate

@BenWestgate BenWestgate commented Aug 19, 2025

Closes #67
WIP

Deterministically generates codex32 share sets from entropy using the BIP85-DRNG instead of physical dice rolls.

I still need to add support to the cli, add BIP93 test vectors and keep the BIP85 pull request concordant with this implementation.

The BIP85 pull request diff (also draft) can be viewed here:
bitcoin/bips@master...BenWestgate:bips:codex32

@BenWestgate BenWestgate marked this pull request as draft August 19, 2025 19:59
Unsure where to find these; haven't looked yet. May have to make them from the BIP93 text.
Checks that the derived entropy, codex32 strings, hrp, threshold, identifier, byte length, and share count match the vector, and that the set is valid.
@BenWestgate BenWestgate closed this Sep 2, 2025
@akarve
Owner

akarve commented Sep 2, 2025

@BenWestgate howdy i didn't see this until now. contributions are welcome. mentioning this because it's now closed and i was not sure if you closed because it wasn't reviewed? in any case, that was not a silent no.

@BenWestgate
Author

BenWestgate commented Sep 2, 2025

@akarve I closed this because I ended up forking ethankosakovsky/bip85@master...BenWestgate:bip85:master to create a reference implementation of #67, so merging this is no longer necessary. If it appears likely that bip85 application 93' (codex32) will merge to bips/master, reopening this should save you a few minutes updating bipsea.

@akarve
Owner

akarve commented Sep 2, 2025

@BenWestgate copy. idk if ethan will ever review your PR tho as we were unable to get a hold of him for recent BIP-85 changes. in any case i'm open to a new app (still grokking details though). did you look at and/or consider the dice application? it can generate passwords of any length over any character set (e.g. bech32)?

@BenWestgate
Author

BenWestgate commented Sep 3, 2025

@akarve I don't expect ethan to ever review ethankosakovsky/bip85@master...BenWestgate:bip85:master, in the BIP-0085 text PR I linked to benwestgate/bip85 as the v1.4.0 reference.

Yes, I did consider using the dice application. However, that could have led to some differences between implementations. Also, we need derivation path indexing to make sure the DRNG is seeded with unique entropy whenever the identifier, threshold, n, bitlength or hrp parameters change. Using the dice application would mean having to store which index' values were already used, rather than feeding the entire header and payload length into the derivation path. Unintentionally generating two identical child backups would be a huge security flaw.
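To make the point about path indexing concrete, here is a minimal, hardened-only sketch of the BIP32/BIP85 pipeline in plain Python: derive a child key along m/83696968'/..., turn it into 64 bytes of entropy with HMAC-SHA512 keyed by "bip-entropy-from-k", and read from a SHAKE256 DRNG. It starts from a raw seed rather than an xprv for brevity, and the path segments after 93' are placeholders, not the proposed codex32 layout. It only illustrates that changing any path segment reseeds the DRNG; it is not bipsea's implementation.

```python
import hashlib
import hmac

# secp256k1 group order, used by BIP32 private child derivation
SECP256K1_N = int(
    "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141", 16
)
HARDENED = 0x80000000
BIP85_PURPOSE = 83696968


def master_key(seed: bytes) -> tuple[bytes, bytes]:
    """BIP32 master private key and chain code from a seed."""
    i = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    return i[:32], i[32:]


def ckd_hardened(key: bytes, chain: bytes, index: int) -> tuple[bytes, bytes]:
    """BIP32 hardened private child derivation (no elliptic-curve math needed).

    The negligible invalid-key cases (IL >= n or child == 0) are not handled.
    """
    data = b"\x00" + key + (index | HARDENED).to_bytes(4, "big")
    i = hmac.new(chain, data, hashlib.sha512).digest()
    child = (int.from_bytes(i[:32], "big") + int.from_bytes(key, "big")) % SECP256K1_N
    return child.to_bytes(32, "big"), i[32:]


def bip85_entropy(seed: bytes, path: list[int]) -> bytes:
    """Derive k at m/83696968'/<path>' and return HMAC-SHA512("bip-entropy-from-k", k)."""
    key, chain = master_key(seed)
    for index in [BIP85_PURPOSE, *path]:
        key, chain = ckd_hardened(key, chain, index)
    return hmac.new(b"bip-entropy-from-k", key, hashlib.sha512).digest()


class DRNG:
    """BIP85-DRNG: SHAKE256 seeded with the 64-byte entropy, read sequentially."""

    def __init__(self, entropy: bytes) -> None:
        self._shake = hashlib.shake_256(entropy)
        self._pos = 0

    def read(self, n: int) -> bytes:
        out = self._shake.digest(self._pos + n)[self._pos:]
        self._pos += n
        return out


# Placeholder path segments after 93' (not the proposed codex32 layout).
seed = bytes(range(16))
a = DRNG(bip85_entropy(seed, [93, 0, 16, 0])).read(16)
b = DRNG(bip85_entropy(seed, [93, 0, 32, 0])).read(16)
assert a != b  # changing any path segment reseeds the DRNG
```

Storing "used" dice indexes elsewhere would be the only protection against collisions in the dice-app approach; here the parameters themselves select the stream.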

Also, a codex32 secret is analogous to a bip39 mnemonic and has the same objective, so if both are BIPs, it seems they both belong in BIP85.

@BenWestgate
Author

BenWestgate commented Sep 10, 2025

@akarve Thank you for offering to help! I just saw your ML feedback.

It's unavoidable that codex32 needs more parameters than any other BIP85 application. So what would help me most is if you adapt the cli to accept more parameters and write the tests for my codex32 library. Then I'll finish bipsea's bip93 application implementation, making it as easy to read as possible with a simplified derivation path, add the application's tests and test vectors, and then update the BIP-0085.mediawiki document.

Regarding path simplification:

We can eliminate t, n, byte_length and the 4 identifier indices and have implementations sum them into one value.

I'm partial to the derivation path {hrp}'/{t*5 + n + byte_length}'/{index}', with the identifier computed from the index: index=0 gives qqqq, index=1 gives qqqp, etc. This prevents reusing the identifier for different sets of shares.

We can drop the complication of allowing individual identifier characters to be defaulted; once the index reaches 1048576 (32^4), the whole identifier can default to the fingerprint.
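A sketch of the index-to-identifier encoding described above, assuming big-endian base-32 over the bech32 alphabet (which yields qqqq for index 0 and qqqp for index 1). The fingerprint fallback is passed in as a precomputed string; `identifier_from_index` and `fingerprint_ident` are hypothetical names.

```python
BECH32_CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
MAX_INDEX_IDENTS = 32 ** 4  # 1048576 distinct four-character identifiers


def identifier_from_index(index: int, fingerprint_ident: str) -> str:
    """Encode a bip85 index as a 4-character bech32 identifier (20 bits).

    `fingerprint_ident` is a hypothetical, precomputed fingerprint-based
    identifier used once the index no longer fits in 4 characters.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    if index >= MAX_INDEX_IDENTS:
        return fingerprint_ident
    chars = []
    for _ in range(4):
        chars.append(BECH32_CHARSET[index & 31])  # lowest 5 bits
        index >>= 5
    return "".join(reversed(chars))  # most significant character first


print(identifier_from_index(0, "xxxx"))  # qqqq
print(identifier_from_index(1, "xxxx"))  # qqqp
```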

@akarve
Owner

akarve commented Sep 13, 2025

@BenWestgate

So what would help me most is you adapt the cli to accept more parameters

easiest thing to do is probably take raw paths (arbitrary length and structure) with a new escape hatch cli switch. it'll be a while before i can get to this; lots to do at day job :/

Regarding path simplification:

no need to retrofit the spec to the CLI. i just want to make sure that we have exactly as many segments as needed and no more? if it's already at a minimum, yolo.

i had some comments on byte draws and unpacking as well, wdyt?

Drawing and truncating a single byte per character is certainly clear to understand, but I keep wondering if implementers shouldn't draw ((chars * 5) // 8) bytes in one shot? These bytes aren't expensive or anything, but it's fewer iterations, fewer total reads, etc. If my suggestion becomes hard to read or write, then feel free to keep as is.
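The two drawing strategies can be compared side by side. This is a sketch, not either implementation: `chars_per_byte` assumes the per-character truncation keeps each byte's top 5 bits, and `chars_one_shot` uses a ceiling, since `(chars * 5) // 8` bytes is short whenever `chars * 5` isn't a multiple of 8 (26 characters need 130 bits, but 16 bytes give only 128).

```python
import hashlib

BECH32_CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def chars_per_byte(stream: bytes, chars: int) -> str:
    """One byte per character, keeping its top 5 bits (assumed truncation)."""
    if len(stream) < chars:
        raise ValueError("not enough bytes")
    return "".join(BECH32_CHARSET[b >> 3] for b in stream[:chars])


def chars_one_shot(stream: bytes, chars: int) -> str:
    """Draw ceil(chars * 5 / 8) bytes at once and unpack them into 5-bit groups."""
    n_bytes = (chars * 5 + 7) // 8  # ceiling; (chars * 5) // 8 can fall short
    if len(stream) < n_bytes:
        raise ValueError("not enough bytes")
    value = int.from_bytes(stream[:n_bytes], "big")
    value >>= n_bytes * 8 - chars * 5  # discard unused trailing bits
    out = []
    for _ in range(chars):
        out.append(BECH32_CHARSET[value & 31])
        value >>= 5
    return "".join(reversed(out))


stream = hashlib.shake_256(b"example seed").digest(64)
print(len(chars_per_byte(stream, 26)))  # 26 chars from 26 bytes
print(len(chars_one_shot(stream, 26)))  # 26 chars from 17 bytes
```

The two functions agree on the first character but generally diverge after it, so whichever one the spec picks, every implementation has to use the same one.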

@BenWestgate
Author

BenWestgate commented Sep 14, 2025

just want to make sure that we have exactly as many segments as needed and no more? if it's already at a minimum, yolo.

The only segment that could be dropped is the identifier:

  1. It SHOULD be unique across different seeds, like the bip85 {index}, so we could encode the bip85 index as 4 bech32 characters and use it as the identifier, then default to a fingerprint as the identifier once the index reaches 32^4.
  2. It doesn't require 4 path segments to encode 20 bits; a single segment can do it.

My other consolidation was to combine threshold, n and byte_length into a single path segment, by concatenating them as decimals. I don't see the downside of this versus separate path depths for each.
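A minimal sketch of the decimal concatenation, assuming fixed-width fields (1 digit for t, 2 for n, 2 for byte_length) and the ranges used in the draft (t of 0 or 2-9, n of 1-31, 16-64 bytes). `pack_params` and `unpack_params` are hypothetical helper names.

```python
HARDENED_LIMIT = 2 ** 31


def pack_params(threshold: int, num_shares: int, byte_length: int) -> int:
    """Concatenate t (1 digit), n (2 digits) and byte_length (2 digits)."""
    if threshold == 1 or not 0 <= threshold <= 9:
        raise ValueError("threshold must be 0 or 2-9")
    if not 1 <= num_shares <= 31:
        raise ValueError("num_shares must be 1-31")
    if not 16 <= byte_length <= 64:
        raise ValueError("byte_length must be 16-64")
    segment = int(f"{threshold}{num_shares:02d}{byte_length:02d}")
    assert segment < HARDENED_LIMIT  # at most 5 decimal digits
    return segment


def unpack_params(segment: int) -> tuple[int, int, int]:
    """Inverse of pack_params; zero-padding restores threshold 0."""
    digits = f"{segment:05d}"
    return int(digits[0]), int(digits[1:3]), int(digits[3:5])


print(pack_params(3, 5, 16))  # 30516
print(unpack_params(pack_params(0, 1, 16)))  # (0, 1, 16)
```

Zero-padding is what keeps the mapping one-to-one: every triple gets its own segment of at most 5 digits, well under the 2^31 hardened limit. A plain sum such as t*5 + n + byte_length does not have that property; for example (2, 4, 16) and (2, 3, 17) both give 30.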

Give me your thoughts on encoding the bip85 index into the codex32 identifier.

It's 4 characters intended to disambiguate different backups. You want them to all have unique identifiers, so an identifier in the derivation path seems to enable risky behavior (different backups with the same identifier) when the bip85 index is incremented.

My original idea dropped the index for this reason but that breaks bip85 conventions.

i had some comments on byte draws and unpacking as well, wdyt?

I prefer your suggestion to draw threshold * byte_length bytes in one shot, because then our initial t byte strings can be padded to a multiple of 5 bits with a CRC for more error detection.
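A sketch of the one-shot draw, with SHAKE256 standing in for the BIP85 DRNG: one read of threshold * byte_length bytes is split into per-share payloads, and `pad_bits` counts the spare bits left in each payload's last 5-bit character. Which CRC would fill those bits isn't specified here, and `pad_bits` and `split_payloads` are hypothetical names.

```python
import hashlib


def pad_bits(byte_length: int) -> int:
    """Padding bits needed to fill the last 5-bit character of one payload."""
    return -(byte_length * 8) % 5


def split_payloads(stream: bytes, threshold: int, byte_length: int) -> list[bytes]:
    """Split a single draw of k * byte_length bytes into k share payloads."""
    k = max(threshold, 1)  # threshold 0: one unshared secret payload
    if len(stream) < k * byte_length:
        raise ValueError("not enough bytes")
    return [stream[i * byte_length:(i + 1) * byte_length] for i in range(k)]


# Stand-in DRNG output: one read of 3 * 16 bytes.
drng_seed = b"illustrative entropy"
payloads = split_payloads(hashlib.shake_256(drng_seed).digest(3 * 16), 3, 16)
# Same bytes as three sequential 16-byte reads, because SHAKE256 is a stream.
assert payloads[2] == hashlib.shake_256(drng_seed).digest(48)[32:48]
print(pad_bits(16), pad_bits(32), pad_bits(64))  # 2 4 3
```

Because the DRNG is a stream, one big read and t sequential reads of byte_length bytes yield identical payloads, so the choice mostly affects readability and where the padding logic lives.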

@akarve
Owner

akarve commented Oct 6, 2025

@BenWestgate howdy i didn't forget about this and hoping to get to it later this quarter

@BenWestgate
Author

@akarve I haven't forgotten about it either. I will adapt your feedback into a bipsea pull request this month.

@BenWestgate
Author

I got a start on it and added a codex32 application:

ben@zenbook15:~/Documents/GitHub/bipsea$ poetry run bipsea codex32 --help
Usage: bipsea codex32 [OPTIONS]

  Generate a BIP-93 codex32 backup from `secrets.randbits`.

Options:
  -h, --hrp TEXT                  Codex32 human-readable prefix.
  -l, --length INTEGER RANGE      Number of secret bytes.  [16<=x<=64]
  -t, --threshold INTEGER RANGE   Number of shares required to reconstruct the
                                  secret.  [0<=x<=9]
  -n, --num-shares INTEGER RANGE  Total number of shares to generate.
                                  [1<=x<=31]
  -i, --identifier TEXT           Optional identifier to include in each
                                  share.
  -p, --indices TEXT              String of unique characters to use as share
                                  indices.
  --pretty / --not-pretty         Print a number before, and a newline after,
                                  each codex32 share.
  --help                          Show this message and exit.

While this draws from /dev/urandom rather than bip85 entropy, it already applies the byte_length bytes directly rather than drawing characters. Most of it will be reused when I do the bip85 app.

@akarve
Owner

akarve commented Oct 6, 2025

one thing to consider especially if it makes your life easier: what about just implementing:

bipsea apply -a arbitrary -p "what/ever/derivation/path/the/user/wants"

i mention this because 1-2 other new applications are in PR now and i plan to implement the same (but you are welcome to tackle it) because it basically makes all future apps just work. and then we don't need to even touch the reference app for most cases.

i have not thought carefully about other params and switches per application so there might be more to plan here. we could possibly soft warn if it's not a recognized application code.

@akarve
Owner

akarve commented Oct 6, 2025

there might be more to plan here

maybe we need a python protocol or something and then new implementations just have their own path handlers under arbitrary? and they could even register proper apps if they want to. just thinking out loud, do whatever is best for codex32.
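One possible shape for that protocol, purely as a sketch: `Bip85App`, `register_app`, `APP_REGISTRY` and `run_app` are hypothetical names, not bipsea's API. Handlers receive the derived entropy plus the remaining path indexes, and the BIP85 HEX application (128169') is registered as an example.

```python
from typing import Callable, Protocol


class Bip85App(Protocol):
    """Hypothetical app interface: derived entropy + remaining path -> output."""

    def __call__(self, entropy: bytes, indexes: list[str]) -> str: ...


APP_REGISTRY: dict[int, Bip85App] = {}


def register_app(app_number: int) -> Callable[[Bip85App], Bip85App]:
    """Decorator registering a handler for a BIP85 application number."""

    def decorator(handler: Bip85App) -> Bip85App:
        if app_number in APP_REGISTRY:
            raise ValueError(f"application {app_number}' is already registered")
        APP_REGISTRY[app_number] = handler
        return handler

    return decorator


def run_app(app_number: int, entropy: bytes, indexes: list[str]) -> str:
    handler = APP_REGISTRY.get(app_number)
    if handler is None:
        # a CLI could soft-warn here and fall back to raw output instead
        raise KeyError(f"unrecognized application {app_number}'")
    return handler(entropy, indexes)


@register_app(128169)
def hex_app(entropy: bytes, indexes: list[str]) -> str:
    """BIP85 HEX application: the first num_bytes of entropy, hex-encoded."""
    num_bytes = int(indexes[0].rstrip("'"))
    return entropy[:num_bytes].hex()


print(run_app(128169, bytes(range(64)), ["16'", "0'"]))  # 000102030405060708090a0b0c0d0e0f
```

A package such as codex32 could then register its own handler without the dispatcher changing.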

@BenWestgate
Author

BenWestgate commented Oct 7, 2025

one thing to consider especially if it makes your life easier: what about just implementing:

bipsea apply -a arbitrary -p "what/ever/derivation/path/the/user/wants"
...and then we don't need to even touch the reference app for most cases.

That sounds nice for bipsea maintenance, but the point of applications is that they standardize how the derived entropy is encoded.

The derivation by itself is not useful; any BIP32 library will provide that.

@BenWestgate
Author

BenWestgate commented Oct 7, 2025

do whatever is best for codex32.

I published a codex32 project on PyPI so Bipsea can import it for the proposed bip85 application 93'.

recover recovers a codex32 secret from a list of strings, hopefully. It should also support space and comma separation. Optional --target derives a share other than the secret. It should probably warn somewhere to make sure the target index is unused in this backup.
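A sketch of the input handling described for recover: split on any mix of spaces, commas and newlines, then read each string's share index (the sixth data character after the last '1', following the threshold and the 4-character identifier) to warn when --target is already in use. The helper names are hypothetical, and the example strings only follow the codex32 layout; they do not have valid checksums.

```python
import re
import warnings


def parse_share_list(text: str) -> list[str]:
    """Split user input on any mix of whitespace and commas."""
    return [s for s in re.split(r"[\s,]+", text.strip()) if s]


def share_index(codex32_str: str) -> str:
    """Share-index character: the 6th data character after the last '1'."""
    s = codex32_str.lower()
    data = s[s.rfind("1") + 1:]
    return data[5]  # data = threshold (1) + identifier (4) + share index (1) + ...


def check_target(shares: list[str], target: str) -> None:
    """Warn if --target names a share index already present in the backup."""
    used = {share_index(s) for s in shares}
    if target.lower() in used:
        warnings.warn(f"share index {target!r} is already used in this backup")


# Illustrative strings (not valid checksums): threshold 2, identifier "test".
shares = parse_share_list("ms12testaxxxx, ms12testcxxxx\nms12testdxxxx")
print([share_index(s) for s in shares])  # ['a', 'c', 'd']
check_target(shares, "c")  # emits a UserWarning
```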

--codex32 flag for xprv accepts a validated set of codex32 strings or a pipe from bipsea recover and
@BenWestgate
Author

BenWestgate commented Oct 13, 2025

See if these changes to the README.md are amenable to you before I implement them all.

BenWestgate@965f020

If you want the scope limited, I'll move them to codex32[cli]; however, they're analogous to the mnemonic, validate and xprv commands for BIP39.

You can skim the diff below to see what functionality is new.

My python-codex32 library has full passing tests from bip93.

The sooner you let me know functionality to leave out of bipsea, the better.
https://github.com/BenWestgate/bipsea/pull/2/files

@akarve
Owner

akarve commented Oct 14, 2025

haven't looked at this yet. what i'm hoping to do is create some kind of python protocol for apps where you just register callbacks. it might be easier if yours is just a PyPI package that bipsea depends on, but I haven't thought about that yet. the protocol would then call into your module.

more importantly did you see these comments on your PR? that's a more fundamental bridge for us to cross first.

@@ -0,0 +1,358 @@
#!/bin/python3
Owner

this for sure should be a module dependency. idk if the authors have it on PyPI but you could perhaps host it as part of a codex module?

Author

@BenWestgate BenWestgate Oct 14, 2025

I put it on PyPI this week! And a much cleaner library at that with full BIP-93 tests.

https://pypi.org/project/codex32/

Do you want me to put the codex32[cli] stuff in there so we can drop commands like:
bipsea codex32
bipsea recover

and just PR in the bipsea derive -a codex32 needed for the BIP-0085 application? And some tests for that new bip85 application?

I'll try to make it composable so with both libraries installed users can do things like:

bipsea derive -a codex32 -t3 | codex32 recover to print a codex32 secret from a derived codex32 backup

@BenWestgate
Author

BenWestgate commented Oct 14, 2025

what i'm hoping to do is create some kind of python protocol for apps and you just register callbacks.

That sounds like a great way to do this.

more importantly did you see these comments on your PR? that's a more fundamental bridge for us to cross first.

Yes, he convinced me to drop some parameters.

The bip85 app should just produce the initial k (threshold) shares; those are the only ones that need deterministic entropy.

We only need hrp at the first derivation level; the second can be length (bytes) and the final one the bip85 index, like other applications.

Index should affect the encoded identifier IF we allow users to provide their own; otherwise, the default bip32 fingerprint should be unique for each seed and vice versa.

num_shares and share_idx are both dropped; we don't need them to get the minimum data that defines a set (the initial threshold number of strings).
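A sketch of the reduced path and the initial share indices, assuming the proposed application number 93' and a hypothetical `HRP_TO_INDEX` table (the draft's INDEX_TO_HRP mapping isn't shown in this thread). The share-index slice mirrors the `alphabetized_charset[bool(k) : k + 1]` expression in the draft code.

```python
# Hypothetical hrp -> path index table; the draft's real mapping is not shown.
HRP_TO_INDEX = {"ms": 0}
BIP85_PURPOSE = 83696968
CODEX32_APP = 93  # proposed application number
INITIAL_INDICES = "sacdefghjk"  # "s" alone for threshold 0, else the first k after "s"


def codex32_path(hrp: str, byte_length: int, index: int) -> str:
    """Build m/83696968'/93'/{hrp}'/{byte_length}'/{index}'."""
    if hrp not in HRP_TO_INDEX:
        raise ValueError(f"unsupported hrp {hrp!r}")
    if not 16 <= byte_length <= 64:
        raise ValueError("byte_length must be 16-64")
    if not 0 <= index < 2 ** 31:
        raise ValueError("index must fit in a hardened path segment")
    return f"m/{BIP85_PURPOSE}'/{CODEX32_APP}'/{HRP_TO_INDEX[hrp]}'/{byte_length}'/{index}'"


def initial_share_indices(threshold: int) -> str:
    """Share indices of the initial k strings, deterministic in k."""
    if threshold == 1 or not 0 <= threshold <= 9:
        raise ValueError("threshold must be 0 or 2-9")
    return INITIAL_INDICES[bool(threshold): threshold + 1]


print(codex32_path("ms", 16, 0))  # m/83696968'/93'/0'/16'/0'
print(initial_share_indices(3))  # acd
```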

Comment on lines +100 to +127
elif app == APPLICATIONS["codex32"]:
    header, n_bytes = indexes[:2]
    hrp, data = bech32_decode(header)
    if hrp not in INDEX_TO_HRP:
        raise ValueError(f"Unsupported human-readable prefix: {hrp}.")
    k = int(CHARSET[data[0]])
    if k == 1:
        raise ValueError(
            f"Threshold '{k}' is not an allowed value (2 through 9, or 0)."
        )
    ident = header[1:5]
    byte_length = int(n_bytes.rstrip("'"))
    if not 16 <= byte_length <= 64:
        raise ValueError(
            f"Byte length '{byte_length}' is not an allowed value (16 through 64)."
        )
    drng = DRNG(entropy)
    alphabetized_charset = "sacdefghjk"  # threshold above 9 is invalid
    shares = []
    # one codex32 string per initial share: "s" for k=0, else the first k indices
    for share_idx in alphabetized_charset[bool(k) : k + 1]:
        shares.append(
            Codex32String.from_seed(
                drng.read(byte_length), ident, hrp, k, share_idx
            ).s
        )

    return {
        "entropy": entropy,
        "application": " ".join(shares),
    }
Author

I think this new proposed path and codex32 derivation address both yours and scgbckbone feedback:

Fewer derivation levels: {header}, {n_bytes}, {index}
Fewer parameters: {share_idx} and {num_shares} are dropped

Output indices are deterministic based on k. No derived shares, just the initial k.

{header} is the first 8 characters of a codex32 string; it would be some serialization of {hrp}|{threshold}{identifier}, and it fits in a derivation index if converted from bech32 to an int.
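A sketch of the header-to-integer idea, assuming the header is hrp + "1" + threshold + 4-character identifier: the 5 data characters carry 25 bits, so their big-endian base-32 value always stays below 2^31 and fits in a hardened index. `header_to_int` and `int_to_header` are hypothetical names.

```python
BECH32_CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def header_to_int(header: str) -> tuple[str, int]:
    """Split an 'ms10test'-style header into hrp and a 25-bit integer."""
    header = header.lower()
    sep = header.rfind("1")  # bech32 separator: the last '1'
    hrp, data = header[:sep], header[sep + 1:]
    if len(data) != 5:
        raise ValueError("expected threshold + 4 identifier characters after '1'")
    value = 0
    for ch in data:
        value = value * 32 + BECH32_CHARSET.index(ch)  # raises on non-bech32 chars
    return hrp, value  # < 2**25, so it fits in a hardened index (< 2**31)


def int_to_header(hrp: str, value: int) -> str:
    """Inverse of header_to_int."""
    chars = []
    for _ in range(5):
        chars.append(BECH32_CHARSET[value % 32])
        value //= 32
    return hrp + "1" + "".join(reversed(chars))


hrp, value = header_to_int("ms10test")
print(hrp, value < 2 ** 31)  # ms True
print(int_to_header(hrp, value))  # ms10test
```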

I also want to feed the bip85 app {index} into the ident as it should be unique for different seeds

@BenWestgate
Author

more importantly did you see these comments on your PR? that's a more fundamental bridge for us to cross first.

@akarve I gave his excellent feedback a reply. If you also like the proposal I can implement it in bipsea in the coming week or so.

I'm now thinking: each bip85 index = one seed, regardless of the threshold and identifier. Threshold only matters when extracting from the DRNG, so share payloads are unique at different thresholds.

For the identifier, we either default to a bip32 fingerprint and let users relabel the strings to change it, or, if we want "resharing" functionality (multiple sets, same threshold), it would feed into the DRNG seed so that different identifiers produce different share payloads (but still recover the same secret).

Development

Successfully merging this pull request may close these issues.

Proposal: Add Codex32 (BIP93) as a BIP85 Application

2 participants