[newchem-cpp] Introduce FrozenKeyIdxBiMap#451
Merged
brittonsmith merged 22 commits intograckle-project:newchem-cppfrom Jan 22, 2026
Merged
[newchem-cpp] Introduce FrozenKeyIdxBiMap#451brittonsmith merged 22 commits intograckle-project:newchem-cppfrom
FrozenKeyIdxBiMap#451brittonsmith merged 22 commits intograckle-project:newchem-cppfrom
Conversation
There is some unforunate consequences, but if we don't do this you end up with some pretty confusing looking code... (the confusion primarily arises if new_FrozenKeyIdxBiMap or drop_FrozenKeyIdxBiMap has different behavior from other functions with similar names). I really think it would be better to make this into a simple C++ class with a constructor and destructor, but thats a topic for another time
Collaborator
Author
|
@brittonsmith This is currently failing because of a bug that has nothing to do with this PR and that is fixed in #481 (the bug pertains to a sphinx extension version update) I also updated this PR to make it a little more clear how exactly the type gets used. Essentially I added a new test case at the top |
Both changes essentially affected 2 constants, which prior to this commit were known as `keylen_max` & `invalid_val`. The 2 contained constants are really just implementation details that users of the inferface for `FrozenKeyIdxBiMap` shouldn't need to know anything about
…mirror source-code location
brittonsmith
approved these changes
Jan 22, 2026
Contributor
brittonsmith
left a comment
There was a problem hiding this comment.
This looks good and was easy to follow. I'm glad you started with the simple implementation.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
To be reviewed after #471 and #469 (these PRs are relatively small and I got tired of fixing the same merge conflict over and over again)
This introduces a data structure called
FrozenStringIdxBiMap. This is a bidirectional map (aka a bidirectional dictionary), that can be used to map between a unique set ofnstrings (keys) and a unique set of indexes (with values of0throughn-1) and vice-versa.Note
The implementation of the data structure in this PR is overly simplistic (it has
O(N)complexity rather thanO(1)complexity). This was intentional to make the PR easier to review.PR #484 is a followup that replaces the implementation (without modifying the interface) to use an actual hash table
I intend to use this as a building block to implement other types (namely types with a map-like interface) and initialization code. There are some obvious applications related to the dynamic API. There are a couple of times where I have wanted this kind of data structure and instead I wrote crude workarounds
The underlying implementation is based on a hash table. To implement this, I ported a class (from C++ to C) that I previously implemented in Enzo-E to perform the same function (called StringIndRdOnlyMap). In addition to porting the logic from C++ to C1, I also added logic to handle a few edge cases. For context, I originally wrote the Enzo-E class so that to make use of more specialized logic, which should make this faster that a solution that makes use of more generic types like
std::mapfrom the C++ standard library.Footnotes
With the benefit of hindsight, the effort to convert from C++ to C took way more time than I expected and it probably wasn’t worth it. In fact, it may make more sense ↩