Skip to content

[newchem-cpp] Introduce FrozenKeyIdxBiMap#451

Merged
brittonsmith merged 22 commits intograckle-project:newchem-cppfrom
mabruzzo:ncc/FrozenKeyIdxBiMap
Jan 22, 2026
Merged

[newchem-cpp] Introduce FrozenKeyIdxBiMap#451
brittonsmith merged 22 commits intograckle-project:newchem-cppfrom
mabruzzo:ncc/FrozenKeyIdxBiMap

Conversation

@mabruzzo
Copy link
Collaborator

@mabruzzo mabruzzo commented Nov 19, 2025

To be reviewed after #471 and #469 (these PRs are relatively small and I got tired of fixing the same merge conflict over and over again)


This introduces a data structure called FrozenStringIdxBiMap. This is a bidirectional map (aka a bidirectional dictionary), that can be used to map between a unique set of n strings (keys) and a unique set of indexes (with values of 0 through n-1) and vice-versa.

Note

The implementation of the data structure in this PR is overly simplistic (it has O(N) complexity rather than O(1) complexity). This was intentional to make the PR easier to review.

PR #484 is a followup that replaces the implementation (without modifying the interface) to use an actual hash table

I intend to use this as a building block to implement other types (namely types with a map-like interface) and initialization code. There are some obvious applications related to the dynamic API. There are a couple of times where I have wanted this kind of data structure and instead I wrote crude workarounds

The underlying implementation is based on a hash table. To implement this, I ported a class (from C++ to C) that I previously implemented in Enzo-E to perform the same function (called StringIndRdOnlyMap). In addition to porting the logic from C++ to C1, I also added logic to handle a few edge cases. For context, I originally wrote the Enzo-E class so that to make use of more specialized logic, which should make this faster that a solution that makes use of more generic types like std::map from the C++ standard library.

Footnotes

  1. With the benefit of hindsight, the effort to convert from C++ to C took way more time than I expected and it probably wasn’t worth it. In fact, it may make more sense

There is some unforunate consequences, but if we don't do this you end
up with some pretty confusing looking code... (the confusion primarily
arises if new_FrozenKeyIdxBiMap or drop_FrozenKeyIdxBiMap has different
behavior from other functions with similar names). I really think it
would be better to make this into a simple C++ class with a constructor
and destructor, but thats a topic for another time
@mabruzzo mabruzzo changed the base branch from main to newchem-cpp November 19, 2025 15:59
@mabruzzo
Copy link
Collaborator Author

mabruzzo commented Jan 5, 2026

@brittonsmith This is currently failing because of a bug that has nothing to do with this PR and that is fixed in #481 (the bug pertains to a sphinx extension version update)

I also updated this PR to make it a little more clear how exactly the type gets used. Essentially I added a new test case at the top test_unit_FrozenKeyIdxBiMap.cpp that provides a bunch of commentary. Hopefully this makes the intent a little more clear

mabruzzo and others added 9 commits January 6, 2026 22:27
Both changes essentially affected 2 constants, which prior to this
commit were known as `keylen_max` & `invalid_val`.

The 2 contained constants are really just implementation details that
users of the inferface for `FrozenKeyIdxBiMap` shouldn't need to know
anything about
Copy link
Contributor

@brittonsmith brittonsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good and was easy to follow. I'm glad you started with the simple implementation.

@brittonsmith brittonsmith merged commit c2b34f4 into grackle-project:newchem-cpp Jan 22, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

2 participants