In https://github.com/HostByBelle/ip-db-test-data#data-processing you say this:
Unfortunately, this final step is proving to be quite slow due to its time complexity which reduces the data size we can easily build. If you have ideas on how to optimize this, please share!
Have you considered using interval trees (https://en.wikipedia.org/wiki/Interval_tree)? That data structure is purpose-built for this kind of use case. One caution: Portion's IntervalDict does not implement an optimized data structure. It is backed by a sorted dict, but it does not take advantage of the fast ordered lookups a sorted dict could provide.
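To make the suggestion concrete, here is a rough sketch of a static augmented interval tree in Python. I have not looked at how your final step is implemented, so everything here is an assumption on my part: the names (`Node`, `build`, `overlapping`, `cidr_range`), the sample records, and the idea that the slow part is finding which stored ranges overlap a given range. It also assumes IPv4 only, since it compares addresses as plain integers.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    start: int
    end: int          # inclusive upper bound
    value: str
    max_end: int      # largest end found anywhere in this subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(items):
    """Build a balanced tree from (start, end, value) tuples sorted by start."""
    if not items:
        return None
    mid = len(items) // 2
    start, end, value = items[mid]
    node = Node(start, end, value, end)
    node.left = build(items[:mid])
    node.right = build(items[mid + 1:])
    for child in (node.left, node.right):
        if child is not None:
            node.max_end = max(node.max_end, child.max_end)
    return node


def overlapping(node, lo, hi, out):
    """Collect the values of all stored ranges that intersect [lo, hi]."""
    if node is None or node.max_end < lo:
        return out  # nothing in this subtree reaches as far as lo
    overlapping(node.left, lo, hi, out)
    if node.start <= hi and lo <= node.end:
        out.append(node.value)
    if node.start <= hi:
        # Every start in the right subtree is >= node.start, so it is
        # only worth visiting when node.start itself is not past hi.
        overlapping(node.right, lo, hi, out)
    return out


def cidr_range(cidr):
    """Convert a CIDR string into an inclusive (first, last) integer pair."""
    net = ipaddress.ip_network(cidr)
    return int(net.network_address), int(net.broadcast_address)


# Hypothetical sample data, not taken from the repository.
records = [("10.0.0.0/8", "A"), ("10.1.0.0/16", "B"), ("192.168.0.0/24", "C")]
tree = build(sorted((*cidr_range(c), v) for c, v in records))

lo, hi = cidr_range("10.1.2.0/24")
matches = overlapping(tree, lo, hi, [])
print(matches)
```

Building the tree costs a sort, and each query then skips whole subtrees that cannot overlap, instead of scanning every stored range. If your ranges change while you process them, you would need a tree that supports insertion and deletion rather than this build-once version.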