Data set build slowdown #33

@dead-claudia

In https://github.com/HostByBelle/ip-db-test-data#data-processing you say this:

Unfortunately, this final step is proving to be quite slow due to it's time complexity which reduces the data size we can easily build. If you have ideas on how to optimize this, please share!

Have you considered using interval trees? https://en.wikipedia.org/wiki/Interval_tree That data structure is purpose-built for this kind of use case. One caution: Portion's IntervalDict does not implement an optimized data structure. It stores its entries in a sorted dict, but it does not take advantage of the ordering, which is the very thing a sorted dict provides.
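
To make the suggestion concrete, here is a minimal sketch of an interval tree in Python. It assumes each record can be reduced to an inclusive integer range (for example, the first and last address of a CIDR block) with a value attached. The names `Node`, `build`, `stab`, and `network_interval` are hypothetical and are not taken from this repository or from Portion; the sample records are made up for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Any, Iterator, List, Optional, Tuple

Record = Tuple[int, int, Any]  # (start, end, value), both ends inclusive


@dataclass
class Node:
    start: int
    end: int
    value: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    max_end: int = 0  # largest `end` found anywhere in this subtree


def build(records: List[Record]) -> Optional[Node]:
    """Build a balanced tree from records already sorted by start."""
    if not records:
        return None
    mid = len(records) // 2
    start, end, value = records[mid]
    node = Node(start, end, value, build(records[:mid]), build(records[mid + 1:]))
    children = [c for c in (node.left, node.right) if c is not None]
    node.max_end = max([end] + [c.max_end for c in children])
    return node


def stab(node: Optional[Node], point: int) -> Iterator[Any]:
    """Yield the value of every interval that contains `point`."""
    # Nothing in this subtree reaches as far as `point`: prune it.
    if node is None or node.max_end < point:
        return
    yield from stab(node.left, point)
    # Everything to the right starts at or after node.start, so the
    # right subtree only matters when node.start <= point.
    if node.start <= point:
        if point <= node.end:
            yield node.value
        yield from stab(node.right, point)


def network_interval(cidr: str, value: Any) -> Record:
    """Hypothetical helper: turn a CIDR block into an inclusive integer range."""
    net = ip_network(cidr)
    return int(net.network_address), int(net.broadcast_address), value


# Made-up sample data, sorted by range start before building.
records = sorted(
    [
        network_interval("10.0.0.0/8", "A"),
        network_interval("10.1.0.0/16", "B"),
        network_interval("192.168.0.0/24", "C"),
    ],
    key=lambda r: r[0],
)
tree = build(records)
print(list(stab(tree, int(ip_address("10.1.2.3")))))  # ['A', 'B']
```

The `max_end` field is what lets a lookup skip whole subtrees, so a query costs roughly O(log n + k) for k matches instead of a scan over every stored range. This sketch is static (built once from sorted input); if ranges need to be inserted or removed during the build, a self-balancing variant would be needed.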
