Add min/max statistics to the metastore for pruning #5989

@rdettai-sk

Is your feature request related to a problem? Please describe.
The current pruning mechanisms are time, tags, and partitioning. We often run into queries that end up having to target every split in the index because we can leverage none of these. A typical example is when we have two time dimensions, event time and ingestion time. Only one can be used as time in QW, but we might want to query on the other one.

Describe the solution you'd like
I would like to add a configuration (e.g. stats_fields) in the doc mapping, similar to tag_fields.

  • it would only be compatible with numerical fields
  • when packaging the split, we would compute the min and max of each configured field and add them to the split metadata
  • we would store the min and max in the metastore by either
    • adding a field to split_metadata_json. This is simple to set up, and we could probably still push the pruning down to the metastore, but it would be quite expensive.
    • using an encoding similar to tags, with something like a Vec where min and max values would be encoded
    • using a JSON type
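
To make the proposal concrete, here is a minimal sketch of the two steps described above: collecting min/max while a split is packaged, and using them to decide whether a split can be skipped for a range query. The names `FieldStats`, `SplitStats`, `record` and `may_match` are hypothetical and do not exist in the codebase, values are assumed to be `i64`, and the storage encoding in the metastore is left out entirely.

```rust
use std::collections::BTreeMap;

/// Inclusive [min, max] range of one numerical field within a split.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct FieldStats {
    pub min: i64,
    pub max: i64,
}

/// Hypothetical per-split statistics: field name -> min/max.
#[derive(Debug, Default)]
pub struct SplitStats {
    pub fields: BTreeMap<String, FieldStats>,
}

impl SplitStats {
    /// Called for every value of a stats field while the split is packaged.
    pub fn record(&mut self, field: &str, value: i64) {
        self.fields
            .entry(field.to_string())
            .and_modify(|s| {
                s.min = s.min.min(value);
                s.max = s.max.max(value);
            })
            .or_insert(FieldStats { min: value, max: value });
    }

    /// Returns false only when the split provably holds no value in
    /// the inclusive range [lower, upper]. A split that has no stats
    /// for the field is conservatively kept.
    pub fn may_match(&self, field: &str, lower: i64, upper: i64) -> bool {
        match self.fields.get(field) {
            Some(s) => s.min <= upper && s.max >= lower,
            None => true,
        }
    }
}
```

The pruning test is a plain interval-overlap check, so it can only produce false positives (a kept split that turns out to contain no matching document), never false negatives.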

Describe alternatives you've considered
We can come up with workarounds using tags. For instance, for the problem of the secondary time dimension, we could record the ingestion day as a tag. But it's less flexible, harder to set up for users, and more costly.
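
For comparison, the tag workaround could look roughly like the sketch below. The tag format `ingestion_day:<days since epoch>` and all function names are assumptions made for illustration, not an existing convention.

```rust
use std::collections::BTreeSet;

const SECONDS_PER_DAY: i64 = 86_400;

/// Hypothetical tag for an ingestion timestamp (seconds since the Unix
/// epoch), bucketed by day.
pub fn ingestion_day_tag(ts_secs: i64) -> String {
    format!("ingestion_day:{}", ts_secs.div_euclid(SECONDS_PER_DAY))
}

/// Tags that a query on the inclusive range [start_secs, end_secs] has to
/// look for: one tag per day covered by the range.
pub fn tags_for_range(start_secs: i64, end_secs: i64) -> BTreeSet<String> {
    let first = start_secs.div_euclid(SECONDS_PER_DAY);
    let last = end_secs.div_euclid(SECONDS_PER_DAY);
    (first..=last)
        .map(|day| format!("ingestion_day:{}", day))
        .collect()
}

/// A split is kept when it carries at least one of the queried tags.
pub fn split_may_match(split_tags: &BTreeSet<String>, query_tags: &BTreeSet<String>) -> bool {
    !split_tags.is_disjoint(query_tags)
}
```

The granularity is fixed at ingestion time, the user has to produce the tag, and a query over a long range expands into one tag per day, which illustrates why this is less flexible than min/max statistics.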
