Automatically determine parquet workload record_count #50

@daverigby

For parquet file workloads, we can determine the number of records from the parquet files themselves, so it is not necessary to specify it manually as we currently do:

class Mnist(MnistBase):
    def __init__(self, name: str, cache_dir: str):
        super().__init__(name, "mnist", cache_dir=cache_dir)

    @property
    def record_count(self) -> int:
        return 60000

Similarly, for any workload that limits the count, such as the *-test variants, the limit is already known, so record_count does not need to repeat it:

class MnistTest(MnistBase):
    """Reduced, "test" variant of mnist; with 1% of the full dataset (600
    passages and 20 queries)."""

    def __init__(self, name: str, cache_dir: str):
        super().__init__(name, "mnist", cache_dir=cache_dir, limit=600, query_limit=20)

    @property
    def record_count(self) -> int:
        return 600

Labels: enhancement