Reported load times of some systems are impossibly low #678

@lukasvogel

Some systems report load times below the theoretical minimum.

For example, gp2 provides ~250 MB/s throughput.
With the smallest data size anyone reports being ~10 GB, any load time below 40 seconds (10 GB / 250 MB/s) should be physically impossible (assuming the input file is cached, which is fair imho for big instances like c6a.metal).
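
To make the arithmetic explicit, here is a minimal sketch of the lower bound. The helper name `min_load_seconds` is made up for illustration, and the figures are the approximate ones quoted above (decimal units assumed), not measured values:

```python
# Lower bound on load time: bytes that must reach disk / sustained write throughput.
# All numbers are the rough figures from this issue, not measurements.
GB = 10**9
MB = 10**6


def min_load_seconds(data_bytes: float, throughput_bytes_per_s: float) -> float:
    """Return the minimum time needed to write data_bytes at the given throughput."""
    return data_bytes / throughput_bytes_per_s


bound = min_load_seconds(10 * GB, 250 * MB)
print(f"minimum load time: {bound:.0f} s")  # 40 s

reported = 7  # seconds, the ClickHouse c6a.metal figure quoted below
print(f"reported time is below the bound: {reported < bound}")
```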

Yet, several systems report far lower load times. For example, ClickHouse on c6a.metal reports just 7 seconds.
We had a quick look at the docs, and it seems that ClickHouse doesn't perform fsyncs by default. This means the file system actually spends about two minutes syncing data after the load has been reported as finished but before the first query is executed.

So while the benchmark reports "7 seconds", the actual import process isn't complete at that point. It seems misleading to report that ClickHouse (or any other of those systems) loads the data in 7 seconds under those conditions.

By contrast, most of the other systems include the sync to disk in the load time before acknowledging the COMMIT to ensure crash consistency and durability, putting them at a clear disadvantage in this metric.
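
One possible way to make the numbers comparable would be to include a sync in the timed region. The following is only a rough sketch, not how the benchmark scripts currently work: `timed_load_with_sync` and the `load` callable are hypothetical names, and `os.sync()` (Unix only) flushes all dirty pages system-wide, so it may overcount if something else is writing at the same time.

```python
import os
import time
from typing import Callable, Tuple


def timed_load_with_sync(load: Callable[[], None]) -> Tuple[float, float]:
    """Run load() and return (acknowledged_seconds, durable_seconds).

    acknowledged_seconds: time until the load call returns.
    durable_seconds: time until the OS has also flushed dirty pages to disk.
    """
    start = time.perf_counter()
    load()  # hypothetical: whatever command the system uses to import the data
    acknowledged = time.perf_counter()
    if hasattr(os, "sync"):  # os.sync() is only available on Unix
        os.sync()
    durable = time.perf_counter()
    return acknowledged - start, durable - start


if __name__ == "__main__":
    # Placeholder load step; a real run would invoke the system's import here.
    ack, dur = timed_load_with_sync(lambda: None)
    print(f"acknowledged after {ack:.3f} s, durable after {dur:.3f} s")
```

Reporting the second number (or both) would put systems that skip fsync on the same footing as those that sync before acknowledging the COMMIT.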
