Description
Some systems report load times below the theoretical minimum.
For example, gp2 provides ~250 MB/s of throughput.
With the smallest data size anyone reports being ~10 GB, any load time below 40 seconds (10 GB / 250 MB/s) should be physically impossible (assuming the input file is cached, which is fair IMHO for big instances like c6a.metal).
Yet, several systems report far lower load times. For example, ClickHouse on c6a.metal reports just 7 seconds.
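To make the arithmetic explicit, here is a minimal sketch of the lower bound. It uses only the rough figures quoted above (~10 GB, ~250 MB/s, 7 seconds) and assumes decimal units; the function and variable names are ours, purely for illustration.

```python
GB = 10**9  # decimal units, an assumption; the figures above are approximate
MB = 10**6


def min_load_seconds(data_bytes: int, throughput_bytes_per_s: int) -> float:
    """Lower bound on the time needed to write data_bytes at the given throughput."""
    return data_bytes / throughput_bytes_per_s


data_size = 10 * GB          # smallest reported data size (~10 GB)
gp2_throughput = 250 * MB    # approximate gp2 throughput (~250 MB/s)
reported_seconds = 7.0       # load time reported for ClickHouse on c6a.metal

lower_bound = min_load_seconds(data_size, gp2_throughput)
implied_mb_per_s = data_size / reported_seconds / MB

print(f"lower bound: {lower_bound:.0f} s")
print(f"throughput implied by a {reported_seconds:.0f} s load: {implied_mb_per_s:.0f} MB/s")
```

A 7-second load would imply writing at well over 1 GB/s, several times what the volume can sustain, so the data cannot all have reached the disk by then.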
We had a quick look at the docs and it seems that ClickHouse doesn't perform fsyncs by default, which means the file system actually spends about two minutes syncing data after the load but before the first query is executed.
So while the benchmark reports "7 seconds", the actual import process isn't complete at that point. It seems misleading to report that ClickHouse (or any other of those systems) loads the data in 7 seconds under those conditions.
By contrast, most of the other systems include the sync to disk in the load time before acknowledging the COMMIT to ensure crash consistency and durability, putting them at a clear disadvantage in this metric.
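To illustrate the difference, here is a minimal sketch that times the same write with and without an explicit fsync. This is not how the benchmark or any of the systems measure load time; `timed_write` is a hypothetical helper, and the 8 MiB payload is an arbitrary choice.

```python
import os
import tempfile
import time


def timed_write(path: str, payload: bytes, durable: bool) -> float:
    """Return the seconds taken to write payload to path.

    With durable=True the timing includes flushing to the device via fsync;
    otherwise the data may still sit in the OS page cache when we stop the clock.
    """
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(payload)
        if durable:
            f.flush()             # push Python's userspace buffer to the OS
            os.fsync(f.fileno())  # ask the OS to persist the file to the device
    return time.perf_counter() - start


payload = os.urandom(8 * 1024 * 1024)  # 8 MiB of data, arbitrary size
with tempfile.TemporaryDirectory() as tmp:
    buffered = timed_write(os.path.join(tmp, "buffered.bin"), payload, durable=False)
    durable = timed_write(os.path.join(tmp, "durable.bin"), payload, durable=True)

print(f"without fsync: {buffered:.4f} s, with fsync: {durable:.4f} s")
```

The gap between the two numbers depends on the storage and the amount of data, but only the second one covers the point at which the data is actually on disk. A load time that stops the clock before the sync is measuring something closer to the first.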