Description
The SNB Interactive benchmark is currently limited to:
- Data sets up to SF1000
- Append-only workloads without deletions
These could be amended by backporting the improvements made for the BI workload.
Larger data sets
Scaling the Interactive workload to SF3000 is not trivial: the Hadoop-based Datagen breaks for SF1000+ data sets (with an NPE), and the old parameter generator has scalability issues (it is a single-threaded Python 2 script that already takes about a day to finish for SF1000). Therefore, we should use the new Spark-based generator. However, this creates at least three development tasks:
- The existing Cypher and SQL solutions need to be updated to work with the new schemas produced by the Spark-based Datagen.
- The Interactive parameter generator has to be ported to (effectively reimplemented in) Spark/SparkSQL (see "Factor generation for Interactive", ldbc_snb_datagen_spark#219).
- The inserts generated by the new data generator (e.g. inserts/dynamic/Person/part-*.csv) use a different format than the update streams produced by the old generator. To work around this, we would need to either adjust the driver or implement an "insert file to update stream converter". (The latter seems simpler and mostly doable in SQL.)
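To give a rough idea of what such a converter involves, here is a minimal sketch. It is written in Python rather than SQL so that it is self-contained. Everything format-related is an assumption made for illustration: a pipe-delimited insert file with a header row and an ISO 8601 `creationDate` column, and an output layout of `timestamp|operation code|remaining fields`. The names `convert_inserts`, `to_epoch_millis` and `ADD_PERSON` are hypothetical, and the layouts actually used by the Datagen and the driver may differ.

```python
import csv
import io
from datetime import datetime, timedelta, timezone
from typing import List

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical operation code for "add person"; the real driver may use other codes.
ADD_PERSON = 1


def to_epoch_millis(iso_timestamp: str) -> int:
    """Convert an ISO 8601 timestamp (assumed input format) to epoch milliseconds."""
    dt = datetime.fromisoformat(iso_timestamp)
    if dt.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    # Integer division avoids floating-point rounding of the millisecond part.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def convert_inserts(insert_csv: str, op_code: int, delimiter: str = "|") -> List[str]:
    """Turn the text of an insert file into update-stream-like lines ordered by time."""
    reader = csv.DictReader(io.StringIO(insert_csv), delimiter=delimiter)
    events = []
    for row in reader:
        # Assumed column name; the remaining columns are passed through unchanged.
        scheduled = to_epoch_millis(row.pop("creationDate"))
        events.append((scheduled, list(row.values())))
    # Python's sort is stable, so rows with equal timestamps keep their file order.
    events.sort(key=lambda event: event[0])
    return [delimiter.join([str(ts), str(op_code)] + fields) for ts, fields in events]
```

The sketch only covers ordering and timestamp conversion. Any dependency information that the driver expects in an update stream is not derived here and would need to be added.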
Introducing deletions
Deletions would be a realistic addition to an OLTP benchmark. The generator is capable of producing them, so it's only a matter of integrating them into the workload. The key challenges here are (1) figuring out the format: the deletes/dynamic/Person/part-*.csv files may work well as they are, or an updateStream-like delete stream may work better; (2) integrating them into the driver; (3) tuning their ratio in the workload; and (4) determining how they should be reported in the benchmark results (e.g. a delete can be counted simply as another operation, contributing one operation to the throughput).
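The reporting option mentioned above, where a delete counts as one more operation, could look roughly like the sketch below. The function name `summarize` and the operation labels `"read"`, `"insert"` and `"delete"` are hypothetical and chosen only for illustration. This is not how the driver currently reports results.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize(operations: Iterable[str], duration_seconds: float) -> Dict[str, object]:
    """Count executed operations by kind; every kind contributes equally to throughput."""
    counts = Counter(operations)
    total = sum(counts.values())
    return {
        "counts": dict(counts),
        # Operations per second, with deletes counted like any other operation.
        "throughput": total / duration_seconds,
        # Share of deletes among all operations, useful when tuning their ratio.
        "delete_ratio": counts["delete"] / total if total else 0.0,
    }
```

For example, `summarize(["read", "read", "insert", "delete"], 2.0)` reports a throughput of 2.0 operations per second and a delete ratio of 0.25.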
Timeline
These are plans for the mid-term future (late 2021 or early 2022), depending on the interest in such a benchmark.