Description
Introduction:
After collecting feedback from engineers, clients, and press, NVIDIA presented a list of proposals that aim to improve the popularity of the MLPerf HPC benchmark suite. Please see our slide deck for more information on our feedback gathering process and insights.
Proposal: Exclude data movement from timing (start the clock after data retrieval, before caching; same as MLPerf-T).
See slide 14 in the proposals slide deck.
This proposal aims to improve the popularity of the MLPerf HPC benchmark suite by improving on the following aspects:
- High submission overhead and cost [Affects participation and competition]
- Isolates benchmarking of compute from FS [Improves RFP interest]
- Simplifies "Throughput" benchmark (renamed from "weak scaling") [Affects participation and competition]
- Benchmark renaming proposal: [HPC] Proposal: Update terminology for clarity #511
- Throughput extrapolation proposal: [HPC] Proposal: Allow throughput extrapolation to large system size #508
Note: We strongly believe that the filesystem is an extremely important part of the system, and we always advise potential clients to consider the interplay of all parts of a system (FS + compute + network). However, we received a strong signal from some clients that including the filesystem in the timed region makes it harder to use MLPerf-HPC scores for apples-to-apples comparisons, as FS and compute are sometimes not purchased at the same time.
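To make the proposed timing boundary concrete, here is a minimal sketch of where the run clock would start and stop under this proposal. The helper names (`stage_dataset`, `load_into_memory`, `train`) are hypothetical placeholders, not functions from any reference implementation; the point is only the placement of the timer relative to data retrieval and caching.

```python
import time

def stage_dataset(src_path: str, dst_path: str) -> str:
    """Hypothetical helper: copy the dataset from the shared/parallel
    filesystem to node-local storage. NOT timed under this proposal."""
    ...  # e.g. a parallel copy or broadcast across nodes
    return dst_path

def load_into_memory(local_path: str):
    """Hypothetical helper: read staged files into host/GPU memory
    (caching). Under this proposal, caching IS inside the timed region."""
    ...

def train(dataset):
    """Hypothetical training loop: compute + communication, timed."""
    ...

def run_benchmark(src_path: str, local_path: str) -> float:
    # 1. Data retrieval from the shared filesystem: excluded from timing.
    staged = stage_dataset(src_path, local_path)

    # 2. Start the clock here, after retrieval but before caching,
    #    matching the MLPerf Training (MLPerf-T) convention.
    run_start = time.perf_counter()

    dataset = load_into_memory(staged)  # caching: timed
    train(dataset)                      # training: timed

    run_stop = time.perf_counter()
    return run_stop - run_start
```

The key point is that pulling data from the shared filesystem falls outside the timed region, while node-local/in-memory caching and the training itself remain inside it.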
Discussion
Pros:
- By far the most common feedback we received concerned the unreasonably high submission overhead for MLPerf HPC (cost, engineering resources, system time)
- Submitter no longer needs to optimize data movement
Cons:
- Reduces the quality of the benchmark since it no longer considers the system as a whole.
- This makes the score an “upper bound”, since storage is not timed; however, MLPerf-T has the same limitation.