
[HPC] Proposal: Offer long, guaranteed benchmark stability #509

@nvaprodromou

Introduction:

After collecting feedback from engineers, clients, and the press, NVIDIA presented a list of proposals aimed at improving the popularity of the MLPerf HPC benchmark suite. Please see our slide deck for more information on our feedback-gathering process and the insights it produced.

Proposal: Offer long, guaranteed benchmark stability (guarantees submission longevity)

See slide 13 in the proposals slide deck.

We propose guaranteeing benchmark stability for an agreed-upon duration:

  • A benchmark's first appearance is labeled "beta": a beta benchmark may or may not change before the next submission round.
  • In its second submission round, the benchmark drops its beta status and is frozen for X years. We guarantee that neither the code nor the dataset will be modified in any way during this guaranteed lifespan.
  • Because the benchmark is guaranteed to be stable, results from prior rounds can be carried over.

This proposal aims to improve the popularity of the MLPerf HPC benchmark suite by addressing the following aspects:

  1. High submission overhead and cost [Affects participation and competition]
  2. Prioritization of MLPerf HPC on new systems [Affects participation and competition]

Discussion

Pros:

  1. Carrying over prior results reduces the overhead of participating.
  2. A guaranteed benchmark lifespan reduces the effective submission overhead and cost, since a single submission remains valid for the next X years.
  3. It makes MLPerf HPC easier to prioritize on new systems, provided the suite is designed to be more stable than MLPerf Training.

Cons:

  1. Bugs in the model code or dataset will not be corrected if they are identified after the benchmark drops its beta status.
