Instead of independent samples from a search space, submitters can also provide a fixed list of $5$ hyperparameter points, which will be sampled without replacement.
Within each study, we select the fastest trial that reaches the validation target. The median of the three per-study best times is the submission's official _per-workload score_. These $8$ _per-workload runtimes_ are used in the scoring procedure (see the [**Scoring submissions**](#scoring) section). Trials that do not reach the target within `max_runtime` receive a score of $\infty$, which participates in the median.
Submissions may also perform on-the-clock self-tuning during timed training.
> [!IMPORTANT]
>
> - **Trial**: One training run with a fixed hyperparameter configuration, run until the target or `max_runtime` is reached. The first time the validation target is reached in a trial is denoted $t_{i,j}$ (a miss scores $\infty$).
> - **Study**: A set of $5$ trials, each run with a distinct hyperparameter point. The studies are independent and capture variance. The study's score is the **fastest** (minimum) time among its trials.
> - **Per-Workload Runtime**: The per-workload runtime of a submission is given by the median across the per-study scores, i.e., $t_{s,w} = \operatorname{median}_{j=1..3} \left( \min_{i=1..5} t_{i,j} \right)$, with $t_{i,j}$ the score of trial $i$ in study $j$, i.e.
>
> $$t_{i,j} \;=\; \begin{cases} \text{elapsed seconds to reach target}, & \text{if reached within } \texttt{max\_runtime} \\ \infty, & \text{otherwise.} \end{cases}$$
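For concreteness, here is a minimal Python sketch of this scoring rule; the function names and the `max_runtime` value in the example are illustrative assumptions, not part of the benchmark code:

```python
import statistics

def trial_score(time_to_target, max_runtime):
    """A trial scores its time-to-target in seconds, or infinity if the
    validation target was not reached within the budget (None = miss)."""
    if time_to_target is not None and time_to_target <= max_runtime:
        return time_to_target
    return float("inf")

def per_workload_runtime(studies, max_runtime):
    """studies: 3 studies, each a list of 5 trial times (None for a miss).
    Returns the median over studies of each study's fastest trial."""
    study_scores = [
        min(trial_score(t, max_runtime) for t in trials) for trials in studies
    ]
    return statistics.median(study_scores)

# Example with an (assumed) max_runtime of 10_800 seconds:
studies = [
    [9_500, None, 7_200, 10_000, None],  # study 1 best: 7_200
    [None, 8_100, None, None, 9_900],    # study 2 best: 8_100
    [None, None, None, None, None],      # study 3: all trials miss -> inf
]
print(per_workload_runtime(studies, max_runtime=10_800))  # median -> 8_100
```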
#### Self-Tuning Ruleset
Submissions under this ruleset are not allowed to expose user-defined hyperparameters.
Instead, submissions can either apply one "default" hyperparameter configuration for all workloads (e.g., Adam with default settings), or perform inner-loop tuning during their training run (e.g., SGD with line searches).
All workload adaptations, e.g., inner-loop tuning, will be part of the submission's score.
For each workload, a submission will run for **$3$ independent studies**, and the _per-workload score_ is the median time to reach the validation target, i.e., $t_{s,w} = \operatorname{median}_{j=1..3} \, t_{j}$.
To account for the lack of external tuning, submissions have a longer time budget to reach the target performance.
Compared to the [**external tuning ruleset**](#external-tuning-ruleset), the `max_runtime` is $1.5\times$ longer (i.e., multiply the `max_runtime` values from the [**workload overview table**](#workloads) by $1.5$).
As in the [**external tuning ruleset**](#external-tuning-ruleset), any run that fails to achieve the target within this window is assigned an infinite runtime.
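As a hedged illustration, the self-tuning per-workload score could be computed as follows; the helper name and `base_max_runtime` parameter are assumptions of this sketch, not benchmark-defined identifiers:

```python
import statistics

def self_tuning_per_workload_score(study_times, base_max_runtime):
    """study_times: one time-to-target in seconds (or None for a miss) per
    study, for the 3 independent self-tuning studies of a workload.
    The budget is 1.5x the external tuning ruleset's max_runtime."""
    budget = 1.5 * base_max_runtime
    scores = [
        t if (t is not None and t <= budget) else float("inf")
        for t in study_times
    ]
    return statistics.median(scores)  # t_{s,w} = median over the 3 studies

# Example: two studies reach the target within the 16_200 s budget, one misses.
print(self_tuning_per_workload_score([12_000, None, 14_500],
                                      base_max_runtime=10_800))  # -> 14_500
```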
### Scoring
Submissions are scored based on the training time needed to reach the target performance on each workload's validation set.
The target metric may match the loss function or use another workload-specific metric, such as error rate or BLEU score.
See the [**workload overview table**](#workloads) for the targets and metrics of each workload and the [**Defining target performance**](#defining-target-performance-and-max_runtime) section for how they were determined.
The overall ranking is then determined by the scalar _AlgoPerf Benchmark Score_, which summarizes the _per-workload_ runtimes across all [**workloads**](#workloads), using integrated [**performance profiles**](#algoperf-benchmark-score-via-integrated-performance-profiles), as explained below.
This performance ratio $r_{s,w}$ expresses the "time spent by submission $s$ on workload $w$" relative to the time spent by the best submission on that workload.
Next, we compute how often a submission is within a factor $\tau \in [1,\infty)$ of the optimal submission. For this, we determine the following function for every submission $\bar{s}$:

$$
\rho_{\bar{s}}(\tau) = \frac{1}{n} \cdot \left| \\{ w \text{ such that } r_{\bar{s},w}\leq \tau \\}\right| = \frac{1}{n} \cdot \left[\text{number of workloads where}\, r_{\bar{s},w}\leq \tau\right]
$$
In other words, we compute the fraction of workloads where a submission $\bar{s}$ is within a factor of $\tau$ of the optimal submission. The function $\rho_{\bar{s}}(\tau)$ is monotonically increasing in $\tau$ and bounded between $0$ and $1$.
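A minimal NumPy sketch of these two steps, assuming every workload is solved by at least one submission (so no $\infty/\infty$ ratios arise); the function names and example numbers are illustrative:

```python
import numpy as np

def performance_ratios(runtimes):
    """runtimes: dict mapping submission name -> array of per-workload
    runtimes t_{s,w}. Returns r_{s,w} = t_{s,w} / min_s t_{s,w};
    infinite runtimes yield infinite ratios."""
    all_times = np.array(list(runtimes.values()))  # (num_submissions, num_workloads)
    best_per_workload = all_times.min(axis=0)      # fastest submission per workload
    return {s: times / best_per_workload for s, times in runtimes.items()}

def rho(ratios, tau):
    """Performance profile: fraction of workloads with ratio at most tau
    (an infinite ratio never satisfies the comparison, matching the miss penalty)."""
    return float(np.mean(np.asarray(ratios) <= tau))

# Example with three submissions on four workloads (seconds; inf = target missed).
runtimes = {
    "A": np.array([100.0, 200.0, 400.0, np.inf]),
    "B": np.array([120.0, 150.0, 500.0, 300.0]),
    "C": np.array([300.0, 600.0, 450.0, 330.0]),
}
r = performance_ratios(runtimes)
print(rho(r["A"], tau=1.0))  # A is fastest on 2 of 4 workloads -> 0.5
```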