
Commit a4e6d3a

Fix GitHub math [no ci]
1 parent 75fdc24

File tree

1 file changed: +10 −8 lines changed


docs/DOCUMENTATION.md

Lines changed: 10 additions & 8 deletions
@@ -415,20 +415,20 @@ Instead of independent samples from a search space, submitters can also provide
 Within each study, we select the fastest trial that reaches the validation target. The median of the three per-study best times is the submission's official _per-workload score_. These $8$ _per-workload runtimes_ are used in the scoring procedure (see the [**Scoring submissions**](#scoring) section). Trials that do not reach the target within `max_runtime` receive $\infty$ (which participates in the median).
 Submissions may also perform on-the-clock self-tuning during timed training.
 
-> [!IMPORTANT] Summary
+> [!IMPORTANT]
 >
-> - **Trial**: One training run with a fixed hyperparameter configuration, until the target or `max_runtime` is reached. The first time the validation target is reached in a trial is denoted $\tilde{t}_{ij}$ (a miss scores $\tilde{t}_{ij} = \infty$).
+> - **Trial**: One training run with a fixed hyperparameter configuration, until the target or `max_runtime` is reached. The first time the validation target is reached in a trial is denoted $t_{i,j}$ (a miss scores $\infty$).
 > - **Study**: A set of $5$ trials, each run with a distinct hyperparameter point. The studies are independent and capture variance. The study's score is the **fastest** (minimum) time among its trials.
-> - **Per-Workload Runtime**: The per-workload runtime is given by the median across the per-study scores, i.e., $t_w \;=\; \operatorname{median}_{j=1..3}\Big(\min_{i=1..5} \; \tilde{t}_{ij}\Big)$, with $\tilde{t}_{ij}$ the score of trial $i$ in study $j$, i.e.
-> $$\tilde{t}_{ij} \;=\;\begin{cases}\text{elapsed seconds to reach target}, & \text{if reached within } \texttt{max\_runtime} \\ \infty, & \text{otherwise} \end{cases}\,.$$
+> - **Per-Workload Runtime**: The per-workload runtime of a submission is given by the median across the per-study scores, i.e., $t_{s,w} = median_{j=1..3} \left( \min_{i=1..5} (t_{i,j}) \right)$, with $t_{i,j}$ the score of trial $i$ in study $j$.
 
 #### Self-Tuning Ruleset
 
 Submissions under this ruleset are not allowed to expose user-defined hyperparameters.
-Instead, submissions can either apply one "default" hyperparameter configuration for all workloads (e.g. Adam with default settings), or perform inner-loop tuning during their training run (e.g. SGD with line searches).
+Instead, submissions can either apply one "default" hyperparameter configuration for all workloads (e.g., Adam with default settings) or perform inner-loop tuning during their training run (e.g., SGD with line searches).
 All workload adaptations, e.g., inner-loop tuning, will be part of the submission's score.
 
-For each workload, a submission will run for **$3$ independent studies**, and the _per-workload score_ is the median time to reach the validation target, i.e., $t_{s,w} = \operatorname{median}_{j=1..3} \tilde{t}_j$.
+For each workload, a submission will run for **$3$ independent studies**, and the _per-workload score_ is the median time to reach the validation target, i.e., $t_{s,w} = median_{j=1..3} \left(t_{j}\right)$.
 To account for the lack of external tuning, submissions have a longer time budget to reach the target performance.
 Compared to the [**external tuning ruleset**](#external-tuning-ruleset), the `max_runtime` is $1.5\times$ longer (i.e., multiply the `max_runtimes` from the [**workload overview table**](#workloads) by $1.5$).
 As in the [**external tuning ruleset**](#external-tuning-ruleset), any run that fails to achieve the target within this window is assigned an infinite runtime.
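
To make the aggregation in the summary above concrete, here is a minimal Python sketch (not taken from the benchmark codebase; all names are illustrative) of the median-of-minima rule for the external tuning ruleset. Under the self-tuning ruleset, each study would contain a single trial, so the inner minimum is over one element.

```python
import statistics

INF = float("inf")

def per_workload_runtime(study_trial_times):
    """Median across studies of each study's fastest trial.

    `study_trial_times` is a list of 3 studies, each a list of 5 trial
    times: elapsed seconds to reach the validation target, or `INF` if
    the target was missed within `max_runtime`.
    """
    # Each study scores as its fastest (minimum) trial.
    study_scores = [min(trials) for trials in study_trial_times]
    # Infinities participate in the median as ordinary (largest) values.
    return statistics.median(study_scores)

# Hypothetical example: 3 studies x 5 trials.
studies = [
    [1200.0, INF, 950.0, 1400.0, INF],  # study 1 -> 950.0
    [INF, 1100.0, INF, INF, 1300.0],    # study 2 -> 1100.0
    [1250.0, INF, INF, 1050.0, INF],    # study 3 -> 1050.0
]
print(per_workload_runtime(studies))  # 1050.0 (median of 950, 1100, 1050)
```

Note that a single study in which every trial misses the target is enough to make that study's score infinite; the submission's per-workload runtime only becomes infinite if this happens in the median study or worse.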
@@ -477,7 +477,7 @@ To further reduce computational costs, the [**external tuning ruleset**](#extern
 ### Scoring
 
 Submissions are scored based on the training time needed to reach the target performance on each workload's validation set.
-The target metric may match the loss function or use another workload-specific metric such as error rate or BLEU score.
+The target metric may match the loss function or use another workload-specific metric, such as error rate or BLEU score.
 See the [**workload overview table**](#workloads) for the targets and metrics of each workload and the [**Defining target performance**](#defining-target-performance-and-max_runtime) section for how they were determined.
 The overall ranking is then determined by the scalar _AlgoPerf Benchmark Score_, which summarizes the _per-workload_ runtimes across all [**workloads**](#workloads), using integrated [**performance profiles**](#algoperf-benchmark-score-via-integrated-performance-profiles), as explained below.

@@ -509,7 +509,9 @@ This performance ratio $r_{s,w}$ expresses the "time spent by submission $s$ on
 
 Next, we compute how often a submission is within a factor $\tau \in [1,\infty)$ of the optimal submission. For this, we determine the following function for every submission $\bar{s}$:
 
-$$\rho_{\bar{s}}(\tau) = \frac{1}{n} \!\cdot\mkern-28mu \underbrace{\left|\left\{w: \, r_{\bar{s},w}\leq \tau\right\}\right|}_{= \text{number of workloads with}\, r_{\bar{s},w}\leq \tau}$$
+$$
+\rho_{\bar{s}}(\tau) = \frac{1}{n} \cdot \left| \\{ w \text{ such that } r_{\bar{s},w}\leq \tau \\} \right| = \frac{1}{n} \cdot \left[\text{number of workloads where}\, r_{\bar{s},w}\leq \tau\right]
+$$
 
 In other words, we compute the fraction of workloads where a submission $\bar{s}$ is within a factor of $\tau$ of the optimal submission. The function $\rho_{\bar{s}}(\tau)$ is monotonically increasing in $\tau$ and bounded between $0$ and $1$.
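
The ratio and profile computation lends itself to a short sketch as well. The following Python snippet (again illustrative, not the benchmark's own scoring code; submission and workload names are made up, and it assumes per-workload runtimes are already collected in a dict) computes the performance ratios $r_{s,w}$ and the fraction $\rho_{\bar{s}}(\tau)$ defined above:

```python
import math

def performance_ratios(runtimes):
    """r[s][w] = t_{s,w} / (best t_{.,w} over all submissions).

    `runtimes` maps submission -> {workload: per-workload runtime}.
    Assumes at least one submission reaches the target on each workload,
    so the per-workload best time is finite.
    """
    workloads = next(iter(runtimes.values())).keys()
    best = {w: min(t[w] for t in runtimes.values()) for w in workloads}
    return {s: {w: t[w] / best[w] for w in workloads}
            for s, t in runtimes.items()}

def rho(ratios_for_s, tau):
    """Fraction of workloads where a submission is within factor tau of the best."""
    return sum(r <= tau for r in ratios_for_s.values()) / len(ratios_for_s)

# Hypothetical runtimes for two submissions on two workloads.
runtimes = {
    "sub_a": {"wmt": 1000.0, "ogbg": 2000.0},
    "sub_b": {"wmt": 1500.0, "ogbg": math.inf},  # missed the ogbg target
}
ratios = performance_ratios(runtimes)
print(rho(ratios["sub_a"], tau=1.0))  # 1.0: best on both workloads
print(rho(ratios["sub_b"], tau=1.5))  # 0.5: within 1.5x on wmt only
```

An infinite runtime yields an infinite ratio, so such a workload never counts toward $\rho_{\bar{s}}(\tau)$ for any finite $\tau$; the benchmark score then integrates $\rho_{\bar{s}}(\tau)$ over $\tau$, as described in the performance-profiles section referenced above.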
