Conversation

@pgmpablo157321
Contributor

Fix #419

@github-actions

github-actions bot commented Aug 28, 2025

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@pgmpablo157321 force-pushed the standalone_score_compute branch 2 times, most recently from f5cffb2 to b917d71 on August 29, 2025 19:29
@pgmpablo157321 marked this pull request as ready for review August 29, 2025 19:29
@pgmpablo157321 requested review from a team as code owners August 29, 2025 19:29
ShriyaRishab previously approved these changes Sep 4, 2025
@ShriyaRishab
Contributor

ShriyaRishab commented Sep 4, 2025

I tested it locally on a few results.

With scaling.json -

$ python3 -m mlperf_logging.result_summarizer.compute_score --benchmark llama31_405b  --system tyche_ngpu512_ngc25.04_nemo --benchmark_folder /training_results_v5.0/NVIDIA/results/tyche_ngpu512_ngc25.04_nemo/llama31_405b --usage training --ruleset 5.0.0 --scale
NOTICE: Applying scaling factor 1.1538461538461537 to dir /training_results_v5.0/NVIDIA/results/tyche_ngpu512_ngc25.04_nemo/llama31_405b
MLPerf training
Folder: /training_results_v5.0/NVIDIA/results/tyche_ngpu512_ngc25.04_nemo/llama31_405b
Version: 5.0.0
System: tyche_ngpu512_ngc25.04_nemo
Benchmark: llama31_405b
Score - Time to Train (minutes): 121.7573269230769

Without --scale, but with a scaling.json file still present in the folder from the previous run -

$ python3 -m mlperf_logging.result_summarizer.compute_score --benchmark llama31_405b  --system tyche_ngpu512_ngc25.04_nemo --benchmark_folder /training_results_v5.0/NVIDIA/results/tyche_ngpu512_ngc25.04_nemo/llama31_405b --usage training --ruleset 5.0.0
NOTICE: Applying scaling factor 1.1538461538461537 to dir /training_results_v5.0/NVIDIA/results/tyche_ngpu512_ngc25.04_nemo/llama31_405b
MLPerf training
Folder: /training_results_v5.0/NVIDIA/results/tyche_ngpu512_ngc25.04_nemo/llama31_405b
Version: 5.0.0
System: tyche_ngpu512_ngc25.04_nemo
Benchmark: llama31_405b
Score - Time to Train (minutes): 121.7573269230769

After manually deleting scaling.json -

$ python3 -m mlperf_logging.result_summarizer.compute_score --benchmark llama31_405b  --system tyche_ngpu512_ngc25.04_nemo --benchmark_folder /training_results_v5.0/NVIDIA/results/tyche_ngpu512_ngc25.04_nemo/llama31_405b --usage training --ruleset 5.0.0
ruleset 5.0.0
MLPerf training
Folder: /training_results_v5.0/NVIDIA/results/tyche_ngpu512_ngc25.04_nemo/llama31_405b
Version: 5.0.0
System: tyche_ngpu512_ngc25.04_nemo
Benchmark: llama31_405b
Score - Time to Train (minutes): 105.52301666666666

But if I don't manually delete scaling.json and run without the --scale flag, it still applies scaling automatically because there is a preexisting scaling.json file in the folder. @pgmpablo157321 - is this expected behavior, and should we add some information to the README about how to deal with the scaling.json files in the folder?
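
For reference, a minimal sketch of the behavior observed above, assuming the tool applies any factor it finds in a scaling.json left in the benchmark folder, regardless of the --scale flag (the file layout and key name are assumptions, not the actual implementation):

import json
import os

def apply_scaling_if_present(benchmark_folder, raw_ttt_minutes):
    # Hypothetical helper: a scaling.json left over from a previous --scale
    # run is picked up and applied even when --scale is not passed.
    scaling_path = os.path.join(benchmark_folder, "scaling.json")
    if not os.path.exists(scaling_path):
        return raw_ttt_minutes
    with open(scaling_path) as f:
        factor = json.load(f)["scaling_factor"]  # key name is an assumption
    print(f"NOTICE: Applying scaling factor {factor} to dir {benchmark_folder}")
    return raw_ttt_minutes * factor

# Consistent with the runs above:
# 105.52301666666666 * 1.1538461538461537 ≈ 121.7573269230769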

@ShriyaRishab
Contributor

Testing power scores -

With --has_power

$ python3 -m mlperf_logging.result_summarizer.compute_score --benchmark llama2_70b_lora  --system xyz --benchmark_folder /training_results_v5.0/Lenovo/results/SR780aV3-8xB200_SXM_180GB/llama2_70b_lora --usage training --ruleset 5.0.0 --has_power
NOTICE: Applying scaling factor 1.0188034188034187 to dir /training_results_v5.0/Lenovo/results/SR780aV3-8xB200_SXM_180GB/llama2_70b_lora
MLPerf training
Folder: /training_results_v5.0/Lenovo/results/SR780aV3-8xB200_SXM_180GB/llama2_70b_lora
Version: 5.0.0
System: xyz
Benchmark: llama2_70b_lora
Score - Time to Train (minutes): 11.324490299145298
Power Score - Energy (kJ): 6114237.986822284

Without --has_power -

$ python3 -m mlperf_logging.result_summarizer.compute_score --benchmark llama2_70b_lora  --system xyz --benchmark_folder /training_results_v5.0/Lenovo/results/SR780aV3-8xB200_SXM_180GB/llama2_70b_lora --usage training --ruleset 5.0.0
NOTICE: Applying scaling factor 1.0188034188034187 to dir /training_results_v5.0/Lenovo/results/SR780aV3-8xB200_SXM_180GB/llama2_70b_lora
MLPerf training
Folder: /training_results_v5.0/Lenovo/results/SR780aV3-8xB200_SXM_180GB/llama2_70b_lora
Version: 5.0.0
System: xyz
Benchmark: llama2_70b_lora
Score - Time to Train (minutes): 11.324490299145298

After deleting scaling.json and with --has_power -

$ python3 -m mlperf_logging.result_summarizer.compute_score --benchmark llama2_70b_lora  --system xyz --benchmark_folder training_results_v5.0/Lenovo/results/SR780aV3-8xB200_SXM_180GB/llama2_70b_lora --usage training --ruleset 5.0.0 --has_power
MLPerf training
Folder: training_results_v5.0/Lenovo/results/SR780aV3-8xB200_SXM_180GB/llama2_70b_lora
Version: 5.0.0
System: xyz
Benchmark: llama2_70b_lora
Score - Time to Train (minutes): 11.11548125
Power Score - Energy (kJ): 6001391.312568853

@ShriyaRishab
Contributor

ShriyaRishab commented Sep 4, 2025

A few more issues that need to be dealt with -

Trying to compute scores for just 1 or 2 files returns None, although it would help to print out the individual scores of each of the files in the folder -

$ ls /temp_results
result_0.txt  result_1.txt
$ python3 -m mlperf_logging.result_summarizer.compute_score --benchmark llama31_405b  --system xyz --benchmark_folder /temp_results --usage training --ruleset 5.0.0
MLPerf training
Folder: /temp_results
Version: 5.0.0
System: xyz
Benchmark: llama31_405b
Score - Time to Train (minutes): None

Changing the file names to anything other than result_*.txt also computes no scores, although this is expected.

$ ls /temp_results
0.txt  1.txt  2.txt
$ python3 -m mlperf_logging.result_summarizer.compute_score --benchmark llama31_405b  --system xyz --benchmark_folder /temp_results --usage training --ruleset 5.0.0
MLPerf training
Folder: /temp_results
Version: 5.0.0
System: xyz
Benchmark: llama31_405b
Score - Time to Train (minutes): None
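
Printing per-file scores, as suggested above, seems straightforward; a minimal sketch, assuming standard :::MLLOG result lines and that a file's score is the wall time between its run_start and run_stop events (helper name is illustrative):

import json

def time_to_train_minutes(result_file):
    # Wall time between the run_start and run_stop events, in minutes.
    times = {}
    with open(result_file) as f:
        for line in f:
            if ":::MLLOG" not in line:
                continue
            event = json.loads(line.split(":::MLLOG", 1)[1])
            if event.get("key") in ("run_start", "run_stop"):
                times[event["key"]] = event["time_ms"]
    return (times["run_stop"] - times["run_start"]) / 60000.0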

@ShriyaRishab
Contributor

@pgmpablo157321 TODO items as discussed in the training WG

  1. Always delete the scaling.json file so that scores are computed without scaling, unless --scale is passed, in which case scaling.json is created and scores are printed after scaling.
  2. When m < N log files are present, print the score per file and also add a NOTICE stating that N logs are needed but only m are provided (see the sketch below).

An additional piece for (2) would be to also print the samples to converge along with the score for each log file, so submitters get a sense of their convergence as well. Is that also something we can add?
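
For item 2, a minimal sketch of what the aggregation could look like, assuming olympic scoring here means dropping the single best and worst runs and averaging the rest, and that a plain mean is the fallback when too few runs are present (both assumptions are consistent with the sample runs posted later in this thread):

def final_score(per_file_scores, required_runs):
    # Aggregate per-file time-to-train scores (sketch, names illustrative).
    n = len(per_file_scores)
    if n < required_runs:
        print(f"WARNING: Not enough runs found for an official submission. "
              f"Found: {n}, required: {required_runs}")
        print("WARNING: Olympic scoring skipped")
        return sum(per_file_scores) / n  # assumed fallback: plain mean
    # Olympic scoring: drop the fastest and slowest runs, average the rest.
    trimmed = sorted(per_file_scores)[1:-1]
    return sum(trimmed) / len(trimmed)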

"--has_power", action="store_true", help="Compute power score as well"
)
parser.add_argument(
"--benchmark_folder",
Contributor

I'd recommend taking a list of files rather than a folder name. Then the user could specify the list of files as folder/result*.txt to get all the result*.txt files in a folder, but could also specify a single file, and could specify log files and directories that are named differently than result*.txt, like foo/bar/baz/*.log
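
For illustration, a minimal sketch of the suggested interface, using a positional nargs="+" argument so the shell can expand globs like folder/result_*.txt (the argument name is illustrative, not the actual CLI):

import argparse
import glob

parser = argparse.ArgumentParser()
parser.add_argument(
    "log_files", nargs="+",
    help="Result log files, e.g. folder/result_*.txt or foo/bar/baz/*.log",
)
args = parser.parse_args()

# For shells that pass glob patterns through unexpanded, expand them here;
# a literal path with no matches is kept as-is.
files = [path
         for pattern in args.log_files
         for path in (sorted(glob.glob(pattern)) or [pattern])]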

Contributor Author

Unfortunately, this requires significant changes in the RCP checker, particularly changing the check_directory function and its interactions:

def check_directory(dir, usage, version, verbose, bert_train_samples, rcp_file=None, rcp_pass='full_rcp', rcp_bypass=False, set_scaling=False):

Given the time to the next submission, I recommend that we postpone this change.

@pgmpablo157321
Contributor Author

The following changes were added:

  1. The scaling factor gets reset when computing the score, and is recalculated in case --scale is passed
  2. Per-file scores/results are included in the output
  3. Olympic scoring is skipped if there are fewer results than needed for a submission; a warning is raised when this happens
  4. The benchmark argument is no longer required; it is inferred from the result files
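
For change 4, a minimal sketch of how the benchmark name could be inferred, assuming the result files contain the standard :::MLLOG submission_benchmark entry (the actual implementation may differ):

import json

def infer_benchmark(result_file):
    # Scan for the submission_benchmark event, e.g.
    # :::MLLOG {"key": "submission_benchmark", "value": "rgat", ...}
    with open(result_file) as f:
        for line in f:
            if ":::MLLOG" not in line:
                continue
            event = json.loads(line.split(":::MLLOG", 1)[1])
            if event.get("key") == "submission_benchmark":
                return event["value"]
    return None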

@pgmpablo157321
Contributor Author

Sample run 1:

python -m mlperf_logging.result_summarizer.compute_score --system TEST \
    --benchmark_folder training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat \
    --usage training --ruleset 5.0.0 --scale

Output:

NOTICE: Applying scaling factor 1.0014814814814814 to dir training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat
INFO -------------------------------------------------------
MLPerf training
Folder: training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat
Version: 5.0.0
System: TEST
Benchmark: rgat
-------------------------------------------------------------
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat/result_9.txt: 4.88135
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat/result_8.txt: 5.164983333333334
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat/result_6.txt: 5.131083333333333
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat/result_7.txt: 5.11245
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat/result_5.txt: 5.101683333333334
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat/result_4.txt: 5.379166666666667
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat/result_0.txt: 4.59125
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat/result_1.txt: 5.06715
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat/result_3.txt: 5.142033333333334
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat/result_2.txt: 4.86545
Final score - Time to Train (minutes): 5.065766654320987

Sample run 2 (after manually deleting result_3.txt):

python -m mlperf_logging.result_summarizer.compute_score --system TEST \
    --benchmark_folder training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat \
    --usage training --ruleset 5.0.0 --scale

Output:

WARNING: Not enough runs found for an official submission. Found: 9, required: 10
NOTICE: Applying scaling factor 1.0033927056827818 to dir training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat
INFO -------------------------------------------------------
MLPerf training
Folder: training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat
Version: 5.0.0
System: TEST
Benchmark: rgat
-------------------------------------------------------------
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat/result_9.txt: 4.88135
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat/result_8.txt: 5.164983333333334
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat/result_6.txt: 5.131083333333333
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat/result_7.txt: 5.11245
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat/result_5.txt: 5.101683333333334
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat/result_4.txt: 5.379166666666667
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat/result_0.txt: 4.59125
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat/result_1.txt: 5.06715
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_dgl/rgat/result_2.txt: 4.86545
WARNING: Olympic scoring skipped
Final score - Time to Train (minutes): 5.049804200043979

@pgmpablo157321 force-pushed the standalone_score_compute branch from a47029e to 32e2eec on September 9, 2025 00:16
@pgmpablo157321 force-pushed the standalone_score_compute branch from 32e2eec to 105f189 on September 9, 2025 00:35
ShriyaRishab previously approved these changes Sep 9, 2025
@ShriyaRishab left a comment
Contributor

Looks great, thanks!

@pgmpablo157321
Contributor Author

Also added logging of the sample count:

NOTICE: Applying scaling factor 1.0511463844797178 to dir training_results_v5.0/GigaComputing/results/G893-SD1_pytorch/llama2_70b_lora
INFO -------------------------------------------------------
MLPerf training
Folder: training_results_v5.0/GigaComputing/results/G893-SD1_pytorch/llama2_70b_lora
Version: 5.0.0
System: TEST
Benchmark: llama2_70b_lora
-------------------------------------------------------------
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_pytorch/llama2_70b_lora/result_9.txt: 10.89865. Samples to converge: 3072
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_pytorch/llama2_70b_lora/result_8.txt: 9.451066666666668. Samples to converge: 2688
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_pytorch/llama2_70b_lora/result_6.txt: 10.892983333333333. Samples to converge: 3072
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_pytorch/llama2_70b_lora/result_7.txt: 10.90115. Samples to converge: 3072
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_pytorch/llama2_70b_lora/result_5.txt: 10.900933333333333. Samples to converge: 3072
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_pytorch/llama2_70b_lora/result_4.txt: 9.4506. Samples to converge: 2688
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_pytorch/llama2_70b_lora/result_0.txt: 10.90205. Samples to converge: 3072
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_pytorch/llama2_70b_lora/result_1.txt: 10.899316666666667. Samples to converge: 3072
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_pytorch/llama2_70b_lora/result_3.txt: 10.894383333333334. Samples to converge: 3072
Score - Time to Train (minutes) for training_results_v5.0/GigaComputing/results/G893-SD1_pytorch/llama2_70b_lora/result_2.txt: 10.89935. Samples to converge: 3072
Final score - Time to Train (minutes): 11.265376690182244
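
A rough sketch of where the extra number could come from, assuming the samples to converge are read from the metadata of the final eval_accuracy event; the exact metadata key is an assumption here and may vary by benchmark:

import json

def samples_to_converge(result_file):
    # Assumption: the last eval_accuracy event's metadata carries the number
    # of training samples processed so far (the LLM benchmarks use epoch_num
    # as a sample counter; other benchmarks may use a different key).
    samples = None
    with open(result_file) as f:
        for line in f:
            if ":::MLLOG" not in line:
                continue
            event = json.loads(line.split(":::MLLOG", 1)[1])
            if event.get("key") == "eval_accuracy":
                samples = event.get("metadata", {}).get("epoch_num", samples)
    return samples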

@pgmpablo157321 force-pushed the standalone_score_compute branch from 05e7fae to b0b2fe3 on September 9, 2025 22:25
@pgmpablo157321 merged commit 5e82c8f into master Sep 10, 2025
1 check passed
@github-actions bot locked and limited conversation to collaborators Sep 10, 2025
Successfully merging this pull request may close these issues.

Can we have a simple script to compute training scores?