update flux rcp #443

CarlosGomes98 · 2025-11-19T14:45:31Z

This is required for the bug fix in mlcommons/training#844.

Additionally, I took the opportunity to explore LR warmup for GBS 1024, because the variance at that scale was on the high side.
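For reference, here is a minimal sketch of what the new GBS 1024 setting could look like, assuming a linear warmup followed by a constant LR. The warmup shape and the post-warmup behaviour are not stated in this PR, so `lr_at_step` is a hypothetical helper; only the peak LR values (2e-4, 2.5e-4) and the warmup lengths (0, 800 steps) come from the numbers reported below.

```python
def lr_at_step(step: int, peak_lr: float = 2.5e-4, warmup_steps: int = 800) -> float:
    """Learning rate at a given optimizer step.

    ASSUMPTION: linear ramp from 0 to peak_lr over warmup_steps, then constant.
    The real schedule in the benchmark may use a different shape or a decay.
    """
    if warmup_steps <= 0:
        # No warmup: the peak LR is used from the first step.
        return peak_lr
    return peak_lr * min(1.0, step / warmup_steps)


if __name__ == "__main__":
    # Old setting: LR 2e-4 with no warmup. New setting: LR 2.5e-4, 800 warmup steps.
    for step in (0, 200, 400, 800, 2000):
        old = lr_at_step(step, peak_lr=2e-4, warmup_steps=0)
        new = lr_at_step(step, peak_lr=2.5e-4, warmup_steps=800)
        print(f"step {step:>5}: old lr = {old:.2e}, new lr = {new:.2e}")
```

Under this assumed schedule, the new setting runs below the old LR early in training and above it once warmup has finished.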

The only large changes are the convergence mean for 512 and, following the hparam change, the CV for 1024. The other scales are mostly unchanged.
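For readers less familiar with the metric, CV here is the coefficient of variation, i.e. the standard deviation divided by the mean. A minimal sketch follows, assuming the sample standard deviation is used; the RCP tooling may compute it differently. The helper name and the input values are hypothetical and are not data from this PR.

```python
import statistics


def coefficient_of_variation(values: list[float]) -> float:
    """Return std / mean for a list of per-run convergence figures.

    ASSUMPTION: sample standard deviation (n - 1 in the denominator).
    """
    if len(values) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return statistics.stdev(values) / statistics.mean(values)


if __name__ == "__main__":
    # Hypothetical per-run figures (in millions), for illustration only.
    hypothetical_runs = [8.0, 10.0, 12.0]
    print(f"CV = {coefficient_of_variation(hypothetical_runs):.3f}")
```

A lower CV means the runs at that scale converge at more similar points relative to their mean.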

The changes are as follows:

512:

Cur mean: 8.16M
New mean: 7.17M
Cur CV: 0.031
New CV: 0.029

1024:
Hparams changed: LR 2e-4 -> 2.5e-4, warmup 0 -> 800 steps.

Cur mean: 8.76M
New mean: 8.55M
Cur CV: 0.067
New CV: 0.036

2048:

Cur mean: 10.7M
New mean: 10.3M
Cur CV: 0.057
New CV: 0.058

4096:

Cur mean: 15.6M
New mean: 15.3M
Cur CV: 0.027
New CV: 0.033
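To make the size of each change easier to compare across scales, the relative change in mean and CV can be computed from the figures above. This is a small sketch, not part of the RCP tooling: `RESULTS`, `relative_change`, and `summarize` are names introduced here for illustration, and the means are entered in millions exactly as reported.

```python
# (current, new) pairs per global batch size, copied from the figures above.
# Means are in millions, as reported.
RESULTS = {
    512: {"mean": (8.16, 7.17), "cv": (0.031, 0.029)},
    1024: {"mean": (8.76, 8.55), "cv": (0.067, 0.036)},
    2048: {"mean": (10.7, 10.3), "cv": (0.057, 0.058)},
    4096: {"mean": (15.6, 15.3), "cv": (0.027, 0.033)},
}


def relative_change(current: float, new: float) -> float:
    """Fractional change from current to new (negative means a decrease)."""
    return (new - current) / current


def summarize(results: dict) -> dict:
    """Map each batch size to (relative mean change, relative CV change)."""
    return {
        gbs: (relative_change(*vals["mean"]), relative_change(*vals["cv"]))
        for gbs, vals in results.items()
    }


if __name__ == "__main__":
    for gbs, (d_mean, d_cv) in summarize(RESULTS).items():
        print(f"GBS {gbs:>4}: mean {d_mean:+.1%}, CV {d_cv:+.1%}")
```

The mean at 512 drops by roughly 12%, and the CV at 1024 drops by a bit under half, which matches the two large changes called out in the description.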

github-actions · 2025-11-19T14:46:04Z

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

update flux rcp

658bed8

CarlosGomes98 requested review from a team as code owners November 19, 2025 14:45

CarlosGomes98 mentioned this pull request Nov 19, 2025

[Flux] fix incorrect seed setting for dp shard mlcommons/training#844

Open
