Conversation

@CarlosGomes98 (Contributor)

This is required for the bug fix from mlcommons/training#844

Additionally, I took the opportunity to explore some LR warmup for GBS 1024, because I found the variance at that scale was on the higher side.

The only large changes are the convergence point for 512 and, for 1024, the hyperparameter change and the resulting CV. Other scales are mostly unchanged.
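
For context, CV here is the coefficient of variation (standard deviation divided by mean) across seeds. A minimal sketch of the computation, using made-up per-seed values purely for illustration:

```python
import statistics

# CV = stdev / mean across per-seed convergence results.
# These values are made-up placeholders, not results from this PR.
seed_results = [8.4e6, 8.7e6, 8.9e6, 8.2e6, 8.5e6]
mean = statistics.mean(seed_results)
cv = statistics.stdev(seed_results) / mean
print(f"mean: {mean:.3g}  CV: {cv:.3f}")
```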

The changes are as follows. For 1024, the hyperparameters changed: LR 2e-4 -> 2.5e-4, warmup 0 -> 800 steps. A sketch of the warmup schedule follows the table.

| GBS  | Cur mean | New mean | Cur CV | New CV |
|------|----------|----------|--------|--------|
| 512  | 8.16M    | 7.17M    | 0.031  | 0.029  |
| 1024 | 8.76M    | 8.55M    | 0.067  | 0.036  |
| 2048 | 10.7M    | 10.3M    | 0.057  | 0.058  |
| 4096 | 15.6M    | 15.3M    | 0.027  | 0.033  |
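
Since warmup was the main new knob for 1024, here is a minimal sketch of a linear LR warmup using the new settings (peak LR 2.5e-4, 800 warmup steps). The function name and the constant LR after warmup are illustrative assumptions; the actual benchmark may apply a decay schedule afterwards.

```python
def warmup_lr(step: int, peak_lr: float = 2.5e-4, warmup_steps: int = 800) -> float:
    """Linear warmup from 0 to peak_lr over warmup_steps, then constant.

    The constant post-warmup behavior is an assumption for illustration;
    the real schedule may decay instead.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp linearly from ~0 to peak_lr over the first warmup_steps steps.
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr

# LR at a few steps during and after warmup:
for s in (0, 399, 799, 800, 2000):
    print(s, warmup_lr(s))
```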

@github-actions

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
