Conversation

@rka97 (Contributor) commented Apr 3, 2025:

This is for the LM workload.


github-actions bot commented Apr 3, 2025

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@priyakasimbeg changed the base branch from main to dev on October 2, 2025, 00:40
@fsschneider (Contributor) commented:

I might be wrong here, but I just randomly saw that we are using a default MLP expansion factor of $4$ (i.e., `hidden_dim = 4 * dim`).

However, it seems to me that we are also using a gated linear unit, whose initial expansion is twice as large (since its output is split into the gate and the value), i.e.,

```python
self.fc1 = nn.Linear(dim, 2 * hidden_dim, bias=False)  # projects to gate and value halves
self.fc2 = nn.Linear(hidden_dim, dim, bias=False)       # projects back to the model dimension
self.glu = nn.GLU(dim=2)                                 # first half * sigmoid(second half)
```
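
For concreteness, here is a minimal sketch of the resulting forward pass (the shapes and sizes are assumptions for illustration, not taken from the workload code):

```python
import torch
import torch.nn as nn

dim, hidden_dim = 768, 4 * 768   # assumed sizes, expansion factor 4

fc1 = nn.Linear(dim, 2 * hidden_dim, bias=False)
fc2 = nn.Linear(hidden_dim, dim, bias=False)
glu = nn.GLU(dim=2)

x = torch.randn(8, 128, dim)     # (batch, seq, dim)
h = fc1(x)                       # (batch, seq, 2 * hidden_dim): gate and value stacked
h = glu(h)                       # (batch, seq, hidden_dim): first half * sigmoid(second half)
y = fc2(h)                       # (batch, seq, dim)
```

So with the default factor of $4$, the first projection is $8\times$ the model dimension wide, not $4\times$.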

I believe that the common expansion factor of $4$ is for models that use non-gated activation functions (e.g., the original Transformer with ReLU or GPT-2 with GELU). To keep the number of parameters constant, people commonly multiply the expansion factor by $2/3$ for gated versions. This keeps the total number of parameters in the MLP the same (up to a small rounding error from the non-integer multiplication).
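
To illustrate the parameter counting, a quick sketch assuming the `fc1`/`fc2` layout above, no biases, and a hypothetical `dim = 768`:

```python
def mlp_params(dim, hidden_dim, gated):
    """Parameter count of the two-layer MLP block (no biases)."""
    if gated:
        # fc1: dim -> 2 * hidden_dim (gate + value), fc2: hidden_dim -> dim
        return dim * 2 * hidden_dim + hidden_dim * dim
    # non-gated: fc1: dim -> hidden_dim, fc2: hidden_dim -> dim
    return 2 * dim * hidden_dim

dim = 768
print(mlp_params(dim, 4 * dim, gated=False))          # 4718592: non-gated baseline, factor 4
print(mlp_params(dim, 4 * dim, gated=True))           # 7077888: gated with factor 4, 1.5x too many
print(mlp_params(dim, int(8 * dim / 3), gated=True))  # 4718592: gated with factor 8/3 matches
```

With $\text{dim} = 768$, $8/3 \cdot \text{dim} = 2048$, so the counts match exactly; for other widths there is only a small rounding error.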

If you agree, we should either change the default to $8/3$ (i.e., $4 \cdot 2/3 \approx 2.67$), or adjust the computation of `hidden_dim` in the MLP to account for this scaling.

@rka97 @priyakasimbeg @Niccolo-Ajroldi

@Niccolo-Ajroldi (Member) commented:

@fsschneider Agreed, we should definitely adjust the expansion factor as you suggested. This was probably an oversight; in plainLM we rescale when using a GLU.
