Conversation

@rka97
Contributor

@rka97 rka97 commented Dec 9, 2024

Purpose

The goal of this PR is to enable sharding of model parameters and optimizer state, and to migrate the JAX code from jax.pmap to jax.jit.
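
At a high level, the migration replaces explicit per-device replication plus jax.pmap with a device mesh and jit-managed sharding. The sketch below is purely illustrative (not this PR's actual code); the toy loss and variable names are placeholders, only the jax.sharding / jax.jit calls are real API:

```python
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Before (pmap): params were replicated with flax.jax_utils.replicate and the
# step was wrapped as jax.pmap(train_step, axis_name='batch').

# After (jit): describe placement with a device mesh and shardings; the
# compiler inserts the cross-device collectives.
mesh = Mesh(jax.devices(), axis_names=('batch',))
replicated = NamedSharding(mesh, P())           # params / optimizer state
data_sharded = NamedSharding(mesh, P('batch'))  # batch split across devices

@jax.jit
def train_step(params, batch):
  # Toy regression loss; gradients are taken over the logically global batch
  # even though the inputs live sharded across devices.
  def loss_fn(p):
    return jnp.mean((batch['x'] @ p - batch['y']) ** 2)
  grads = jax.grad(loss_fn)(params)
  return params - 0.1 * grads

n = jax.device_count()
params = jax.device_put(jnp.zeros((8,)), replicated)
batch = {
    'x': jax.device_put(jnp.ones((2 * n, 8)), data_sharded),
    'y': jax.device_put(jnp.zeros((2 * n,)), data_sharded),
}
params = train_step(params, batch)
```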

TODOs:

  • Migrate reference optimizers to use jax.jit
    • Nesterov
    • AdamW
    • Others
  • Migrate workloads to use jax.jit
    • (Test workload) MNIST
    • (Test workload) CIFAR
    • WMT
    • Criteo1TB
    • FastMRI
    • Librispeech
    • OGBG
    • ImageNet

Changelog

  • Added sharding utilities to handle data distribution (see the sketch after this list)
  • Replaced pmap code for CIFAR/MNIST with jit
  • Modified AdamW and Nesterov accordingly
  • Updated checkpoint and data_utils to support the new approach (mostly removing explicit jax_utils.replicate calls).
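
The sharding utilities mentioned above might look roughly like the sketch below; the helper names are hypothetical, and only the jax.sharding / jax.device_put calls are real JAX API. The last helper illustrates how explicit flax.jax_utils.replicate calls become unnecessary once placement is expressed through shardings:

```python
# Hypothetical shape of the sharding helpers; function names are illustrative.
import jax
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

def get_mesh():
  """1D mesh over all available devices, with a single 'batch' axis."""
  return Mesh(jax.devices(), axis_names=('batch',))

def replicated_sharding(mesh):
  """Keep a pytree (params, optimizer state) fully replicated on every device."""
  return NamedSharding(mesh, P())

def batch_sharding(mesh):
  """Split the leading (batch) dimension of each array across devices."""
  return NamedSharding(mesh, P('batch'))

def shard_batch(batch, mesh):
  """Place a host batch on devices split along the batch axis, instead of
  adding the leading device dimension that pmap/jax_utils.replicate required."""
  return jax.tree_util.tree_map(
      lambda x: jax.device_put(x, batch_sharding(mesh)), batch)
```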

Issues

  • Prefetching in CIFAR is temporarily disabled (marked with FIXME); I'm not sure how best to support it under jit.
  • I haven't edited any of the PyTorch code; we will need to make sure the two frameworks still perform comparably.

@rka97 rka97 requested a review from a team as a code owner December 9, 2024 21:21
@github-actions

github-actions bot commented Dec 9, 2024

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@rka97
Contributor Author

rka97 commented Dec 9, 2024

recheck

Still need to test (a) output losses and (b) speed, and (c) look into the other Librispeech workload.
@priyakasimbeg
Contributor

Migrating this PR to one that merges from a branch on this repo.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 6, 2025
