
Conversation

@davidtweedle
Contributor

A draft PR with potential changes to upgrade the PyTorch workloads from DDP (distributed data parallel) to FSDP (fully sharded data parallel).

Summary of changes to: cifar, mnist, criteo1tb, imagenet vit, imagenet resnet, librispeech deepspeech, librispeech conformer, ogbg, wmt, fastmri

  • import the required packages (e.g., FSDP)
  • construct the model with the FSDP wrapper instead of the DDP wrapper (see the sketch after this list)
  • very naive sharding for now
  • Crucially: gradients must be zeroed before eval
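
A minimal sketch of what the DDP-to-FSDP swap might look like for a workload's model setup; the function and variable names (`init_model`, `local_rank`) are illustrative assumptions, not the PR's actual code:

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def init_model(model: torch.nn.Module, local_rank: int) -> FSDP:
    # Move parameters onto the local device before wrapping.
    model = model.to(local_rank)
    # FSDP constructor call in place of the DDP one; with no
    # auto_wrap_policy this is the "very naive sharding" mentioned above
    # (the whole model is flattened into a single FSDP unit).
    return FSDP(model, device_id=local_rank)

# Before running eval, zero the gradients so no stale sharded grads are
# carried into the evaluation pass:
#   optimizer.zero_grad(set_to_none=True)
```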

Summary of changes to momentum (as a simple test optimizer):

  • first compute the weighted loss on each device
  • then call loss.backward() (the gradients will now be all-reduced by a PyTorch communication hook)
  • then display the correct, globally reduced loss (see the sketch after this list)
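
A hedged sketch of the training step described above, assuming a per-example loss function with `reduction='none'` and a `weights` tensor marking valid examples; the names `train_step`, `loss_fn`, and `weights` are assumptions, and the communication hook used for the gradient reduction in the PR is not reproduced here:

```python
import torch
import torch.distributed as dist

def train_step(model, optimizer, inputs, targets, weights, loss_fn):
    optimizer.zero_grad(set_to_none=True)
    logits = model(inputs)
    per_example = loss_fn(logits, targets)   # per-example losses, reduction='none'
    local_weight = weights.sum()
    # (1) weighted loss on this device
    loss = (per_example * weights).sum() / local_weight
    # (2) backward pass; the sharded gradients are reduced across devices
    loss.backward()
    optimizer.step()
    # (3) reduce the loss across devices so the displayed value is the
    # global weighted average rather than a single device's loss
    stats = torch.stack([loss.detach() * local_weight, local_weight])
    dist.all_reduce(stats)
    return stats[0] / stats[1]
```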

@github-actions

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@priyakasimbeg
Contributor

Won't fix. Closing for now.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 13, 2025