Closed
Labels: 👷 In Progress (Issue is being worked on)
Description
Sharding optimizer state across devices saves significant memory and reflects current practice. We want to support it.
- We want to switch from no sharding to naive model-parameter sharding in both frameworks (see the sketch after this list).
- We will forbid (in the rules) any hacks that change the model-parallelization strategy, and each workload will have a default sharding.
- Allow submitters to opt out of it on a per-workload basis.
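
For concreteness, here is a minimal sketch of what naive per-parameter sharding could look like on the JAX side. The mesh setup, the `shard_leading_axis` helper, and the toy parameter tree are illustrative assumptions for this issue, not code from the repository:

```python
# Minimal sketch: naive model-parameter sharding in JAX.
# `shard_leading_axis` and the toy `params` tree are hypothetical.
import numpy as np
import jax
import jax.numpy as jnp
import optax
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# One-dimensional mesh over all available devices.
mesh = Mesh(np.array(jax.devices()), axis_names=("devices",))

def shard_leading_axis(x):
    """Shard an array along its first axis if it divides evenly; replicate otherwise."""
    if x.ndim > 0 and x.shape[0] % mesh.size == 0:
        spec = P("devices", *([None] * (x.ndim - 1)))
    else:
        spec = P()  # replicate scalars and small/odd-shaped leaves
    return jax.device_put(x, NamedSharding(mesh, spec))

# Toy parameter pytree; a real workload would get this from its model init.
params = {
    "dense": {"kernel": jnp.ones((1024, 512)), "bias": jnp.zeros((512,))},
}
params = jax.tree_util.tree_map(shard_leading_axis, params)

# Adam's moment estimates mirror the parameter shapes, so initializing the
# optimizer state from already-sharded params should give it the same
# sharding, which is where the memory savings come from.
opt = optax.adam(1e-3)
opt_state = opt.init(params)
```

This is "naive" in the sense that every large leaf is split along its leading axis regardless of the model's structure; a submission that instead rewrites the model-parallelization strategy itself is the kind of hack the rules would forbid.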