Closed
Labels: 👷 In Progress (Issue is being worked on)
Description
Sharding optimizer state across devices saves significant memory and reflects current practice. We want to support it.
- We want to switch from no sharding to naive model-parameter sharding in both frameworks (see the sketch after this list).
- We will forbid (in the rules) any hacks that change the model-parallelization strategy, and each workload will have a default sharding.
- Allow submitters to opt out of it on a per-workload basis.
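
For concreteness, here is a minimal sketch of what naive per-parameter sharding could look like on the JAX side. The mesh setup, the `shard_leading_axis` helper, and the toy parameter tree are illustrative assumptions for this issue, not code from the repository:

```python
# Minimal sketch: naive model-parameter sharding in JAX.
# `shard_leading_axis` and the toy `params` tree are hypothetical.
import numpy as np
import jax
import jax.numpy as jnp
import optax
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# One-dimensional mesh over all available devices.
mesh = Mesh(np.array(jax.devices()), axis_names=("devices",))

def shard_leading_axis(x):
    """Shard an array along its first axis if it divides evenly; replicate otherwise."""
    if x.ndim > 0 and x.shape[0] % mesh.size == 0:
        spec = P("devices", *([None] * (x.ndim - 1)))
    else:
        spec = P()  # replicate scalars and small/odd-shaped leaves
    return jax.device_put(x, NamedSharding(mesh, spec))

# Toy parameter pytree; a real workload would get this from its model init.
params = {
    "dense": {"kernel": jnp.ones((1024, 512)), "bias": jnp.zeros((512,))},
}
params = jax.tree_util.tree_map(shard_leading_axis, params)

# Adam's moment estimates mirror the parameter shapes, so initializing the
# optimizer state from already-sharded params should give it the same
# sharding, which is where the memory savings come from.
opt = optax.adam(1e-3)
opt_state = opt.init(params)
```

This is "naive" in the sense that every large leaf is split along its leading axis regardless of the model's structure; a submission that instead rewrites the model-parallelization strategy itself is the kind of hack the rules would forbid.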