Conversation

@fduwjj (Contributor) commented Feb 18, 2025

Stack from ghstack (oldest at bottom):

`ghstack` didn't land #814 correctly, so this PR is opened to land it. For
the detailed discussion, please refer to #814.

**What does this PR do?**

1. This PR introduces `ModelSpec` to describe a model and how to
   parallelize it (see the first sketch after this list).
   * All models should call `register_model_spec()`.
   * Users can also use `--experimental.custom_model_path` to dynamically
     import a model that is not implemented by TorchTitan; the imported
     module should also call `register_model_spec()` (see the second
     sketch after this list).
2. This PR also refactors `OptimizersContainer` and
   `LRSchedulersContainers`:
   * Fixes an issue where optimizers would accept parameters whose
     `requires_grad` is `False`.
   * Improves typing and docstrings.
   * Improves function and class reusability.
   * `OptimizersContainer` now inherits from `torch.optim.Optimizer`
     (see the sketch after the **Next steps** section).
3. This PR also moves `parallelize_llama` and `pipelining_llama` to the
   `llama` folder.
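
As a rough illustration of the registration flow, here is a minimal sketch
of what a spec registry could look like. The `ModelSpec` fields and the
`get_model_spec()` helper are assumptions for illustration only, not the
exact API added by this PR:

```python
# Hypothetical sketch -- field names and helpers are illustrative, not the
# exact ModelSpec API introduced by this PR.
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ModelSpec:
    name: str                 # key the trainer uses to look the model up
    model_cls: type           # the nn.Module subclass implementing the model
    parallelize_fn: Callable  # applies SPMD parallelisms (e.g. TP/FSDP)
    pipelining_fn: Callable   # splits the model into pipeline stages


_model_specs: Dict[str, ModelSpec] = {}


def register_model_spec(spec: ModelSpec) -> None:
    """Register a spec so the trainer can construct the model by name."""
    if spec.name in _model_specs:
        raise ValueError(f"ModelSpec {spec.name!r} is already registered")
    _model_specs[spec.name] = spec


def get_model_spec(name: str) -> ModelSpec:
    return _model_specs[name]
```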

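For `--experimental.custom_model_path`, the dynamic import could plausibly
be done with `importlib`. The helper below is a hypothetical sketch; the
function name and the file-vs-module handling are assumptions:

```python
# Hypothetical sketch of dynamically importing a user-provided model module.
# Importing the module is expected to trigger its register_model_spec() call.
import importlib
import importlib.util


def import_custom_model(custom_model_path: str) -> None:
    if custom_model_path.endswith(".py"):
        # Treat the value as a file path.
        spec = importlib.util.spec_from_file_location(
            "custom_model", custom_model_path
        )
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
    else:
        # Treat the value as a dotted module path, e.g. "my_pkg.my_model".
        importlib.import_module(custom_model_path)
```
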
**Why do we need this PR?**

This allows users to use TorchTitan with a new model without intrusively
changing TorchTitan code.

**Next steps**

1. Dataloader customization is not included yet.
2. Checkpoint customization is not included yet.
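
For item 2, here is a minimal sketch of an optimizer container that
subclasses `torch.optim.Optimizer` and skips frozen parameters. The
constructor signature and internals are assumptions, not the exact code in
this PR:

```python
# Hypothetical sketch -- the real OptimizersContainer may differ in detail.
import torch
from torch import nn


class OptimizersContainer(torch.optim.Optimizer):
    """Holds one optimizer per model part while exposing the standard
    Optimizer interface, so step/zero_grad/state_dict apply to all of them."""

    def __init__(self, model_parts: list[nn.Module], optimizer_cls, **kwargs):
        self.optimizers = []
        all_params = []
        for model in model_parts:
            # Skip frozen parameters instead of handing them to the optimizer.
            params = [p for p in model.parameters() if p.requires_grad]
            self.optimizers.append(optimizer_cls(params, **kwargs))
            all_params.extend(params)
        # Initialize the base class so param_groups/state_dict are well defined.
        super().__init__(all_params, kwargs)

    def step(self, closure=None):
        for optimizer in self.optimizers:
            optimizer.step(closure)

    def zero_grad(self, set_to_none: bool = True):
        for optimizer in self.optimizers:
            optimizer.zero_grad(set_to_none=set_to_none)
```

Because the container exposes the standard `Optimizer` interface, training-loop
and checkpointing code can treat it like a single optimizer.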

@pytorch-bot added the ci-no-td label Feb 18, 2025
fduwjj added a commit that referenced this pull request Feb 18, 2025
ghstack-source-id: 0385574
Pull Request resolved: #854
@facebook-github-bot added the CLA Signed label Feb 18, 2025
@fduwjj closed this Feb 24, 2025