Conversation

@fduwjj (Contributor) commented Feb 18, 2025

Stack from ghstack (oldest at bottom):

`ghstack` didn't land #814 correctly, so this PR is opened to land it. For
the detailed discussion, please refer to #814.

**What does this PR do?**

1. This PR introduces `ModelSpec` to describe a model and how to
   parallelize it (see the first sketch after this list).
   * All models should call `register_model_spec()`.
   * Users can also use `--experimental.custom_model_path` to dynamically
     import a model that is not implemented by TorchTitan; the imported
     module should also call `register_model_spec()` (see the second
     sketch after this list).
2. This PR also refactors `OptimizersContainer` and
   `LRSchedulersContainers`:
   * Fixes an issue where optimizers would accept parameters whose
     `requires_grad` is `False`.
   * Improves typing and docstrings.
   * Improves function and class reusability.
   * `OptimizersContainer` now inherits from `torch.optim.Optimizer`
     (see the sketch after the **Next steps** section).
3. This PR also moves `parallelize_llama` and `pipelining_llama` to the
   `llama` folder.
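
As a rough illustration of the registration flow, here is a minimal sketch
of what a spec registry could look like. The `ModelSpec` fields and the
`get_model_spec()` helper are assumptions for illustration only, not the
exact API added by this PR:

```python
# Hypothetical sketch -- field names and helpers are illustrative, not the
# exact ModelSpec API introduced by this PR.
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ModelSpec:
    name: str                 # key the trainer uses to look the model up
    model_cls: type           # the nn.Module subclass implementing the model
    parallelize_fn: Callable  # applies SPMD parallelisms (e.g. TP/FSDP)
    pipelining_fn: Callable   # splits the model into pipeline stages


_model_specs: Dict[str, ModelSpec] = {}


def register_model_spec(spec: ModelSpec) -> None:
    """Register a spec so the trainer can construct the model by name."""
    if spec.name in _model_specs:
        raise ValueError(f"ModelSpec {spec.name!r} is already registered")
    _model_specs[spec.name] = spec


def get_model_spec(name: str) -> ModelSpec:
    return _model_specs[name]
```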

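For `--experimental.custom_model_path`, the dynamic import could plausibly
be done with `importlib`. The helper below is a hypothetical sketch; the
function name and the file-vs-module handling are assumptions:

```python
# Hypothetical sketch of dynamically importing a user-provided model module.
# Importing the module is expected to trigger its register_model_spec() call.
import importlib
import importlib.util


def import_custom_model(custom_model_path: str) -> None:
    if custom_model_path.endswith(".py"):
        # Treat the value as a file path.
        spec = importlib.util.spec_from_file_location(
            "custom_model", custom_model_path
        )
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
    else:
        # Treat the value as a dotted module path, e.g. "my_pkg.my_model".
        importlib.import_module(custom_model_path)
```
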
**Why do we need this PR?**

This allows users to use TorchTitan with a new model without intrusively
changing TorchTitan code.

**Next steps**

1. Dataloader customization is not included yet.
2. Checkpoint customization is not included yet.
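
For item 2, here is a minimal sketch of an optimizer container that
subclasses `torch.optim.Optimizer` and skips frozen parameters. The
constructor signature and internals are assumptions, not the exact code in
this PR:

```python
# Hypothetical sketch -- the real OptimizersContainer may differ in detail.
import torch
from torch import nn


class OptimizersContainer(torch.optim.Optimizer):
    """Holds one optimizer per model part while exposing the standard
    Optimizer interface, so step/zero_grad/state_dict apply to all of them."""

    def __init__(self, model_parts: list[nn.Module], optimizer_cls, **kwargs):
        self.optimizers = []
        all_params = []
        for model in model_parts:
            # Skip frozen parameters instead of handing them to the optimizer.
            params = [p for p in model.parameters() if p.requires_grad]
            self.optimizers.append(optimizer_cls(params, **kwargs))
            all_params.extend(params)
        # Initialize the base class so param_groups/state_dict are well defined.
        super().__init__(all_params, kwargs)

    def step(self, closure=None):
        for optimizer in self.optimizers:
            optimizer.step(closure)

    def zero_grad(self, set_to_none: bool = True):
        for optimizer in self.optimizers:
            optimizer.zero_grad(set_to_none=set_to_none)
```

Because the container exposes the standard `Optimizer` interface, training-loop
and checkpointing code can treat it like a single optimizer.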

@pytorch-bot added the ci-no-td label Feb 18, 2025
fduwjj added a commit that referenced this pull request Feb 18, 2025
ghstack-source-id: 0385574
Pull Request resolved: #854
@facebook-github-bot added the CLA Signed label Feb 18, 2025
@fduwjj closed this Feb 24, 2025