📄 [Paper] | 🤗 [Hugging Face] | 📁 [Dataset] | 💻 [Code] | 📊 [Log] | 📰 [Blog]
We have released models trained using the Drop-Upcycling technique:
🤗 LLM-JP-3-8x13B | 🤗 LLM-JP-3-8x1.8B
These models demonstrate the practical application of our Drop-Upcycling method to training efficient sparse Mixture of Experts (MoE) models. For more details about the model release, please see our blog post.
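For intuition, here is a minimal, hypothetical sketch of the partial re-initialization idea named in the paper title: when upcycling a dense FFN into an MoE expert, the dense weights are copied, and a random fraction of the intermediate dimensions is re-sampled from the empirical statistics of the original matrices. All names here (`drop_upcycle_expert`, `w_gate`, `w_up`, `w_down`, `ratio`) are illustrative assumptions, not the released code's API.

```python
import torch

def drop_upcycle_expert(w_gate, w_up, w_down, ratio=0.5, generator=None):
    """Illustrative sketch (not the official implementation): copy dense
    SwiGLU FFN weights into one expert, then re-initialize a random `ratio`
    of the intermediate dimensions from the mean/std of each dense matrix.

    Assumed shapes: w_gate, w_up are (d_ff, d_model); w_down is (d_model, d_ff).
    """
    d_ff = w_gate.size(0)
    n_drop = int(ratio * d_ff)
    # Choose a different random subset of intermediate units per expert.
    idx = torch.randperm(d_ff, generator=generator)[:n_drop]
    new_gate, new_up, new_down = w_gate.clone(), w_up.clone(), w_down.clone()
    for dense, new, dim in ((w_gate, new_gate, 0), (w_up, new_up, 0), (w_down, new_down, 1)):
        # Re-sample dropped rows/columns from a normal distribution matching
        # the statistics of the corresponding dense weight matrix.
        noise = torch.randn(dense.shape, generator=generator) * dense.std() + dense.mean()
        if dim == 0:
            new[idx, :] = noise[idx, :]
        else:
            new[:, idx] = noise[:, idx]
    return new_gate, new_up, new_down
```

Calling this once per expert, each time with a different random subset, gives every expert a distinct partially re-initialized copy of the dense FFN, which is the mechanism the paper uses to promote expert diversity.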
The experiments were conducted using the following frameworks:
- Megatron-LM
- moe-recipes
We conducted comprehensive evaluations using the swallow-llm/swallow-evaluation framework (commit 04948a0). For detailed instructions on setting up the environment and running the evaluation scripts, please refer to that framework's documentation.
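To reproduce our setup exactly, the evaluation code should be pinned to that commit. A small helper like the following would do so (hypothetical; it assumes the framework is hosted on GitHub under the org/repo name given above):

```python
import subprocess

REPO = "https://github.com/swallow-llm/swallow-evaluation.git"  # assumed URL
COMMIT = "04948a0"  # commit used in our experiments

# Clone the evaluation framework and check out the pinned commit.
subprocess.run(["git", "clone", REPO], check=True)
subprocess.run(["git", "-C", "swallow-evaluation", "checkout", COMMIT], check=True)
```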
@inproceedings{nakamura2025dropupcycling,
  title={Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization},
  author={Taishi Nakamura and Takuya Akiba and Kazuki Fujii and Yusuke Oda and Rio Yokota and Jun Suzuki},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=gx1wHnf5Vp}
}