📄 [Paper] | 🤗 [Hugging Face] | 📁 [Dataset] | 💻 [Code] | 📊 [Log] | 📰 [Blog]
We have released models trained using the Drop-Upcycling technique:
🤗 LLM-JP-3-8x13B | 🤗 LLM-JP-3-8x1.8B
These models demonstrate the practical application of our Drop-Upcycling method to training efficient sparse Mixture of Experts (MoE) models. For more details about the model release, please see our blog post.
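For intuition, here is a minimal, hypothetical sketch of the partial re-initialization idea named in the paper title: when upcycling a dense FFN into an MoE expert, the dense weights are copied, and a random fraction of the intermediate dimensions is re-sampled from the empirical statistics of the original matrices. All names here (`drop_upcycle_expert`, `w_gate`, `w_up`, `w_down`, `ratio`) are illustrative assumptions, not the released code's API.

```python
import torch

def drop_upcycle_expert(w_gate, w_up, w_down, ratio=0.5, generator=None):
    """Illustrative sketch (not the official implementation): copy dense
    SwiGLU FFN weights into one expert, then re-initialize a random `ratio`
    of the intermediate dimensions from the mean/std of each dense matrix.

    Assumed shapes: w_gate, w_up are (d_ff, d_model); w_down is (d_model, d_ff).
    """
    d_ff = w_gate.size(0)
    n_drop = int(ratio * d_ff)
    # Choose a different random subset of intermediate units per expert.
    idx = torch.randperm(d_ff, generator=generator)[:n_drop]
    new_gate, new_up, new_down = w_gate.clone(), w_up.clone(), w_down.clone()
    for dense, new, dim in ((w_gate, new_gate, 0), (w_up, new_up, 0), (w_down, new_down, 1)):
        # Re-sample dropped rows/columns from a normal distribution matching
        # the statistics of the corresponding dense weight matrix.
        noise = torch.randn(dense.shape, generator=generator) * dense.std() + dense.mean()
        if dim == 0:
            new[idx, :] = noise[idx, :]
        else:
            new[:, idx] = noise[:, idx]
    return new_gate, new_up, new_down
```

Calling this once per expert, each time with a different random subset, gives every expert a distinct partially re-initialized copy of the dense FFN, which is the mechanism the paper uses to promote expert diversity.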
The experiments were conducted using the following frameworks:
- Megatron-LM
- moe-recipes
We conducted comprehensive evaluations using the swallow-llm/swallow-evaluation framework (commit 04948a0). For detailed instructions on setting up the environment and running the evaluation scripts, please refer to that framework's documentation.
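To reproduce our setup exactly, the evaluation code should be pinned to that commit. A small helper like the following would do so (hypothetical; it assumes the framework is hosted on GitHub under the org/repo name given above):

```python
import subprocess

REPO = "https://github.com/swallow-llm/swallow-evaluation.git"  # assumed URL
COMMIT = "04948a0"  # commit used in our experiments

# Clone the evaluation framework and check out the pinned commit.
subprocess.run(["git", "clone", REPO], check=True)
subprocess.run(["git", "-C", "swallow-evaluation", "checkout", COMMIT], check=True)
```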
@inproceedings{nakamura2025dropupcycling,
  title={Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization},
  author={Taishi Nakamura and Takuya Akiba and Kazuki Fujii and Yusuke Oda and Rio Yokota and Jun Suzuki},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=gx1wHnf5Vp}
}