Drop-Upcycling
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization

📄 [Paper] | 🤗 [Hugging Face] | 📁 [Dataset] | 💻 [Code] | 📊 [Log] | 📰 [Blog]

Models Trained with Drop-Upcycling

We have released models trained using the Drop-Upcycling technique:

🤗 LLM-JP-3-8x13B | 🤗 LLM-JP-3-8x1.8B

These models demonstrate the practical application of our Drop-Upcycling methodology for training efficient sparse Mixture of Experts models. For more details about the model release, please see our blog post.
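The released checkpoints can be used through the standard Hugging Face transformers interface. The snippet below is a minimal loading sketch, not an official usage guide; the hub id shown is an assumption, so substitute the id from the model links above.

```python
# Minimal loading sketch, assuming the checkpoints follow the standard
# Hugging Face transformers interface. The hub id below is an assumption;
# use the id from the model links above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "llm-jp/llm-jp-3-8x1.8b"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

prompt = "Mixture of Experts models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```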

Pretraining

The experiments were conducted using the following frameworks:

Dense Model Training

MoE Model Training
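
For reference, the sketch below illustrates the core idea of Drop-Upcycling as described in the paper: each expert's FFN is initialized from the dense model's FFN, and a random fraction of the intermediate dimension is re-initialized per expert. This is a conceptual sketch under stated assumptions, not the conversion code used by the training frameworks above; the SwiGLU weight layout, the statistics-matched re-initialization, and the function name are assumptions.

```python
# Conceptual sketch of Drop-Upcycling's partial re-initialization
# (illustrative only; not the conversion code used in this repository).
# Assumed layout: a SwiGLU-style FFN with gate/up projections of shape
# (d_ff, d_model) and a down projection of shape (d_model, d_ff).
import torch


def drop_upcycle_ffn(w_gate, w_up, w_down, num_experts, drop_ratio, generator=None):
    """Copy a dense FFN into `num_experts` experts, re-initializing a random
    fraction `drop_ratio` of the intermediate dimension in each copy."""
    d_ff = w_gate.shape[0]
    n_drop = int(drop_ratio * d_ff)
    experts = []
    for _ in range(num_experts):
        g, u, d = w_gate.clone(), w_up.clone(), w_down.clone()
        # Choose which intermediate units to re-initialize for this expert.
        idx = torch.randperm(d_ff, generator=generator)[:n_drop]
        # Re-initialize the selected rows (gate/up) and columns (down),
        # drawing new weights from a normal distribution matched to each
        # matrix's own mean/std (an assumption of this sketch).
        g[idx] = torch.randn(n_drop, g.shape[1], generator=generator) * g.std() + g.mean()
        u[idx] = torch.randn(n_drop, u.shape[1], generator=generator) * u.std() + u.mean()
        d[:, idx] = torch.randn(d.shape[0], n_drop, generator=generator) * d.std() + d.mean()
        experts.append((g, u, d))
    return experts


# Toy usage: build 8 experts from a random "dense" FFN, with half of the
# intermediate dimension re-initialized independently in each expert.
d_model, d_ff = 16, 64
dense = (torch.randn(d_ff, d_model), torch.randn(d_ff, d_model), torch.randn(d_model, d_ff))
experts = drop_upcycle_ffn(*dense, num_experts=8, drop_ratio=0.5)
```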

Evaluation

We conducted comprehensive evaluations using the evaluation framework from swallow-llm/swallow-evaluation (commit: 04948a0).

Setup and Usage

For detailed instructions on setting up the evaluation environment and running the evaluation scripts, please refer to the evaluation framework documentation.

Citation

@inproceedings{
    nakamura2025dropupcycling,
    title={Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization},
    author={Taishi Nakamura and Takuya Akiba and Kazuki Fujii and Yusuke Oda and Rio Yokota and Jun Suzuki},
    booktitle={The Thirteenth International Conference on Learning Representations},
    year={2025},
    url={https://openreview.net/forum?id=gx1wHnf5Vp}
}
