
Ling 2.0 employs __FP8 mixed-precision training__ throughout. Compared with BF16, experiments on over 1T training tokens show nearly identical loss curves and downstream benchmark performance. To support the community in efficient continued pretraining and fine-tuning under limited compute, we are also open-sourcing our __FP8 training solution__. Built on tile/block-wise FP8 scaling, it further introduces an FP8 optimizer, on-demand FP8 weight transposition, and an FP8 padding routing map for extreme memory optimization. On 8/16/32 80G GPUs, compared with LLaMA 3.1 8B and Qwen3 8B, __Ling-mini-2.0 achieved 30–60% throughput gains with MTP enabled, and 90–120% throughput gains with MTP disabled__.
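
As a rough illustration of what tile/block-wise FP8 scaling means, here is a minimal sketch (not the open-sourced FP8 training solution itself; it assumes PyTorch >= 2.1 for the `torch.float8_e4m3fn` dtype, and the helper names are ours): each weight tile gets its own scale so that the tile's largest magnitude maps onto the FP8 range.

```python
# Minimal sketch of block-wise FP8 (e4m3) weight scaling -- illustrative only,
# not the released Ling FP8 training solution. Requires PyTorch >= 2.1.
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3


def block_quantize(w: torch.Tensor, block: int = 128):
    """Quantize a 2-D weight with one scale per (block x block) tile."""
    rows, cols = w.shape
    tiles = w.reshape(rows // block, block, cols // block, block)
    # Per-tile scale chosen so each tile's max |value| maps onto the FP8 range.
    amax = tiles.abs().amax(dim=(1, 3), keepdim=True).clamp(min=1e-12)
    scale = FP8_MAX / amax
    q = (tiles * scale).to(torch.float8_e4m3fn).reshape(rows, cols)
    return q, scale


def block_dequantize(q: torch.Tensor, scale: torch.Tensor, block: int = 128):
    """Undo block_quantize given the FP8 tensor and its per-tile scales."""
    rows, cols = q.shape
    tiles = q.to(torch.float32).reshape(rows // block, block, cols // block, block)
    return (tiles / scale).reshape(rows, cols)


w = torch.randn(256, 512)
q, s = block_quantize(w)
print((w - block_dequantize(q, s)).abs().max())  # small per-tile quantization error
```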

### A More Open-Source Strategy

We believe Ling-mini-2.0 is an ideal starting point for MoE research. For the first time at this scale, it integrates 1/32 sparsity, MTP layers, and FP8 training, achieving both strong effectiveness and efficient training/inference performance, which makes it a prime candidate for the small-size LLM segment.
To further foster community research, in addition to releasing the post-trained version, we are also open-sourcing __five pretraining checkpoints__: the pre-finetuning Ling-mini-2.0-base, along with four base models trained on 5T, 10T, 15T, and 20T tokens, enabling deeper research and broader applications.
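
As an example, any of these checkpoints can be pulled with `huggingface_hub` (a sketch; the repo id below is a placeholder, so check the inclusionAI organization pages for the exact released names):

```python
# Sketch: download one of the released checkpoints. The repo id is a
# placeholder -- look up the exact name on https://huggingface.co/inclusionAI.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="inclusionAI/Ling-mini-base-2.0")
print(local_dir)  # local path to the downloaded checkpoint files
```
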
Note: If you are interested in previous versions, please visit the past model collections.

### Convert to safetensors

Models in safetensors format can be downloaded from [HuggingFace](https://huggingface.co/inclusionAI) or [ModelScope](https://modelscope.cn/organization/inclusionAI).
If you train your own model and want to evaluate it, you can convert the DCP checkpoints produced by training into safetensors:

```shell
python tools/convert_dcp_to_safe_tensors.py --checkpoint-path ${DCP_PATH} --target-path ${SAFETENSORS_PATH}
```
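
After conversion, the checkpoint should load like any other safetensors model (a sketch, assuming `${SAFETENSORS_PATH}` also contains the model config and tokenizer files; `trust_remote_code` mirrors the flag used for serving below):

```python
# Sketch: load the converted checkpoint with transformers. Assumes
# ${SAFETENSORS_PATH} also contains the model config and tokenizer files.
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "/path/to/safetensors"  # i.e. the ${SAFETENSORS_PATH} used above
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(path, trust_remote_code=True)
```
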
BF16 and FP8 models are now supported by SGLang; which is used depends on the dtype of the model.

- Start server:
```shell
python -m sglang.launch_server \
--model-path ${MODEL_PATH} \
--host 0.0.0.0 --port ${PORT} \
--trust-remote-code \
--attention-backend fa3
```
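
Once the server is up, it can be queried through the OpenAI-compatible HTTP API that `sglang.launch_server` exposes (a sketch; the port must match `${PORT}` above, and single-model servers generally accept any `model` name):

```python
# Sketch: query the OpenAI-compatible endpoint exposed by sglang.launch_server.
# Replace 30000 with the ${PORT} used when starting the server.
import requests

resp = requests.post(
    "http://127.0.0.1:30000/v1/chat/completions",
    json={
        "model": "default",  # the server hosts one model; most versions accept any name
        "messages": [{"role": "user", "content": "Introduce Ling-mini-2.0."}],
        "max_tokens": 128,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```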

We recommend using [Llama-Factory](https://github.com/hiyouga/LLaMA-Factory) to fine-tune Ling-mini-2.0.

## License

This code repository is licensed under [the MIT License](https://github.com/inclusionAI/Ling-V2/blob/main/LICENSE).

## Citation
