From bbe38bdd76ef968320745001ce0a0e404b84b05d Mon Sep 17 00:00:00 2001
From: Pierre-Henry Soria
Date: Fri, 17 Oct 2025 19:35:26 +1100
Subject: [PATCH 1/4] fix: Incorrect example variable name (`$MODLE_PATH` -> `$MODEL_PATH`)

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 77f9765..80645cd 100644
--- a/README.md
+++ b/README.md
@@ -217,7 +217,7 @@ BF16 and FP8 models are supported by SGLang now, it depends on the dtype of the
 - Start server:
 ```shell
 python -m sglang.launch_server \
-    --model-path $MODLE_PATH \
+    --model-path $MODEL_PATH \
     --host 0.0.0.0 --port $PORT \
     --trust-remote-code \
     --attention-backend fa3

From ee462c2f650c747c31a18031007d8cf287508d3c Mon Sep 17 00:00:00 2001
From: Pierre-Henry Soria
Date: Fri, 17 Oct 2025 19:36:18 +1100
Subject: [PATCH 2/4] fix: Incorrect 404 not found LICENSE link

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 80645cd..12ea5c5 100644
--- a/README.md
+++ b/README.md
@@ -263,7 +263,7 @@ We recommend you to use [Llama-Factory](https://github.com/hiyouga/LLaMA-Factory
 
 ## License
 
-This code repository is licensed under [the MIT License](https://github.com/inclusionAI/Ling-V2/blob/master/LICENCE).
+This code repository is licensed under [the MIT License](https://github.com/inclusionAI/Ling-V2/blob/main/LICENSE).
 
 ## Citation
 

From 12b7bc3499f0d980a7497d8031c30245126e4123 Mon Sep 17 00:00:00 2001
From: Pierre-Henry Soria
Date: Fri, 17 Oct 2025 19:37:49 +1100
Subject: [PATCH 3/4] fix: Grammar, typos

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 12ea5c5..8a980e7 100644
--- a/README.md
+++ b/README.md
@@ -31,7 +31,7 @@ The highly sparse small-activation MoE architecture also delivers significant tr
 
 Ling 2.0 employs __FP8 mixed-precision training__ throughout. Compared with BF16, experiments with over 1T training tokens show nearly identical loss curves and downstream benchmark performance. To support the community in efficient continued pretraining and fine-tuning under limited compute, we are also open-sourcing our __FP8 training solution__. Based on tile/blockwise FP8 scaling, it further introduces FP8 optimizer, FP8 on-demand transpose weight, and FP8 padding routing map for extreme memory optimization. On 8/16/32 80G GPUs, compared with LLaMA 3.1 8B and Qwen3 8B, __Ling-mini-2.0 achieved 30–60% throughput gains with MTP enabled, and 90–120% throughput gains with MTP disabled__.
 
-### A More Open Opensource Strategy
+### A More Open-Source Strategy
 
 We believe Ling-mini-2.0 is an ideal starting point for MoE research. For the first time at this scale, it integrates 1/32 sparsity, MTP layers, and FP8 training — achieving both strong effectiveness and efficient training/inference performance, making it a prime candidate for the small-size LLM segment. To further foster community research, in addition to releasing the post-trained version, we are also open-sourcing __five pretraining checkpoints__: the pre-finetuning Ling-mini-2.0-base, along with four base models trained on 5T, 10T, 15T, and 20T tokens, enabling deeper research and broader applications.
 
@@ -68,7 +68,7 @@ Note: If you are interested in previous version, please visit the past model col
 
 ### Convert to safetensors
 Models with safetensors format can be downloaded from [HuggingFace](https://huggingface.co/inclusionAI) or [ModelScope](https://modelscope.cn/organization/inclusionAI).
-If you want to train your model and eval it, you can convert from dcp produced by training.
+If you want to train your model and evaluate it, you can convert from dcp produced by training.
 ```shell
 python tools/convert_dcp_to_safe_tensors.py --checkpoint-path ${DCP_PATH} --target-path ${SAFETENSORS_PATH}
 ```

From fe3d4a4b1ec300c9b18bfee903596867ce025288 Mon Sep 17 00:00:00 2001
From: ♚ PH⑦ — Pierre-Henry™♛
Date: Fri, 17 Oct 2025 19:43:06 +1100
Subject: [PATCH 4/4] Update README.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 8a980e7..e4cadb8 100644
--- a/README.md
+++ b/README.md
@@ -31,7 +31,7 @@ The highly sparse small-activation MoE architecture also delivers significant tr
 
 Ling 2.0 employs __FP8 mixed-precision training__ throughout. Compared with BF16, experiments with over 1T training tokens show nearly identical loss curves and downstream benchmark performance. To support the community in efficient continued pretraining and fine-tuning under limited compute, we are also open-sourcing our __FP8 training solution__. Based on tile/blockwise FP8 scaling, it further introduces FP8 optimizer, FP8 on-demand transpose weight, and FP8 padding routing map for extreme memory optimization. On 8/16/32 80G GPUs, compared with LLaMA 3.1 8B and Qwen3 8B, __Ling-mini-2.0 achieved 30–60% throughput gains with MTP enabled, and 90–120% throughput gains with MTP disabled__.
 
-### A More Open-Source Strategy
+### A More Open Open-Source Strategy
 
 We believe Ling-mini-2.0 is an ideal starting point for MoE research. For the first time at this scale, it integrates 1/32 sparsity, MTP layers, and FP8 training — achieving both strong effectiveness and efficient training/inference performance, making it a prime candidate for the small-size LLM segment. To further foster community research, in addition to releasing the post-trained version, we are also open-sourcing __five pretraining checkpoints__: the pre-finetuning Ling-mini-2.0-base, along with four base models trained on 5T, 10T, 15T, and 20T tokens, enabling deeper research and broader applications.