From bbe38bdd76ef968320745001ce0a0e404b84b05d Mon Sep 17 00:00:00 2001
From: Pierre-Henry Soria
Date: Fri, 17 Oct 2025 19:35:26 +1100
Subject: [PATCH 1/4] fix: Incorrect example variable name (`$MODLE_PATH` -> `$MODEL_PATH`)

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 77f9765..80645cd 100644
--- a/README.md
+++ b/README.md
@@ -217,7 +217,7 @@ BF16 and FP8 models are supported by SGLang now, it depends on the dtype of the
 - Start server:
 ```shell
 python -m sglang.launch_server \
-    --model-path $MODLE_PATH \
+    --model-path $MODEL_PATH \
     --host 0.0.0.0 --port $PORT \
     --trust-remote-code \
     --attention-backend fa3

From ee462c2f650c747c31a18031007d8cf287508d3c Mon Sep 17 00:00:00 2001
From: Pierre-Henry Soria
Date: Fri, 17 Oct 2025 19:36:18 +1100
Subject: [PATCH 2/4] fix: Incorrect 404 not found LICENSE link

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 80645cd..12ea5c5 100644
--- a/README.md
+++ b/README.md
@@ -263,7 +263,7 @@ We recommend you to use [Llama-Factory](https://github.com/hiyouga/LLaMA-Factory
 
 ## License
 
-This code repository is licensed under [the MIT License](https://github.com/inclusionAI/Ling-V2/blob/master/LICENCE).
+This code repository is licensed under [the MIT License](https://github.com/inclusionAI/Ling-V2/blob/main/LICENSE).
 
 ## Citation
 

From 12b7bc3499f0d980a7497d8031c30245126e4123 Mon Sep 17 00:00:00 2001
From: Pierre-Henry Soria
Date: Fri, 17 Oct 2025 19:37:49 +1100
Subject: [PATCH 3/4] fix: Grammar, typos

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 12ea5c5..8a980e7 100644
--- a/README.md
+++ b/README.md
@@ -31,7 +31,7 @@ The highly sparse small-activation MoE architecture also delivers significant tr
 
 Ling 2.0 employs __FP8 mixed-precision training__ throughout. Compared with BF16, experiments with over 1T training tokens show nearly identical loss curves and downstream benchmark performance. To support the community in efficient continued pretraining and fine-tuning under limited compute, we are also open-sourcing our __FP8 training solution__. Based on tile/blockwise FP8 scaling, it further introduces FP8 optimizer, FP8 on-demand transpose weight, and FP8 padding routing map for extreme memory optimization. On 8/16/32 80G GPUs, compared with LLaMA 3.1 8B and Qwen3 8B, __Ling-mini-2.0 achieved 30–60% throughput gains with MTP enabled, and 90–120% throughput gains with MTP disabled__.
 
-### A More Open Opensource Strategy
+### A More Open-Source Strategy
 
 We believe Ling-mini-2.0 is an ideal starting point for MoE research. For the first time at this scale, it integrates 1/32 sparsity, MTP layers, and FP8 training — achieving both strong effectiveness and efficient training/inference performance, making it a prime candidate for the small-size LLM segment. To further foster community research, in addition to releasing the post-trained version, we are also open-sourcing __five pretraining checkpoints__: the pre-finetuning Ling-mini-2.0-base, along with four base models trained on 5T, 10T, 15T, and 20T tokens, enabling deeper research and broader applications.
 
@@ -68,7 +68,7 @@ Note: If you are interested in previous version, please visit the past model col
 
 ### Convert to safetensors
 Models with safetensors format can be downloaded from [HuggingFace](https://huggingface.co/inclusionAI) or [ModelScope](https://modelscope.cn/organization/inclusionAI).
-If you want to train your model and eval it, you can convert from dcp produced by training.
+If you want to train your model and evaluate it, you can convert from dcp produced by training.
 ```shell
 python tools/convert_dcp_to_safe_tensors.py --checkpoint-path ${DCP_PATH} --target-path ${SAFETENSORS_PATH}
 ```

From fe3d4a4b1ec300c9b18bfee903596867ce025288 Mon Sep 17 00:00:00 2001
From: ♚ PH⑦ — Pierre-Henry™♛
Date: Fri, 17 Oct 2025 19:43:06 +1100
Subject: [PATCH 4/4] Update README.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 8a980e7..e4cadb8 100644
--- a/README.md
+++ b/README.md
@@ -31,7 +31,7 @@ The highly sparse small-activation MoE architecture also delivers significant tr
 
 Ling 2.0 employs __FP8 mixed-precision training__ throughout. Compared with BF16, experiments with over 1T training tokens show nearly identical loss curves and downstream benchmark performance. To support the community in efficient continued pretraining and fine-tuning under limited compute, we are also open-sourcing our __FP8 training solution__. Based on tile/blockwise FP8 scaling, it further introduces FP8 optimizer, FP8 on-demand transpose weight, and FP8 padding routing map for extreme memory optimization. On 8/16/32 80G GPUs, compared with LLaMA 3.1 8B and Qwen3 8B, __Ling-mini-2.0 achieved 30–60% throughput gains with MTP enabled, and 90–120% throughput gains with MTP disabled__.
 
-### A More Open-Source Strategy
+### A More Open Open-Source Strategy
 
 We believe Ling-mini-2.0 is an ideal starting point for MoE research. For the first time at this scale, it integrates 1/32 sparsity, MTP layers, and FP8 training — achieving both strong effectiveness and efficient training/inference performance, making it a prime candidate for the small-size LLM segment. To further foster community research, in addition to releasing the post-trained version, we are also open-sourcing __five pretraining checkpoints__: the pre-finetuning Ling-mini-2.0-base, along with four base models trained on 5T, 10T, 15T, and 20T tokens, enabling deeper research and broader applications.