Commit 3538411

vllm-ascend-ci and Angazenn authored and committed
[Doc] Update accuracy reports for v0.10.1rc1 (vllm-project#2755)
The accuracy results on NPU Atlas A2 have changed; this updates the reports for all models (Qwen3-30B-A3B, Qwen2.5-VL-7B-Instruct, Qwen3-8B-Base, DeepSeek-V2-Lite).

- Workflow run: https://github.com/vllm-project/vllm-ascend/actions/runs/17459225764
- vLLM version: v0.10.1.1
- vLLM main: vllm-project/vllm@2b30afa

Signed-off-by: vllm-ascend-ci <[email protected]>
Co-authored-by: vllm-ascend-ci <[email protected]>
1 parent 4aaa388 commit 3538411

5 files changed: +85 −0 lines changed
docs/source/developer_guide/evaluation/accuracy_report/DeepSeek-V2-Lite.md

Lines changed: 20 additions & 0 deletions
# deepseek-ai/DeepSeek-V2-Lite

- **vLLM Version**: vLLM: 0.10.1.1 ([1da94e6](https://github.com/vllm-project/vllm/commit/1da94e6)), **vLLM Ascend Version**: v0.10.1rc1 ([7e16b4a](https://github.com/vllm-project/vllm-ascend/commit/7e16b4a))
- **Software Environment**: **CANN**: 8.2.RC1, **PyTorch**: 2.7.1, **torch-npu**: 2.7.1.dev20250724
- **Hardware Environment**: Atlas A2 Series
- **Parallel mode**: TP2
- **Execution mode**: ACLGraph

**Command**:

```bash
export MODEL_ARGS='pretrained=deepseek-ai/DeepSeek-V2-Lite,tensor_parallel_size=2,dtype=auto,trust_remote_code=True,max_model_len=4096,enforce_eager=True'
lm_eval --model vllm --model_args $MODEL_ARGS --tasks gsm8k \
  --batch_size auto
```

| Task | Metric | Value | Stderr |
|-----------------------|-------------|----------:|-------:|
| gsm8k | exact_match,strict-match | ✅0.3813 | ± 0.0134 |
| gsm8k | exact_match,flexible-extract | ✅0.3836 | ± 0.0134 |
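For a quick local sanity check of this setup before committing to the full run, the same command can be pointed at a small GSM8K subset and told to write its results to disk. This is only a sketch: the `--limit` value and output path below are arbitrary choices, not part of the report above, and scores from a truncated run will not match the table.

```bash
# Sanity-check sketch: evaluate only 64 GSM8K samples and persist the results.
# MODEL_ARGS is the same as in the report; --limit and --output_path are arbitrary.
export MODEL_ARGS='pretrained=deepseek-ai/DeepSeek-V2-Lite,tensor_parallel_size=2,dtype=auto,trust_remote_code=True,max_model_len=4096,enforce_eager=True'
lm_eval --model vllm --model_args $MODEL_ARGS --tasks gsm8k \
  --batch_size auto --limit 64 \
  --output_path ./results/DeepSeek-V2-Lite --log_samples
```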
docs/source/developer_guide/evaluation/accuracy_report/Qwen2.5-VL-7B-Instruct.md

Lines changed: 19 additions & 0 deletions
# Qwen/Qwen2.5-VL-7B-Instruct

- **vLLM Version**: vLLM: 0.10.1.1 ([1da94e6](https://github.com/vllm-project/vllm/commit/1da94e6)), **vLLM Ascend Version**: v0.10.1rc1 ([7e16b4a](https://github.com/vllm-project/vllm-ascend/commit/7e16b4a))
- **Software Environment**: **CANN**: 8.2.RC1, **PyTorch**: 2.7.1, **torch-npu**: 2.7.1.dev20250724
- **Hardware Environment**: Atlas A2 Series
- **Parallel mode**: TP1
- **Execution mode**: ACLGraph

**Command**:

```bash
export MODEL_ARGS='pretrained=Qwen/Qwen2.5-VL-7B-Instruct,tensor_parallel_size=1,dtype=auto,trust_remote_code=False,max_model_len=8192'
lm_eval --model vllm-vlm --model_args $MODEL_ARGS --tasks mmmu_val \
  --apply_chat_template True --fewshot_as_multiturn True --batch_size auto
```

| Task | Metric | Value | Stderr |
|-----------------------|-------------|----------:|-------:|
| mmmu_val | acc,none | ✅0.52 | ± 0.0162 |
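The run above loads the model in-process through lm_eval's `vllm-vlm` backend. To try the same checkpoint interactively, an OpenAI-compatible server can be started with settings that mirror the evaluation; a minimal sketch, where the host and port are assumptions and not part of the report:

```bash
# Sketch: serve Qwen2.5-VL-7B-Instruct with settings mirroring the eval above (TP1, 8K context).
# Host and port are arbitrary; adjust for your environment.
vllm serve Qwen/Qwen2.5-VL-7B-Instruct \
  --tensor-parallel-size 1 \
  --max-model-len 8192 \
  --host 0.0.0.0 --port 8000
```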
docs/source/developer_guide/evaluation/accuracy_report/Qwen3-30B-A3B.md

Lines changed: 21 additions & 0 deletions
# Qwen/Qwen3-30B-A3B

- **vLLM Version**: vLLM: 0.10.1.1 ([1da94e6](https://github.com/vllm-project/vllm/commit/1da94e6)), **vLLM Ascend Version**: v0.10.1rc1 ([7e16b4a](https://github.com/vllm-project/vllm-ascend/commit/7e16b4a))
- **Software Environment**: **CANN**: 8.2.RC1, **PyTorch**: 2.7.1, **torch-npu**: 2.7.1.dev20250724
- **Hardware Environment**: Atlas A2 Series
- **Parallel mode**: TP2 + EP
- **Execution mode**: ACLGraph

**Command**:

```bash
export MODEL_ARGS='pretrained=Qwen/Qwen3-30B-A3B,tensor_parallel_size=2,dtype=auto,trust_remote_code=False,max_model_len=4096,gpu_memory_utilization=0.6,enable_expert_parallel=True'
lm_eval --model vllm --model_args $MODEL_ARGS --tasks gsm8k,ceval-valid \
  --num_fewshot 5 --batch_size auto
```

| Task | Metric | Value | Stderr |
|-----------------------|-------------|----------:|-------:|
| gsm8k | exact_match,strict-match | ✅0.8923 | ± 0.0085 |
| gsm8k | exact_match,flexible-extract | ✅0.8506 | ± 0.0098 |
| ceval-valid | acc,none | ✅0.8358 | ± 0.0099 |
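Here "TP2 + EP" means tensor parallelism of 2 with expert parallelism enabled for the MoE layers, matching `tensor_parallel_size=2` and `enable_expert_parallel=True` in `MODEL_ARGS`. A hedged sketch of a serving command with the equivalent engine settings, not part of the report itself:

```bash
# Sketch: serve Qwen3-30B-A3B with TP2 + expert parallelism, mirroring MODEL_ARGS above.
vllm serve Qwen/Qwen3-30B-A3B \
  --tensor-parallel-size 2 \
  --enable-expert-parallel \
  --gpu-memory-utilization 0.6 \
  --max-model-len 4096
```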
docs/source/developer_guide/evaluation/accuracy_report/Qwen3-8B-Base.md

Lines changed: 21 additions & 0 deletions
# Qwen/Qwen3-8B-Base

- **vLLM Version**: vLLM: 0.10.1.1 ([1da94e6](https://github.com/vllm-project/vllm/commit/1da94e6)), **vLLM Ascend Version**: v0.10.1rc1 ([7e16b4a](https://github.com/vllm-project/vllm-ascend/commit/7e16b4a))
- **Software Environment**: **CANN**: 8.2.RC1, **PyTorch**: 2.7.1, **torch-npu**: 2.7.1.dev20250724
- **Hardware Environment**: Atlas A2 Series
- **Parallel mode**: TP1
- **Execution mode**: ACLGraph

**Command**:

```bash
export MODEL_ARGS='pretrained=Qwen/Qwen3-8B-Base,tensor_parallel_size=1,dtype=auto,trust_remote_code=False,max_model_len=4096'
lm_eval --model vllm --model_args $MODEL_ARGS --tasks gsm8k,ceval-valid \
  --apply_chat_template True --fewshot_as_multiturn True --num_fewshot 5 --batch_size auto
```

| Task | Metric | Value | Stderr |
|-----------------------|-------------|----------:|-------:|
| gsm8k | exact_match,strict-match | ✅0.8271 | ± 0.0104 |
| gsm8k | exact_match,flexible-extract | ✅0.8294 | ± 0.0104 |
| ceval-valid | acc,none | ✅0.815 | ± 0.0103 |
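To make report updates like this one easier to diff across releases, the same command can also write aggregate results and per-sample logs to disk. A sketch only; the output path is an arbitrary choice and the extra flags are not part of the report above:

```bash
# Sketch: same evaluation as above, additionally persisting results and per-sample logs.
export MODEL_ARGS='pretrained=Qwen/Qwen3-8B-Base,tensor_parallel_size=1,dtype=auto,trust_remote_code=False,max_model_len=4096'
lm_eval --model vllm --model_args $MODEL_ARGS --tasks gsm8k,ceval-valid \
  --apply_chat_template True --fewshot_as_multiturn True --num_fewshot 5 --batch_size auto \
  --output_path ./results/Qwen3-8B-Base --log_samples
```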

docs/source/developer_guide/evaluation/accuracy_report/index.md

Lines changed: 4 additions & 0 deletions
@@ -3,4 +3,8 @@
 :::{toctree}
 :caption: Accuracy Report
 :maxdepth: 1
+DeepSeek-V2-Lite
+Qwen2.5-VL-7B-Instruct
+Qwen3-30B-A3B
+Qwen3-8B-Base
 :::
