Skip to content

Commit e5bdfad

Browse files
wangxiyuanyangxiaojun0126
authored andcommitted
[Doc] add v0.9.1 release note (vllm-project#2646)
Add release note for 0.9.1 - vLLM version: v0.10.1.1 - vLLM main: vllm-project/vllm@8bd5844 Signed-off-by: wangxiyuan <[email protected]>
1 parent 9916da0 commit e5bdfad

File tree

7 files changed

+55
-7
lines changed

7 files changed

+55
-7
lines changed

.github/ISSUE_TEMPLATE/900-release-checklist.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,8 @@ body:
3030
3131
- [ ] Add release note to docs/source/user_guide/release_notes.md
3232
33+
- [ ] Update release version in README.md and README.zh.md
34+
3335
- [ ] Update version info in docs/source/community/versioning_policy.md
3436
3537
- [ ] Update contributor info in docs/source/community/contributors.md

README.md

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -52,8 +52,7 @@ Please use the following recommended versions to get started quickly:
5252
| Version | Release type | Doc |
5353
|------------|--------------|--------------------------------------|
5454
|v0.10.0rc1|Latest release candidate|[QuickStart](https://vllm-ascend.readthedocs.io/en/latest/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/latest/installation.html) for more details|
55-
|v0.9.1rc3|Next stable release|[QuickStart](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/installation.html) for more details|
56-
|v0.7.3.post1|Latest stable version|[QuickStart](https://vllm-ascend.readthedocs.io/en/stable/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/stable/installation.html) for more details|
55+
|v0.9.1|Latest stable version|[QuickStart](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/installation.html) for more details|
5756

5857
## Contributing
5958
See [CONTRIBUTING](https://vllm-ascend.readthedocs.io/en/latest/developer_guide/contribution/index.html) for more details, which is a step-by-step guide to help you set up development environment, build and test.

README.zh.md

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -52,8 +52,7 @@ vLLM 昇腾插件 (`vllm-ascend`) 是一个由社区维护的让vLLM在Ascend NP
5252
| Version | Release type | Doc |
5353
|------------|--------------|--------------------------------------|
5454
|v0.10.0rc1| 最新RC版本 |请查看[快速开始](https://vllm-ascend.readthedocs.io/en/latest/quick_start.html)[安装指南](https://vllm-ascend.readthedocs.io/en/latest/installation.html)了解更多|
55-
|v0.9.1rc3| 下一个正式/稳定版 |[快速开始](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/quick_start.html) and [安装指南](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/installation.html)了解更多|
56-
|v0.7.3.post1| 最新正式/稳定版本 |请查看[快速开始](https://vllm-ascend.readthedocs.io/en/stable/quick_start.html)[安装指南](https://vllm-ascend.readthedocs.io/en/stable/installation.html)了解更多|
55+
|v0.9.1| 最新正式/稳定版本 |[快速开始](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/quick_start.html) and [安装指南](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/installation.html)了解更多|
5756

5857
## 贡献
5958
请参考 [CONTRIBUTING]((https://vllm-ascend.readthedocs.io/en/latest/developer_guide/contribution/index.html)) 文档了解更多关于开发环境搭建、功能测试以及 PR 提交规范的信息。

docs/source/_templates/sections/header.html

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -54,5 +54,5 @@
5454
</style>
5555

5656
<div class="notification-bar">
57-
<p>You are viewing the latest developer preview docs. <a href="https://vllm-ascend.readthedocs.io/en/v0.7.3-dev">Click here</a> to view docs for the latest stable release(v0.7.3.post1).</p>
57+
<p>You are viewing the latest developer preview docs. <a href="https://vllm-ascend.readthedocs.io/en/v0.9.1-dev">Click here</a> to view docs for the latest stable release(v0.9.1).</p>
5858
</div>

docs/source/community/versioning_policy.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,7 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:
2424
|-------------|--------------|------------------|-------------|--------------------|--------------|
2525
| v0.10.0rc1 | v0.10.0 | >= 3.9, < 3.12 | 8.2.RC1 | 2.7.1 / 2.7.1.dev20250724 | |
2626
| v0.9.2rc1 | v0.9.2 | >= 3.9, < 3.12 | 8.1.RC1 | 2.5.1 / 2.5.1.post1.dev20250619 | |
27+
| v0.9.1 | v0.9.1 | >= 3.9, < 3.12 | 8.2.RC1 | 2.5.1 / 2.5.1.post1 | |
2728
| v0.9.1rc3 | v0.9.1 | >= 3.9, < 3.12 | 8.2.RC1 | 2.5.1 / 2.5.1.post1 | |
2829
| v0.9.1rc2 | v0.9.1 | >= 3.9, < 3.12 | 8.2.RC1 | 2.5.1 / 2.5.1.post1| |
2930
| v0.9.1rc1 | v0.9.1 | >= 3.9, < 3.12 | 8.1.RC1 | 2.5.1 / 2.5.1.post1.dev20250528 | |
@@ -40,6 +41,7 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:
4041

4142
| Date | Event |
4243
|------------|-------------------------------------------|
44+
| 2025.09.03 | v0.9.1 Final release |
4345
| 2025.08.22 | Release candidates, v0.9.1rc3 |
4446
| 2025.08.07 | Release candidates, v0.10.0rc1 |
4547
| 2025.08.04 | Release candidates, v0.9.1rc2 |

docs/source/faqs.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33
## Version Specific FAQs
44

55
- [[v0.7.3.post1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/1007)
6-
- [[v0.9.1rc3] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/2410)
6+
- [[v0.9.1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/2643)
77
- [[v0.10.0rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/2217)
88

99
## General FAQs

docs/source/user_guide/release_notes.md

Lines changed: 47 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,51 @@
11
# Release note
22

3+
## v0.9.1 - 2025.09.03
4+
5+
We are excited to announce the newest official release of vLLM Ascend. This release includes many feature supports, performance improvements and bug fixes. We recommend users to upgrade from 0.7.3 to this version. Please always set `VLLM_USE_V1=1` to use V1 engine.
6+
7+
In this release, we added many enhancements for large scale expert parallel case. It's recommended to follow the [official guide](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/tutorials/large_scale_ep.html).
8+
9+
Please note that this release note will list all the important changes from last official release(v0.7.3)
10+
11+
### Highlights
12+
13+
- DeepSeek V3/R1 is supported with high quality and performance. MTP can work with DeepSeek as well. Please refer to [muliti node tutorials](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/tutorials/multi_node.html) and [Large Scale Expert Parallelism](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/tutorials/large_scale_ep.html).
14+
- Qwen series models work with graph mode now. It works by default with V1 Engine. Please refer to [Qwen tutorials](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/tutorials/index.html).
15+
- Disaggregated Prefilling support for V1 Engine. Please refer to [Large Scale Expert Parallelism](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/tutorials/large_scale_ep.html) tutorials.
16+
- Automatic prefix caching and chunked prefill feature is supported.
17+
- Speculative decoding feature works with Ngram and MTP method.
18+
- MOE and dense w4a8 quantization support now. Please refer to [quantization guide](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/user_guide/feature_guide/quantization.html).
19+
- Sleep Mode feature is supported for V1 engine. Please refer to [Sleep mode tutorials](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/user_guide/feature_guide/sleep_mode.html).
20+
- Dynamic and Static EPLB support is added. This feature is still experimental.
21+
22+
### Note
23+
The following notes are especially for reference when upgrading from last final release (v0.7.3):
24+
25+
- V0 Engine is not supported from this release. Please always set `VLLM_USE_V1=1` to use V1 engine with vLLM Ascend.
26+
- Mindie Turbo is not needed with this release. And the old version of Mindie Turbo is not compatible. Please do not install it. Currently all the function and enhancement is included in vLLM Ascend already. We'll consider to add it back in the future in needed.
27+
- Torch-npu is upgraded to 2.5.1.post1. CANN is upgraded to 8.2.RC1. Don't forget to upgrade them.
28+
29+
### Core
30+
31+
- The Ascend scheduler is added for V1 engine. This scheduler is more affine with Ascend hardware.
32+
- Structured output feature works now on V1 Engine.
33+
- A batch of custom ops are added to improve the performance.
34+
35+
### Changes
36+
37+
- EPLB support for Qwen3-moe model. [#2000](https://github.com/vllm-project/vllm-ascend/pull/2000)
38+
- Fix the bug that MTP doesn't work well with Prefill Decode Disaggregation. [#2610](https://github.com/vllm-project/vllm-ascend/pull/2610) [#2554](https://github.com/vllm-project/vllm-ascend/pull/2554) [#2531](https://github.com/vllm-project/vllm-ascend/pull/2531)
39+
- Fix few bugs to make sure Prefill Decode Disaggregation works well. [#2538](https://github.com/vllm-project/vllm-ascend/pull/2538) [#2509](https://github.com/vllm-project/vllm-ascend/pull/2509) [#2502](https://github.com/vllm-project/vllm-ascend/pull/2502)
40+
- Fix file not found error with shutil.rmtree in torchair mode. [#2506](https://github.com/vllm-project/vllm-ascend/pull/2506)
41+
42+
### Known Issues
43+
- When running MoE model, Aclgraph mode only work with tensor parallel. DP/EP doesn't work in this release.
44+
- Pipeline parallelism is not supported in this release for V1 engine.
45+
- If you use w4a8 quantization with eager mode, please set `VLLM_ASCEND_MLA_PARALLEL=1` to avoid oom error.
46+
- Accuracy test with some tools may not be correct. It doesn't affect the real user case. We'll fix it in the next post release. [#2654](https://github.com/vllm-project/vllm-ascend/pull/2654)
47+
- We notice that there are still some problems when running vLLM Ascend with Prefill Decode Disaggregation. For example, the memory may be leaked and the service may be stuck. It's caused by known issue by vLLM and vLLM Ascend. We'll fix it in the next post release. [#2650](https://github.com/vllm-project/vllm-ascend/pull/2650) [#2604](https://github.com/vllm-project/vllm-ascend/pull/2604) [vLLM#22736](https://github.com/vllm-project/vllm/pull/22736) [vLLM#23554](https://github.com/vllm-project/vllm/pull/23554) [vLLM#23981](https://github.com/vllm-project/vllm/pull/23981)
48+
349
## v0.9.1rc3 - 2025.08.22
450

551
This is the 3rd release candidate of v0.9.1 for vLLM Ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/) to get started.
@@ -252,7 +298,7 @@ After careful consideration, above features **will NOT be included in v0.9.1-dev
252298
- Ascend PyTorch adapter (torch_npu) has been upgraded to `2.5.1.post1.dev20250528`. Don’t forget to update it in your environment. [#1235](https://github.com/vllm-project/vllm-ascend/pull/1235)
253299
- Support Atlas 300I series container image. You can get it from [quay.io](https://quay.io/repository/vllm/vllm-ascend)
254300
- Fix token-wise padding mechanism to make multi-card graph mode work. [#1300](https://github.com/vllm-project/vllm-ascend/pull/1300)
255-
- Upgrade vLLM to 0.9.1 [#1165]https://github.com/vllm-project/vllm-ascend/pull/1165
301+
- Upgrade vLLM to 0.9.1 [#1165](https://github.com/vllm-project/vllm-ascend/pull/1165)
256302

257303
### Other Improvements
258304
- Initial support Chunked Prefill for MLA. [#1172](https://github.com/vllm-project/vllm-ascend/pull/1172)

0 commit comments

Comments
 (0)