Commit 598628d

Semmer2z00471799 authored and committed

[Doc] Add tutorial doc for Qwen2.5-Omni

Signed-off-by: Ting FU <[email protected]>

1 parent 755b635
2 files changed: +180 −0

docs/source/tutorials/Qwen2.5-Omni.md (new file, +179 −0)

# Qwen2.5-Omni-7B

## Introduction

Qwen2.5-Omni is an end-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner.

This document shows the main verification steps for the model, including supported features, feature configuration, environment preparation, single-node and multi-node deployment, and accuracy and performance evaluation.

## Supported Features

Refer to [supported features](../user_guide/support_matrix/supported_models.md) for the model's supported feature matrix.

Refer to the [feature guide](../user_guide/feature_guide/index.md) for each feature's configuration.

## Environment Preparation

### Model Weight

- `Qwen2.5-Omni-3B` (BF16): [Download model weight](https://huggingface.co/Qwen/Qwen2.5-Omni-3B)
- `Qwen2.5-Omni-7B` (BF16): [Download model weight](https://huggingface.co/Qwen/Qwen2.5-Omni-7B)

The following examples use the 7B version by default.
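
If you prefer to fetch the weights ahead of time rather than letting vLLM download them on first start, one option is the Hugging Face CLI. A minimal sketch; the target directory is an arbitrary example:

```bash
# Optional pre-download of the 7B weights; --local-dir is an example path.
pip install -U "huggingface_hub[cli]"
huggingface-cli download Qwen/Qwen2.5-Omni-7B --local-dir ./Qwen2.5-Omni-7B
```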

### Installation

You can use our official Docker image; vllm-ascend v0.11.0 and later versions support Qwen2.5-Omni.

:::{note}
Only the AArch64 architecture is currently supported, due to installation limitations of extra operators.
:::

:::::{tab-set}
:sync-group: install

::::{tab-item} A3&A2 series
:sync: A3&A2

Start the Docker image on your node; refer to [using docker](../installation.md#set-up-using-docker). A typical invocation is sketched below.
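
A minimal sketch, assuming the standard Ascend device and driver mounts; the image tag is an example, so replace it with the release you actually use, and adjust devices and mounts to your environment:

```bash
# Example tag -- substitute the vllm-ascend release you intend to run.
export IMAGE=quay.io/ascend/vllm-ascend:v0.11.0

docker run --rm -it \
    --name vllm-ascend \
    --device /dev/davinci0 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -p 8000:8000 \
    $IMAGE bash
```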

::::
:::::

Alternatively, if you don't want to use the Docker image, you can build everything from source:

- Install `vllm-ascend` from source; refer to [installation](../installation.md).

If you want to deploy a multi-node environment, you need to set up the environment on each node. A quick way to confirm that a node sees its NPUs is shown below.
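
A sanity check to run on every node; the exact output format depends on your driver version:

```bash
# List the NPUs visible on this node; each device should report
# a healthy (OK) status before you start the service.
npu-smi info
```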

## Deployment

### Single-node Deployment

#### Single NPU (Qwen2.5-Omni-7B)

```bash
export VLLM_USE_MODELSCOPE=true
export MODEL_PATH=vllm-ascend/Qwen2.5-Omni-7B
export LOCAL_MEDIA_PATH=/local_path/to_media/

vllm serve ${MODEL_PATH} \
--host 0.0.0.0 \
--port 8000 \
--served-model-name Qwen-Omni \
--allowed-local-media-path ${LOCAL_MEDIA_PATH} \
--trust-remote-code \
--compilation-config '{"full_cuda_graph": 1}' \
--no-enable-prefix-caching
```
69+
70+
:::{note}
71+
Now vllm-ascend docker image should contain vllm[audio] build part, if you encounter *audio not supported issue* by any chance, please re-build vllm with [audio] flag.
72+
73+
```bash
74+
VLLM_TARGET_DEVICE=empty pip install -v ".[audio]"
75+
```
76+
:::
77+
78+
`--allowed-local-media-path` is optional, only set it if you need infer model with local media file
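
A minimal request sketch for local media, assuming a hypothetical file `demo.png` exists under `${LOCAL_MEDIA_PATH}`; local files are referenced via `file://` URLs and must reside inside the allowed path:

```bash
# Hypothetical local file: /local_path/to_media/demo.png must exist and
# must be inside the directory passed to --allowed-local-media-path.
curl http://127.0.0.1:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen-Omni",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "file:///local_path/to_media/demo.png"}}
            ]
        }],
        "max_tokens": 100
    }'
```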

`--gpu-memory-utilization` should not be set manually unless you know what this parameter does.

#### Multiple NPUs (Qwen2.5-Omni-7B)

```bash
export VLLM_USE_MODELSCOPE=true
export MODEL_PATH=vllm-ascend/Qwen2.5-Omni-7B
export LOCAL_MEDIA_PATH=/local_path/to_media/
export DP_SIZE=8

vllm serve ${MODEL_PATH} \
--host 0.0.0.0 \
--port 8000 \
--served-model-name Qwen-Omni \
--allowed-local-media-path ${LOCAL_MEDIA_PATH} \
--trust-remote-code \
--compilation-config '{"full_cuda_graph": 1}' \
--data-parallel-size ${DP_SIZE} \
--no-enable-prefix-caching
```

`--tensor-parallel-size` does not need to be set for this 7B model, but if you do need tensor parallelism, the TP size can be 1, 2, or 4, as in the sketch below.
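
A variant of the serve command using tensor parallelism instead of data parallelism; TP size 2 is an arbitrary example:

```bash
# Example only: shard the model across 2 NPUs with tensor parallelism.
vllm serve ${MODEL_PATH} \
--host 0.0.0.0 \
--port 8000 \
--served-model-name Qwen-Omni \
--trust-remote-code \
--compilation-config '{"full_cuda_graph": 1}' \
--tensor-parallel-size 2 \
--no-enable-prefix-caching
```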

### Prefill-Decode Disaggregation

Not supported yet.

## Functional Verification

If your service starts successfully, you will see output like the following:

```bash
INFO: Started server process [2736]
INFO: Waiting for application startup.
INFO: Application startup complete.
```
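
Before sending full requests, you can run a quick sanity check against the standard OpenAI-compatible endpoints; both routes below exist in vanilla vLLM:

```bash
# Liveness probe: prints 200 once the engine is ready.
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8000/health

# List served models; the reply should include "Qwen-Omni" as configured above.
curl -s http://127.0.0.1:8000/v1/models
```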

Once the server is up, you can query the model with input prompts:

```bash
curl http://127.0.0.1:8000/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer EMPTY" -d '{
    "model": "Qwen-Omni",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is the text in the illustration?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png"
                    }
                }
            ]
        }
    ],
    "max_tokens": 100,
    "temperature": 0.7
}'
```

If the query succeeds, the client receives a response like the following:

```bash
{"id":"chatcmpl-a70a719c12f7445c8204390a8d0d8c97","object":"chat.completion","created":1764056861,"model":"Qwen-Omni","choices":[{"index":0,"message":{"role":"assistant","content":"The text in the illustration is \"TONGYI Qwen\".","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":73,"total_tokens":88,"completion_tokens":15,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}
```

## Accuracy Evaluation

Qwen2.5-Omni on vllm-ascend has been tested with AISBench.

### Using AISBench

1. Refer to [Using AISBench](../developer_guide/evaluation/using_ais_bench.md) for details.

2. After execution you can collect the results. Here are the results for `Qwen2.5-Omni-7B` with `vllm-ascend:0.11.0rc0`, for reference only:

| dataset | platform | metric | mode | vllm-api-stream-chat |
| ----- | ----- | ----- | ----- | ----- |
| textVQA | A2 | accuracy | gen_base64 | 83.47 |
| textVQA | A3 | accuracy | gen_base64 | 84.04 |

## Performance Evaluation

### Using AISBench

Refer to [Using AISBench for performance evaluation](../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.

Run the performance evaluation of `Qwen2.5-Omni-7B` with `vllm-ascend:0.11.0rc0` as an example. If you only want a quick smoke test first, vLLM's built-in serving benchmark can be used instead, as sketched below; it is separate from AISBench.
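
A minimal sketch using `vllm bench serve` against the running server; this is a text-only random-prompt benchmark, so it does not exercise the multimodal path, and the flags shown are common options rather than a complete recipe:

```bash
# Quick text-only smoke benchmark; NOT the AISBench setup used
# for the reference numbers below.
vllm bench serve \
    --host 127.0.0.1 \
    --port 8000 \
    --model vllm-ascend/Qwen2.5-Omni-7B \
    --served-model-name Qwen-Omni \
    --dataset-name random \
    --random-input-len 1024 \
    --random-output-len 256 \
    --num-prompts 128
```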

Here are the AISBench performance results, for reference:

| dataset | platform | BatchSize | Output Len | TTFT (avg) | TPOT (avg) | Output Token Throughput |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| 1080P image input | A2 | 128 | 256 | 13583 ms | 216 ms | 474 tokens/s |
| 1080P image input | A3 | 256 | 256 | 18611 ms | 241 ms | 794 tokens/s |

docs/source/tutorials/index.md (+1 −0)

@@ -23,4 +23,5 @@ multi_node_qwen3vl
 multi_node_pd_disaggregation_llmdatadist
 multi_node_pd_disaggregation_mooncake
 multi_node_ray
+Qwen2.5-Omni.md
 :::
