Commit 6dcb07f

support qwen3-vl handle requests with embeddings (#30037)
Signed-off-by: taoyun <[email protected]> Signed-off-by: Cyrus Leung <[email protected]> Co-authored-by: Cyrus Leung <[email protected]>
1 parent 46cbbca commit 6dcb07f

2 files changed: 7 additions and 2 deletions

docs/features/multimodal_inputs.md

Lines changed: 2 additions & 0 deletions
@@ -443,6 +443,8 @@ For Qwen2-VL and MiniCPM-V, we accept additional parameters alongside the embedd
 print(generated_text)
 ```
 
+For Qwen3-VL, the `image_embeds` should contain both the base image embedding and deepstack features.
+
 #### Audio Embeddings
 
 You can pass pre-computed audio embeddings similar to image embeddings:
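
The documentation line added in this hunk says that Qwen3-VL `image_embeds` carry both the base image embedding and the deepstack features, but it does not say how the two are laid out. The sketch below assumes they are concatenated per token along the feature axis; that layout is an assumption, not something this commit states. The helper name `concat_base_and_deepstack` is hypothetical and exists only for illustration.

```python
from typing import List, Sequence


def concat_base_and_deepstack(
    base: Sequence[Sequence[float]],
    deepstack_levels: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Join base embeddings with deepstack features, token by token.

    ASSUMPTION: features are concatenated along the last (feature) axis.
    `base` has shape (num_tokens, hidden); each entry of `deepstack_levels`
    has the same shape. The result has shape
    (num_tokens, hidden * (1 + number_of_levels)).
    """
    num_tokens = len(base)
    for level in deepstack_levels:
        if len(level) != num_tokens:
            raise ValueError("every deepstack level needs one row per token")

    combined = []
    for i in range(num_tokens):
        row = list(base[i])
        for level in deepstack_levels:
            row.extend(level[i])  # append this level's features for token i
        combined.append(row)
    return combined


# Two tokens, hidden size 2, two deepstack levels.
base = [[1.0, 2.0], [3.0, 4.0]]
levels = [
    [[10.0, 11.0], [12.0, 13.0]],
    [[20.0, 21.0], [22.0, 23.0]],
]
image_embeds = concat_base_and_deepstack(base, levels)
```

Under this assumption each row is three times as wide as the base hidden size, so a caller producing embeddings offline would need to export the deepstack features together with the base output rather than the base output alone.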

vllm/model_executor/models/qwen3_vl.py

Lines changed: 5 additions & 2 deletions
@@ -103,7 +103,7 @@
     Qwen2_5_VLVideoInputs,
     Qwen2_5_VLVideoPixelInputs,
 )
-from .qwen2_vl import Qwen2VLProcessingInfo
+from .qwen2_vl import Qwen2VLMultiModalDataParser, Qwen2VLProcessingInfo
 from .qwen3 import Qwen3ForCausalLM, Qwen3Model
 from .utils import (
     AutoWeightsLoader,
@@ -884,7 +884,10 @@ def _get_dummy_videos(
 
 class Qwen3VLMultiModalProcessor(BaseMultiModalProcessor[Qwen3VLProcessingInfo]):
     def _get_data_parser(self) -> MultiModalDataParser:
-        return MultiModalDataParser(video_needs_metadata=True)
+        return Qwen2VLMultiModalDataParser(
+            self.info.get_hf_config().vision_config.spatial_merge_size,
+            video_needs_metadata=True,
+        )
 
     def _call_hf_processor(
         self,
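
The change above constructs the data parser with `spatial_merge_size` from the vision config. The commit does not explain why the parser needs it. One plausible reading, assumed here, is that the parser must know how many embedding rows belong to each image, and that this count follows from the image's `(t, h, w)` patch grid divided by the square of the merge size. The function name `num_embed_tokens` is hypothetical.

```python
from typing import Tuple


def num_embed_tokens(grid_thw: Tuple[int, int, int], spatial_merge_size: int) -> int:
    """Number of embedding rows expected for one image.

    ASSUMPTION: patches are merged in spatial_merge_size x spatial_merge_size
    blocks, so the token count is t * h * w / spatial_merge_size**2.
    """
    t, h, w = grid_thw
    if spatial_merge_size <= 0:
        raise ValueError("spatial_merge_size must be positive")
    if h % spatial_merge_size or w % spatial_merge_size:
        raise ValueError("grid height and width must be divisible by the merge size")
    return t * (h // spatial_merge_size) * (w // spatial_merge_size)


# A single-frame image with a 4 x 6 patch grid and merge size 2.
tokens = num_embed_tokens((1, 4, 6), 2)
```

If this reading is right, a request that supplies pre-computed `image_embeds` would need to supply a matching patch grid as well, so that the rows can be split per image without running the vision encoder.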
