Support multiple image/audio embeddings per requests #29988

jeremyteboul · 2025-12-03T18:46:12Z

This enables the Chat Completions API to leverage the model's existing capability for multiple embeddings, previously only accessible through the direct LLM inference API.

Remove limitation that only allowed one message with image_embeds/audio_embeds
Update MultiModalItemTracker and AsyncMultiModalItemTracker to treat embeddings as lists
Embeddings now behave consistently with regular images/audios
Validation via existing validate_num_items() against --limit-mm-per-prompt

Test Plan

Add unit tests for multiple image embeddings support:
- test_parse_chat_messages_multiple_image_embeds
- test_parse_chat_messages_multiple_image_embeds_with_uuids
- test_parse_chat_messages_multiple_image_embeds_async

Test Result

passed

gemini-code-assist

Code Review

This pull request successfully enables support for multiple image and audio embeddings per request by updating the MultiModalItemTracker to handle lists of embeddings. The changes in vllm/entrypoints/chat_utils.py are correct and effectively remove the previous limitation. The new unit tests are also well-designed to cover these new capabilities. However, this change to the data structure for embeddings (from a single item to a list) breaks several existing unit tests that were written with the single-embedding assumption. I've identified these failing tests and provided suggestions to update them. Addressing these test failures is critical for merging this PR.

chatgpt-codex-connector · 2025-12-03T18:51:13Z

💡 Codex Review

vllm/vllm/entrypoints/chat_utils.py

Lines 725 to 727 in c4e242c

    
           if "image_embeds" in items_by_modality: 
        
               image_embeds_lst = items_by_modality["image_embeds"] 
        
               mm_inputs["image"] = image_embeds_lst

Keep single image_embeds output shape backward compatible

Mapping image_embeds directly to mm_inputs["image"] now returns a list even when only one embedding is provided. Existing callers/tests (e.g., test_parse_chat_messages_empty_image_embeds_with_uuid, lines 829–858) relied on mm_data["image"] being a lone tensor/None for a single embed; after this change they receive [tensor] or [None], breaking those assertions and changing the public return contract despite the commit claiming backward compatibility.

vllm/vllm/entrypoints/chat_utils.py

Lines 730 to 732 in c4e242c

    
           if "audio_embeds" in items_by_modality: 
        
               audio_embeds_lst = items_by_modality["audio_embeds"] 
        
               mm_inputs["audio"] = audio_embeds_lst

Audio embeddings wrapped in lists stop being parsed as embeddings

Setting mm_inputs["audio"] = audio_embeds_lst means a single audio embedding now surfaces as a list. Downstream parsing treats embeddings only when it receives a tensor or a list of 2D tensors (MultiModalDataParser.is_embeddings, vllm/multimodal/parse.py:383-390); a list containing the previous 3D tensor (see test_parse_chat_messages_audio_embeds_with_string, lines 893–939) is no longer recognized as an embedding and is processed as raw audio instead, breaking existing tests and single-embed requests.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

DarkLight1337

Thanks, can you update the Multimodal Inputs documentation page accordingly as well?

mergify · 2025-12-04T05:27:26Z

Documentation preview: https://vllm--29988.org.readthedocs.build/en/29988/

jeremyteboul · 2025-12-04T05:34:29Z

Thanks, can you update the Multimodal Inputs documentation page accordingly as well?

Here is the doc !

DarkLight1337 · 2025-12-04T06:48:26Z

#29970 just got merged, can you update your code to use the tensor2base64 convenience function?

DarkLight1337 · 2025-12-04T06:48:52Z

docs/features/multimodal_inputs.md

        print(generated_text)
    ```

+!!! note


This section is under offline inference so I think it's not related?

docs/features/multimodal_inputs.md

This enables the Chat Completions API to leverage the model's existing capability for multiple embeddings - Remove limitation that only allowed one message with image_embeds/audio_embeds - Update MultiModalItemTracker and AsyncMultiModalItemTracker to treat embeddings as lists - Add unit tests for multiple image embeddings support: * test_parse_chat_messages_multiple_image_embeds * test_parse_chat_messages_multiple_image_embeds_with_uuids * test_parse_chat_messages_multiple_image_embeds_async - Embeddings now behave consistently with regular images/audios - Validation via existing validate_num_items() against --limit-mm-per-prompt

DarkLight1337

Thanks, LGTM then

DarkLight1337 · 2025-12-05T05:47:12Z

I have fixed DCO for you, next time please sign-off your commits.

jeremyteboul requested review from DarkLight1337, NickLucche, aarnphm, chaunceyjiang and robertgshaw2-redhat as code owners December 3, 2025 18:46

mergify bot added the frontend label Dec 3, 2025

gemini-code-assist bot reviewed Dec 3, 2025

View reviewed changes

jeremyteboul force-pushed the multi_image_enbeddings branch from c4e242c to fa53ab7 Compare December 3, 2025 18:49

jeremyteboul force-pushed the multi_image_enbeddings branch from fa53ab7 to f4e251f Compare December 3, 2025 22:09

DarkLight1337 reviewed Dec 4, 2025

View reviewed changes

jeremyteboul force-pushed the multi_image_enbeddings branch from f4e251f to 0306c96 Compare December 4, 2025 05:26

mergify bot added the documentation Improvements or additions to documentation label Dec 4, 2025

DarkLight1337 reviewed Dec 4, 2025

View reviewed changes

docs/features/multimodal_inputs.md Show resolved Hide resolved

jeremyteboul force-pushed the multi_image_enbeddings branch from 3884c6f to ee06b3c Compare December 4, 2025 18:15

jeremyteboul force-pushed the multi_image_enbeddings branch from ee06b3c to dbc789a Compare December 4, 2025 19:30

DarkLight1337 approved these changes Dec 5, 2025

View reviewed changes

Merge branch 'main' into multi_image_enbeddings

764d6dd

DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 5, 2025

DarkLight1337 enabled auto-merge (squash) December 5, 2025 05:47

Merge branch 'main' into multi_image_enbeddings

e28f687

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Support multiple image/audio embeddings per requests #29988

Support multiple image/audio embeddings per requests #29988

jeremyteboul commented Dec 3, 2025 •

edited by github-actions bot

Loading

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

chatgpt-codex-connector bot commented Dec 3, 2025

Uh oh!

DarkLight1337 left a comment

Uh oh!

mergify bot commented Dec 4, 2025

Uh oh!

jeremyteboul commented Dec 4, 2025

Uh oh!

DarkLight1337 commented Dec 4, 2025

Uh oh!

DarkLight1337 Dec 4, 2025

Uh oh!

Uh oh!

DarkLight1337 left a comment

Uh oh!

DarkLight1337 commented Dec 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Support multiple image/audio embeddings per requests #29988

Are you sure you want to change the base?

Support multiple image/audio embeddings per requests #29988

Conversation

jeremyteboul commented Dec 3, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test Plan

Test Result

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

chatgpt-codex-connector bot commented Dec 3, 2025

💡 Codex Review

Uh oh!

DarkLight1337 left a comment

Choose a reason for hiding this comment

Uh oh!

mergify bot commented Dec 4, 2025

Uh oh!

jeremyteboul commented Dec 4, 2025

Uh oh!

DarkLight1337 commented Dec 4, 2025

Uh oh!

DarkLight1337 Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

DarkLight1337 left a comment

Choose a reason for hiding this comment

Uh oh!

DarkLight1337 commented Dec 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jeremyteboul commented Dec 3, 2025 •

edited by github-actions bot

Loading