
Conversation

@jeremyteboul (Contributor) commented Dec 3, 2025

This enables the Chat Completions API to use the model's existing support for multiple embeddings per request, which was previously accessible only through the direct LLM inference API.

  • Remove the limitation that allowed only one message with image_embeds/audio_embeds

  • Update MultiModalItemTracker and AsyncMultiModalItemTracker to track embeddings as lists

  • Embeddings now behave consistently with regular images/audios

  • Item counts are still validated by the existing validate_num_items() against --limit-mm-per-prompt
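
A minimal sketch of the request shape this enables. The `image_embeds` content-part layout and field names below are illustrative assumptions (with placeholder base64 strings), not a definitive API reference:

```python
# Sketch: one user message carrying two pre-computed image embeddings.
# The part layout and field names are assumptions for illustration.
embeds = ["<base64-tensor-1>", "<base64-tensor-2>"]  # placeholders
message = {
    "role": "user",
    "content": [
        {"type": "image_embeds", "image_embeds": e} for e in embeds
    ] + [{"type": "text", "text": "Compare the two images."}],
}
n_embeds = sum(p["type"] == "image_embeds" for p in message["content"])
print(n_embeds)  # 2 — previously at most one such part was accepted
```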

Test Plan

  • Add unit tests for multiple image embeddings support:
    • test_parse_chat_messages_multiple_image_embeds
    • test_parse_chat_messages_multiple_image_embeds_with_uuids
    • test_parse_chat_messages_multiple_image_embeds_async

Test Result

passed

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request successfully enables support for multiple image and audio embeddings per request by updating the MultiModalItemTracker to handle lists of embeddings. The changes in vllm/entrypoints/chat_utils.py are correct and effectively remove the previous limitation. The new unit tests are also well-designed to cover these new capabilities. However, this change to the data structure for embeddings (from a single item to a list) breaks several existing unit tests that were written with the single-embedding assumption. I've identified these failing tests and provided suggestions to update them. Addressing these test failures is critical for merging this PR.
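
The breakage the review describes can be pictured with a tiny stand-in (plain objects instead of tensors; this is not the actual test code):

```python
# Before the change, a single embedding surfaced as a bare item; now it
# arrives wrapped in a list, so old assertions like
# `mm_data["image"] is tensor` fail. Updated tests would assert on the
# list shape instead.
tensor = object()  # placeholder for a real embedding tensor
mm_data = {"image": [tensor]}  # post-change shape for a single embed
assert isinstance(mm_data["image"], list)
assert mm_data["image"][0] is tensor
```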

@jeremyteboul force-pushed the multi_image_enbeddings branch from c4e242c to fa53ab7 on December 3, 2025 at 18:49
@chatgpt-codex-connector

💡 Codex Review

```python
if "image_embeds" in items_by_modality:
    image_embeds_lst = items_by_modality["image_embeds"]
    mm_inputs["image"] = image_embeds_lst
```

**P1:** Keep single `image_embeds` output shape backward compatible

Mapping image_embeds directly to mm_inputs["image"] now returns a list even when only one embedding is provided. Existing callers/tests (e.g., test_parse_chat_messages_empty_image_embeds_with_uuid, lines 829–858) relied on mm_data["image"] being a lone tensor/None for a single embed; after this change they receive [tensor] or [None], breaking those assertions and changing the public return contract despite the commit claiming backward compatibility.


```python
if "audio_embeds" in items_by_modality:
    audio_embeds_lst = items_by_modality["audio_embeds"]
    mm_inputs["audio"] = audio_embeds_lst
```

**P1:** Audio embeddings wrapped in lists stop being parsed as embeddings

Setting mm_inputs["audio"] = audio_embeds_lst means a single audio embedding now surfaces as a list. Downstream parsing treats embeddings only when it receives a tensor or a list of 2D tensors (MultiModalDataParser.is_embeddings, vllm/multimodal/parse.py:383-390); a list containing the previous 3D tensor (see test_parse_chat_messages_audio_embeds_with_string, lines 893–939) is no longer recognized as an embedding and is processed as raw audio instead, breaking existing tests and single-embed requests.
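
One way to keep the old single-item contract, offered here only as a sketch (not code from this PR), is to unwrap one-element lists before handing them downstream:

```python
def squeeze_single(items: list):
    """Hypothetical backward-compat shim: return the lone element of a
    one-item list so single-embedding callers still receive a bare
    tensor/None, while multi-embedding callers get the full list."""
    return items[0] if len(items) == 1 else items

assert squeeze_single([None]) is None            # single embed: bare value
assert squeeze_single(["a", "b"]) == ["a", "b"]  # multiple embeds: list
```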


@jeremyteboul force-pushed the multi_image_enbeddings branch from fa53ab7 to f4e251f on December 3, 2025 at 22:09
@DarkLight1337 (Member) left a comment

Thanks, can you update the Multimodal Inputs documentation page accordingly as well?

@jeremyteboul force-pushed the multi_image_enbeddings branch from f4e251f to 0306c96 on December 4, 2025 at 05:26
@mergify bot commented Dec 4, 2025

Documentation preview: https://vllm--29988.org.readthedocs.build/en/29988/

@mergify bot added the `documentation` label Dec 4, 2025
@jeremyteboul (Contributor, Author)

> Thanks, can you update the Multimodal Inputs documentation page accordingly as well?

Here is the doc!

@DarkLight1337 (Member)

#29970 just got merged, can you update your code to use the tensor2base64 convenience function?
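
For context, serializing an embedding to base64 by hand looks roughly like the stdlib sketch below; the `tensor2base64` helper from #29970 presumably wraps this kind of logic for torch tensors (its exact signature is not shown here, so treat this as an assumption):

```python
import base64
import struct

def floats_to_base64(values: list) -> str:
    # Pack float32 values into raw bytes and base64-encode them; a real
    # helper would also need to carry dtype and shape metadata.
    raw = struct.pack(f"{len(values)}f", *values)
    return base64.b64encode(raw).decode("ascii")

encoded = floats_to_base64([0.1, 0.2, 0.3])
```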

A Member commented on a documentation diff hunk (a code block ending in `print(generated_text)`, followed by a `!!! note` admonition):

This section is under offline inference so I think it's not related?

@jeremyteboul force-pushed the multi_image_enbeddings branch from 3884c6f to ee06b3c on December 4, 2025 at 18:15
@jeremyteboul force-pushed the multi_image_enbeddings branch from ee06b3c to dbc789a on December 4, 2025 at 19:30
@DarkLight1337 (Member) left a comment

Thanks, LGTM then

@DarkLight1337 (Member)

I have fixed DCO for you; next time please sign off your commits.

@DarkLight1337 added the `ready` label Dec 5, 2025
@DarkLight1337 enabled auto-merge (squash) on December 5, 2025 at 05:47
Labels: documentation, frontend, ready

3 participants