-
-
Notifications
You must be signed in to change notification settings - Fork 11.8k
Support multiple image/audio embeddings per requests #29988
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Support multiple image/audio embeddings per requests #29988
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request successfully enables support for multiple image and audio embeddings per request by updating the MultiModalItemTracker to handle lists of embeddings. The changes in vllm/entrypoints/chat_utils.py are correct and effectively remove the previous limitation. The new unit tests are also well-designed to cover these new capabilities. However, this change to the data structure for embeddings (from a single item to a list) breaks several existing unit tests that were written with the single-embedding assumption. I've identified these failing tests and provided suggestions to update them. Addressing these test failures is critical for merging this PR.
c4e242c to
fa53ab7
Compare
💡 Codex Reviewvllm/vllm/entrypoints/chat_utils.py Lines 725 to 727 in c4e242c
Mapping vllm/vllm/entrypoints/chat_utils.py Lines 730 to 732 in c4e242c
Setting ℹ️ About Codex in GitHubCodex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback". |
fa53ab7 to
f4e251f
Compare
DarkLight1337
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, can you update the Multimodal Inputs documentation page accordingly as well?
f4e251f to
0306c96
Compare
|
Documentation preview: https://vllm--29988.org.readthedocs.build/en/29988/ |
Here is the doc ! |
|
#29970 just got merged, can you update your code to use the |
docs/features/multimodal_inputs.md
Outdated
| print(generated_text) | ||
| ``` | ||
|
|
||
| !!! note |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section is under offline inference so I think it's not related?
3884c6f to
ee06b3c
Compare
This enables the Chat Completions API to leverage the model's existing capability for multiple embeddings - Remove limitation that only allowed one message with image_embeds/audio_embeds - Update MultiModalItemTracker and AsyncMultiModalItemTracker to treat embeddings as lists - Add unit tests for multiple image embeddings support: * test_parse_chat_messages_multiple_image_embeds * test_parse_chat_messages_multiple_image_embeds_with_uuids * test_parse_chat_messages_multiple_image_embeds_async - Embeddings now behave consistently with regular images/audios - Validation via existing validate_num_items() against --limit-mm-per-prompt
ee06b3c to
dbc789a
Compare
DarkLight1337
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM then
|
I have fixed DCO for you, next time please sign-off your commits. |
This enables the Chat Completions API to leverage the model's existing capability for multiple embeddings, previously only accessible through the direct LLM inference API.
Remove limitation that only allowed one message with image_embeds/audio_embeds
Update MultiModalItemTracker and AsyncMultiModalItemTracker to treat embeddings as lists
Embeddings now behave consistently with regular images/audios
Validation via existing validate_num_items() against --limit-mm-per-prompt
Test Plan
Test Result
passed