[Core] More fixes to MultiModalEmbeddings type handling #19715


Merged
5 commits merged into vllm-project:main on Jun 18, 2025

Conversation

@russellb russellb (Member) commented Jun 16, 2025

FIX #19736

This is a follow up to PR #19446.

In that PR, get_multimodal_embeddings() was changed to return
MultiModalEmbeddings instead of Optional[MultiModalEmbeddings]
because code in the model runner was requiring that the result was not
None.

Several models needed tweaks to account for this. Many were missed
because they were not tested in CI. This should apply the rest of the
common changes that CI did not catch.

Signed-off-by: Russell Bryant [email protected]
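
For context, a minimal sketch of the new contract (the type alias matches the one quoted later in this thread; the model class, the pixel_values key, and the method bodies are simplified illustrations, not actual vLLM code):

    from typing import Union

    import torch

    # Type alias as quoted later in this thread.
    MultiModalEmbeddings = Union[list[torch.Tensor], torch.Tensor,
                                 tuple[torch.Tensor, ...]]

    class ExampleModel:
        def _process_image_input(self, image_input: torch.Tensor) -> torch.Tensor:
            # Stand-in for the real projection (e.g. language_projection.forward).
            return image_input

        def get_multimodal_embeddings(self, **kwargs) -> MultiModalEmbeddings:
            # After PR #19446 this must always return embeddings, never None;
            # "no multimodal input" is expressed as an empty container instead.
            image_input = kwargs.get("pixel_values")
            if image_input is None:
                return []
            return self._process_image_input(image_input)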


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the llama (Related to Llama models) label Jun 16, 2025
@gemini-code-assist gemini-code-assist bot (Contributor) left a comment:

Summary of Changes

Hello @russellb, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request is a follow-up to a previous change that modified get_multimodal_embeddings() to always return a MultiModalEmbeddings object, rather than an optional one. The changes update numerous model implementations to align with this new behavior, specifically by refining how the presence of multimodal embeddings is checked, ensuring robustness and consistency across the codebase.

Highlights

  • Multimodal Embeddings Type Handling: This PR completes the necessary adjustments for MultiModalEmbeddings type handling across various model implementations. It addresses models that were not updated in a previous PR (#19446) due to insufficient CI coverage.
  • Code Consistency and Readability: The changes standardize conditional checks for multimodal_embeddings by replacing explicit is not None and is None comparisons with more Pythonic truthiness checks (if multimodal_embeddings: and if not multimodal_embeddings:). This improves code readability and consistency.

@gemini-code-assist gemini-code-assist bot (Contributor) left a comment:

Code Review

This PR consistently updates the checks for multimodal_embeddings from is not None / is None to direct boolean evaluation (if multimodal_embeddings / if not multimodal_embeddings). This change is Pythonic and correctly handles the case where an empty collection (like an empty list []) signifies the absence of embeddings, treating it as falsy (or truthy for if not). This aligns with the described change in PR #19446 where get_multimodal_embeddings() no longer returns Optional.

A key point for consideration across all modified files is the type hint for the multimodal_embeddings parameter in the get_input_embeddings (and similar) methods. It's currently Optional[MultiModalEmbeddings] = None (or Optional[NestedTensors]). If this parameter is now guaranteed to be a MultiModalEmbeddings (or NestedTensors) instance and never None (with empty lists/tuples representing no embeddings), the type hint should be updated to remove Optional (and the default None potentially removed or changed) to accurately reflect this contract. This would enhance code clarity and maintainability. I've added specific comments detailing this, which applies broadly to all affected files.
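
Concretely, the reviewer's suggestion amounts to a signature change along these lines (an illustrative sketch only; the empty-tuple default is an assumption, not what vLLM actually uses):

    from typing import Optional, Union

    import torch

    MultiModalEmbeddings = Union[list[torch.Tensor], torch.Tensor,
                                 tuple[torch.Tensor, ...]]

    # Current: Optional hint with a None default (as in the diffs below).
    def get_input_embeddings(
        self,
        input_ids: torch.Tensor,
        multimodal_embeddings: Optional[MultiModalEmbeddings] = None,
    ) -> torch.Tensor:
        ...

    # Suggested: drop Optional if None is never passed; an empty tuple
    # then represents "no embeddings".
    def get_input_embeddings(
        self,
        input_ids: torch.Tensor,
        multimodal_embeddings: MultiModalEmbeddings = (),
    ) -> torch.Tensor:
        ...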

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) June 17, 2025 03:36
@github-actions github-actions bot added the ready (ONLY add when PR is ready to merge/full CI is needed) label Jun 17, 2025
@DarkLight1337 DarkLight1337 (Member) commented:
PTAL at the failing engine test

@@ -641,7 +641,7 @@ def get_input_embeddings(
         multimodal_embeddings: Optional[MultiModalEmbeddings] = None,
     ) -> torch.Tensor:
         inputs_embeds = self.language_model.get_input_embeddings(input_ids)
-        if multimodal_embeddings is not None:
+        if multimodal_embeddings:
@maxdebayser (Contributor) commented:

Isn't this going to potentially cause a RuntimeError: Boolean value of Tensor with more than one value is ambiguous?

Since this type is defined as

MultiModalEmbeddings = Union[list[Tensor], Tensor, tuple[Tensor, ...]]

the if clause could potentially test a torch.Tensor directly.

For example, in blip2, get_multimodal_embeddings() calls _process_image_input which in turn calls language_projection.forward which will return a tensor.
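
The failure mode can be reproduced outside vLLM in a few lines (a minimal repro sketch):

    import torch

    emb = torch.randn(3, 4096)  # e.g. projected image embeddings
    try:
        if emb:  # bool() of a multi-element tensor is ambiguous
            pass
    except RuntimeError as e:
        print(e)  # Boolean value of Tensor with more than one value is ambiguous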

@maxdebayser (Contributor) commented:

Actually, one of the tests is failing for this reason:

FAILED engine/test_short_mm_context.py::test_context_length_too_short[llava-hf/llava-1.5-7b-hf] - RuntimeError: Boolean value of Tensor with more than one value is ambiguous

@maxdebayser (Contributor) commented:

if len(multimodal_embeddings) != 0: should work for lists, tuples and tensors.
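
This works because torch.Tensor.__len__ returns the size of the first dimension, so len() gives a uniform emptiness check across lists, tuples, and tensors (one caveat: len() raises TypeError on a 0-dim tensor). A quick sketch:

    import torch

    for embeddings in ([], (), torch.empty(0, 4096),
                       torch.randn(3, 4096), [torch.randn(3, 4096)]):
        state = "has embeddings" if len(embeddings) != 0 else "empty"
        print(type(embeddings).__name__, "->", state)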

@russellb russellb (Member, Author) replied:

Thanks. I'm updating the checks to use len()

@maxdebayser (Contributor) commented:

The distributed-tests-2-gpus failure is fixed in this PR: #19734

@maxdebayser maxdebayser (Contributor) left a comment:

Thanks!

@@ -845,7 +846,7 @@ def get_input_embeddings_v0(
         multimodal_embeddings: Optional[NestedTensors] = None,
     ) -> torch.Tensor:
         inputs_embeds = self.language_model.get_input_embeddings(input_ids)
-        if multimodal_embeddings is None:
+        if len(multimodal_embeddings) == 0:
@govind-ramnarayan govind-ramnarayan commented Jun 18, 2025:
I'm worried about this line when the default argument is used: shouldn't it also check for None, like the other lines?

if multimodal_embeddings is None or len(multimodal_embeddings) == 0:

If so, maybe it makes sense to add a single utility function so all of these checks stay consistent (testing both None and len() == 0 in one place), or to change the argument type to NestedTensors here if None is never meant to be passed. A sketch of such a utility follows this comment.
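
A sketch of what that utility could look like (hypothetical; as the follow-up comments show, the PR instead fixed the checks in place):

    from typing import Optional, Union

    import torch

    MultiModalEmbeddings = Union[list[torch.Tensor], torch.Tensor,
                                 tuple[torch.Tensor, ...]]

    def has_multimodal_embeddings(
        embeddings: Optional[MultiModalEmbeddings],
    ) -> bool:
        # Covers the None default and empty containers/tensors without
        # evaluating a tensor's truth value directly.
        return embeddings is not None and len(embeddings) != 0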

@russellb russellb (Member, Author) replied:

thanks - i'll fix these

@russellb russellb (Member, Author) replied:

done. i fixed all the spots that came up with git grep 'if len(multimodal_embeddings'

@govind-ramnarayan replied:

Looks good!

@@ -845,7 +846,7 @@ def get_input_embeddings_v0(
         multimodal_embeddings: Optional[NestedTensors] = None,
     ) -> torch.Tensor:
         inputs_embeds = self.language_model.get_input_embeddings(input_ids)
-        if multimodal_embeddings is None:
+        if len(multimodal_embeddings) == 0:
@aarnphm aarnphm (Collaborator) commented Jun 18, 2025:

Suggested change:
-        if len(multimodal_embeddings) == 0:
+        if not multimodal_embeddings:

cc @russellb

@maxdebayser (Contributor) replied:

@aarnphm, if multimodal_embeddings is a torch.Tensor, you get RuntimeError: Boolean value of Tensor with more than one value is ambiguous when you take its truth value directly.

@russellb russellb (Member, Author) replied:

That's what I used before, but doesn't work if it's a tensor

cc @maxdebayser who pointed this out and submitted the change to use len instead

@aarnphm aarnphm (Collaborator) replied:

got it

@mergify mergify bot added the qwen (Related to Qwen models) label Jun 18, 2025
russellb and others added 4 commits June 18, 2025 13:51
@DarkLight1337 DarkLight1337 merged commit 14fdd21 into vllm-project:main Jun 18, 2025
70 checks passed
Labels: llama (Related to Llama models), qwen (Related to Qwen models), ready (ONLY add when PR is ready to merge/full CI is needed)
Projects: None yet
Development

Successfully merging this pull request may close these issues.

[CI Failure]: Samplers Test - samplers/test_beam_search.py::test_beam_search_passes_multimodal_data
6 participants