[model] fix: stuck issue with mixed text-image data #3670

HollowMan6 · 2025-10-03T21:12:02Z

What does this PR do?

Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

On top of #3496

Possibly resolve #3665

This PR continues addressing the hanging issue mentioned here: #3315 (comment)

It should be safe to treat the model.training check as optional, even when the model is in inference mode, as the value of inputs_embeds won't get modified anyway.
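To illustrate why the check matters, here is a minimal pure-Python sketch, not the code in this PR. It assumes that under sharded data parallelism (e.g. FSDP) every rank must execute the same submodules in the same order, because each sharded submodule triggers a collective when it runs. The names `modules_run`, `ranks_agree`, and `gate_on_training` are hypothetical and exist only for this illustration.

```python
from typing import List


def modules_run(has_images: bool, training: bool, gate_on_training: bool) -> List[str]:
    """Return the submodules one rank would execute for a single micro-batch."""
    # Real image inputs always go through the vision tower.
    run_vision = has_images
    if not has_images:
        # Text-only batch: a dummy vision pass keeps this rank in step with
        # ranks that received images.
        # gate_on_training=True models the assumed old behaviour, where the
        # dummy pass only happened when the model was in training mode.
        run_vision = training if gate_on_training else True
    return (["vision_tower"] if run_vision else []) + ["language_model"]


def ranks_agree(batches_have_images: List[bool], training: bool, gate_on_training: bool) -> bool:
    """True if every rank executes the same sequence of submodules."""
    sequences = {
        tuple(modules_run(has_images, training, gate_on_training))
        for has_images in batches_have_images
    }
    return len(sequences) == 1


# Rank 0 gets an image batch, rank 1 gets a text-only batch, model in eval mode.
mixed = [True, False]
print(ranks_agree(mixed, training=False, gate_on_training=True))   # ranks diverge
print(ranks_agree(mixed, training=False, gate_on_training=False))  # ranks stay in step
```

In this toy model, gating the dummy pass on training mode lets the text-only rank skip the vision tower during inference, which is the kind of mismatch that can leave the other ranks waiting.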

Checklist Before Starting

Search for similar PRs. Paste at least one query link here: ...
Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
- {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
- If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
- {type} is in feat, fix, refactor, chore, test
- If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
- Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Read the Contribute Guide.
Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Add / Update the documentation.
Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...
Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)

On top of volcengine#3496

Possibly resolve volcengine#3665

This PR continues addressing the hanging issue mentioned here: volcengine#3315 (comment)

It should be safe to treat the `model.training` check as optional, even when the model is in inference mode, as the value of `inputs_embeds` won't get modified anyway.

Signed-off-by: Hollow Man <[email protected]>

gemini-code-assist

Code Review

This pull request aims to fix a hanging issue with mixed text-image data by ensuring the vision part of the model is always exercised, even for text-only batches. This is achieved by removing the model.training condition, which triggers a dummy vision forward pass during inference as well. While this is a valid strategy to prevent deadlocks in distributed environments, it introduces a performance overhead for non-distributed inference. My review suggests refining this condition to apply the workaround only when in a distributed context, thus preserving performance for single-device inference scenarios.
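The refinement suggested in this review could look roughly like the sketch below. It is an illustration only, not the merged code: `in_distributed_context` and `needs_dummy_vision_pass` are hypothetical helper names, and the only library calls used are `torch.distributed.is_available()` and `torch.distributed.is_initialized()`.

```python
def in_distributed_context() -> bool:
    """Best-effort check for an initialized torch.distributed process group."""
    try:
        import torch.distributed as dist
    except ImportError:
        # No PyTorch installed: treat as single-process.
        return False
    return dist.is_available() and dist.is_initialized()


def needs_dummy_vision_pass(has_images: bool, distributed: bool) -> bool:
    """Run the dummy vision forward only for text-only batches on multi-rank jobs."""
    return (not has_images) and distributed


# Single-device inference on a text-only batch: no dummy pass, no overhead.
print(needs_dummy_vision_pass(has_images=False, distributed=False))
# Distributed job on a text-only batch: dummy pass keeps ranks in step.
print(needs_dummy_vision_pass(has_images=False, distributed=True))
```

The trade-off is one extra condition in exchange for skipping the dummy vision forward when only one process is running.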

verl/models/transformers/glm4v.py

verl/models/transformers/qwen2_vl.py

HollowMan6 · 2025-10-03T21:17:16Z

cc: @hiyouga @wlhgtc

HollowMan6 requested review from vermouth1992, PeterSH6, FightingZhen and ji-huazhong as code owners October 3, 2025 21:12

gemini-code-assist bot reviewed Oct 3, 2025

verl/models/transformers/glm4v.py (resolved)

verl/models/transformers/qwen2_vl.py (resolved)

HollowMan6 mentioned this pull request Oct 3, 2025

[worker] fix: get all multi_modal_inputs keys with in a microbatch #3315

Merged

7 tasks

This was referenced Oct 3, 2025

[model] fix: refactor qwen2vl patches & support no-image input for fsdp #3496

Merged

Mixed image-text datasets are not supported. #3665

Closed

vermouth1992 approved these changes Oct 3, 2025

vermouth1992 merged commit 4e9faaf into volcengine:main Oct 3, 2025
26 of 55 checks passed

HollowMan6 deleted the multi_modal_stuck branch October 4, 2025 00:03

HollowMan6 mentioned this pull request Oct 4, 2025

[model] fix: stuck issue with mixed text-image data hiyouga/EasyR1#518

Merged
