Skip to content

Conversation

m-Just
Copy link

@m-Just m-Just commented Sep 30, 2025

What does this PR do?

Fix qwen2_vl position_id shape mismatch: verl/models/transformers/qwen2_vl.py:process_position_ids expects position_ids to have a shape of (4, batch_size, seq_length) but verl/experimental/agent_loop/agent_loop.py:generate_sequences returns (batch_size, 3, seq_length) (which will be transposed to (3, batch_size, seq_length)), ignoring the text dimension. This PR follows the relevant code in verl/utils/dataset/rl_dataset.py to fix the issue.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request aims to fix a shape mismatch for position_ids in qwen2_vl. While it correctly identifies that an additional dimension for text position IDs is needed, the resulting tensor shape is still incorrect for the model's expectations, which would lead to a runtime error. I've provided a critical comment with a suggested fix that corrects the tensor dimensions. This fix also requires a small change in the _postprocess method, which I've detailed in the comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant