Mixed image-text datasets are not supported.

### System Info

```python
if "multi_modal_data" in non_tensor_batch:
    vllm_inputs = []
    for raw_prompt_ids, multi_modal_data in zip(
        non_tensor_batch.pop("raw_prompt_ids"), non_tensor_batch.pop("multi_modal_data"), strict=True
    ):
        vllm_inputs.append({"prompt_token_ids": raw_prompt_ids, "multi_modal_data": multi_modal_data})
        # if multi_modal_data == {}:
        #     vllm_inputs.append({"prompt_token_ids": raw_prompt_ids})
        # else:
        #     vllm_inputs.append({"prompt_token_ids": raw_prompt_ids, "multi_modal_data": multi_modal_data})
else:
    vllm_inputs = [
        {"prompt_token_ids": raw_prompt_ids} for raw_prompt_ids in non_tensor_batch.pop("raw_prompt_ids")
    ]
```

In `verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py`, if `vllm_inputs` mixes text-only entries (`{"prompt_token_ids": raw_prompt_ids}`) with multimodal entries (`{"prompt_token_ids": raw_prompt_ids, "multi_modal_data": multi_modal_data}`), the program hangs.
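To check whether a given batch actually hits this condition before it is passed to vLLM, something like the following could be run on `vllm_inputs`. This is only a diagnostic sketch: `is_mixed_batch` is a hypothetical helper, not part of verl or vLLM, and it assumes that an entry with a missing or empty `multi_modal_data` counts as text-only.

```python
from typing import Any


def is_mixed_batch(vllm_inputs: list[dict[str, Any]]) -> bool:
    """Return True if the batch contains both text-only and multimodal entries.

    Assumption: an entry is text-only when "multi_modal_data" is absent or empty.
    """
    has_mm = [bool(item.get("multi_modal_data")) for item in vllm_inputs]
    return any(has_mm) and not all(has_mm)


# Placeholder standing in for real image data; the value itself is irrelevant here.
fake_image = object()

text_only = {"prompt_token_ids": [1, 2, 3]}
with_image = {"prompt_token_ids": [4, 5], "multi_modal_data": {"image": [fake_image]}}

mixed = is_mixed_batch([text_only, with_image])
all_text = is_mixed_batch([text_only, text_only])
all_image = is_mixed_batch([with_image, with_image])
```

Logging the result just before generation would show whether a hang coincides with a mixed batch.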

### Information

- [ ] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

```python
if "multi_modal_data" in non_tensor_batch:
    vllm_inputs = []
    for raw_prompt_ids, multi_modal_data in zip(
        non_tensor_batch.pop("raw_prompt_ids"), non_tensor_batch.pop("multi_modal_data"), strict=True
    ):
        if multi_modal_data == {}:
            vllm_inputs.append({"prompt_token_ids": raw_prompt_ids})
        else:
            vllm_inputs.append({"prompt_token_ids": raw_prompt_ids, "multi_modal_data": multi_modal_data})
else:
    vllm_inputs = [
        {"prompt_token_ids": raw_prompt_ids} for raw_prompt_ids in non_tensor_batch.pop("raw_prompt_ids")
    ]
```

### Expected behavior

null

Mixed image-text datasets are not supported. #3665
