Description
When passing both single-image and multi-image inputs in the same batch, the following error occurs:
RuntimeError: Tensors must have same number of dimensions: got 2 and 1
Is there any solution?
Reproduction code
import requests
from PIL import Image
import torch
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

# Load the model in half-precision
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    "llava-hf/llava-onevision-qwen2-7b-ov-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("llava-hf/llava-onevision-qwen2-7b-ov-hf")

img_path = "{your_image}"
messages = [
    [
        {
            "role": "user",
            "content": [  # QA 1: one image
                {"type": "image_url", "image_url": {"url": img_path}},
                {"type": "text", "text": "Text."},
            ],
        }
    ],
    [
        {
            "role": "user",
            "content": [  # QA 2: two images
                {"type": "image_url", "image_url": {"url": img_path}},
                {"type": "image_url", "image_url": {"url": img_path}},
                {"type": "text", "text": "Text."},
            ],
        }
    ],
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    padding=True,
    return_tensors="pt",
).to(model.device, torch.float16)

# Generate
generate_ids = model.generate(**inputs, max_new_tokens=30)
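A possible workaround until batching mixed image counts is supported: run each conversation through the processor and `model.generate` on its own, so samples with different numbers of images never share a batch. This is a sketch, not an upstream fix; `run_one` below is a hypothetical stand-in for the real per-conversation pipeline (`apply_chat_template` → `generate` → `batch_decode`).

```python
# Sketch of a per-sample workaround (assumption: sidesteps the batching bug
# rather than fixing it). Each conversation is processed alone, so the
# processor never has to stack pixel tensors with differing image counts.
def generate_per_sample(conversations, run_one):
    # run_one(conversation) -> decoded string; called once per conversation.
    return [run_one(conv) for conv in conversations]

# With the real model, run_one would look roughly like:
# def run_one(conv):
#     inp = processor.apply_chat_template(
#         [conv], add_generation_prompt=True, tokenize=True,
#         return_dict=True, return_tensors="pt",
#     ).to(model.device, torch.float16)
#     out = model.generate(**inp, max_new_tokens=30)
#     return processor.batch_decode(out, skip_special_tokens=True)[0]
```

This trades throughput for correctness, since generation is no longer batched across samples.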