Description
When passing both single-image and multi-image inputs in the same batch, the following error occurs:
RuntimeError: Tensors must have same number of dimensions: got 2 and 1
Is there any solution?
Reproduction code
import requests
from PIL import Image
import torch
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

# Load the model in half-precision
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    "llava-hf/llava-onevision-qwen2-7b-ov-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("llava-hf/llava-onevision-qwen2-7b-ov-hf")

img_path = "{your_image}"
messages = [
    [
        {
            "role": "user",
            "content": [  # QA 1: one image
                {"type": "image_url", "image_url": {"url": img_path}},
                {"type": "text", "text": "Text."},
            ],
        }
    ],
    [
        {
            "role": "user",
            "content": [  # QA 2: two images
                {"type": "image_url", "image_url": {"url": img_path}},
                {"type": "image_url", "image_url": {"url": img_path}},
                {"type": "text", "text": "Text."},
            ],
        }
    ],
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    padding=True,
    return_tensors="pt",
).to(model.device, torch.float16)

# Generate
generate_ids = model.generate(**inputs, max_new_tokens=30)
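A possible workaround until batching mixed image counts is supported: run each conversation through the processor and `model.generate` on its own, so samples with different numbers of images never share a batch. This is a sketch, not an upstream fix; `run_one` below is a hypothetical stand-in for the real per-conversation pipeline (`apply_chat_template` → `generate` → `batch_decode`).

```python
# Sketch of a per-sample workaround (assumption: sidesteps the batching bug
# rather than fixing it). Each conversation is processed alone, so the
# processor never has to stack pixel tensors with differing image counts.
def generate_per_sample(conversations, run_one):
    # run_one(conversation) -> decoded string; called once per conversation.
    return [run_one(conv) for conv in conversations]

# With the real model, run_one would look roughly like:
# def run_one(conv):
#     inp = processor.apply_chat_template(
#         [conv], add_generation_prompt=True, tokenize=True,
#         return_dict=True, return_tensors="pt",
#     ).to(model.device, torch.float16)
#     out = model.generate(**inp, max_new_tokens=30)
#     return processor.batch_decode(out, skip_special_tokens=True)[0]
```

This trades throughput for correctness, since generation is no longer batched across samples.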