@MeouSker77 (Contributor) commented Sep 20, 2024

Description

Add basic support and optimizations for Qwen2-VL.

1. Why the change?

2. User API changes

This model requires transformers 4.45:

pip install transformers==4.45
pip install qwen_vl_utils
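
A quick way to confirm the matching transformers version is active (a minimal sketch; only the version-prefix check below is assumed):

import transformers
# Qwen2-VL support requires transformers 4.45; older versions cannot load this model
assert transformers.__version__.startswith("4.45"), transformers.__version__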

A simple example:

from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

import time
import torch
from ipex_llm import optimize_model

model_path = "Qwen/Qwen2-VL-7B-Instruct"

model = Qwen2VLForConditionalGeneration.from_pretrained(model_path)

model = optimize_model(model, modules_to_not_convert=["visual"])    # defaults to 'sym_int4' quantization
model = model.float().eval()    # use .float() for better output, use .half() for better speed
# print(model)

model = model.to('xpu')

# Default processor (no pixel limits):
# processor = AutoProcessor.from_pretrained(model_path)

# The model's default range for visual tokens per image is 4-16384. Set min_pixels and
# max_pixels to bound it (e.g. a 256-1280 token range) and balance speed against memory usage.
min_pixels = 256*28*28
max_pixels = 1280*28*28
processor = AutoProcessor.from_pretrained(model_path, min_pixels=min_pixels, max_pixels=max_pixels)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to('xpu')

with torch.inference_mode():
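    # run generation three times: the first pass typically includes XPU warm-up
    # overhead, so the later timings are more representative of steady-state speed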
    for i in range(3):
        st = time.time()
        # Inference: Generation of the output
        generated_ids = model.generate(**inputs, max_new_tokens=32)
        et = time.time()
        print(et - st)
        generated_ids_trimmed = [
            out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
        ]
        output_text = processor.batch_decode(
            generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
        )
        print(output_text)
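
As the comments in the script note, .half() trades a little output quality for speed. A minimal variant of the setup lines under that trade-off (a sketch; the rest of the script is unchanged, and the default 'sym_int4' quantization still applies):

model = Qwen2VLForConditionalGeneration.from_pretrained(model_path)
model = optimize_model(model, modules_to_not_convert=["visual"])    # language model still quantized to 'sym_int4'
model = model.half().eval()    # fp16 for the non-quantized parts: faster, slightly worse output
model = model.to('xpu')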

3. Summary of the change

4. How to test?

  • N/A
  • Unit test: Please manually trigger the PR Validation here by inputting the PR number (e.g., 1234), and paste your action link here once it has finished successfully.
  • Application test
  • Document test
  • ...

@MeouSker77 MeouSker77 requested a review from rnwang04 September 20, 2024 09:20
@rnwang04 (Contributor) left a comment

LGTM

@MeouSker77 MeouSker77 merged commit 9239fd4 into intel:main Sep 20, 2024
1 check passed
@MeouSker77 MeouSker77 deleted the add-qwen2-optimization branch September 20, 2024 09:23