Enable QWen VL video preprocess #2514

xipingyan · 2025-07-30T13:17:45Z

Tickets: CVS-173219
1: Enable video preprocessing for the Qwen VL model.
Add: ov::Property<std::vector<ov::Tensor>> video{"video"};

2: The main updates:
For video: with 2-in-1 merging, frames are merged in pairs, so if 9 images are input, only 5 images are actually processed.
For image: with 2-in-1 merging, each image is only doubled, so if 9 images are input, 9 images are actually processed.
Introduce an "If" node to merge video and image preprocessing into one OV subgroup.
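The frame counts in the description (9 in, 5 processed for video; 9 in, 9 processed for images) can be illustrated with a minimal sketch. The helper names `group_video_frames` and `group_images` are hypothetical and not part of this PR, and padding an incomplete final pair by repeating the last frame is an assumption made for illustration, not something the description states.

```python
def group_video_frames(frames, temporal_patch_size=2):
    """Merge consecutive video frames into groups of `temporal_patch_size`.

    Assumption: an incomplete final group is padded by repeating the last frame.
    """
    if not frames:
        return []
    padded = list(frames)
    remainder = len(padded) % temporal_patch_size
    if remainder:
        padded.extend([padded[-1]] * (temporal_patch_size - remainder))
    return [
        tuple(padded[i:i + temporal_patch_size])
        for i in range(0, len(padded), temporal_patch_size)
    ]


def group_images(images, temporal_patch_size=2):
    """Repeat each still image so that it fills one group on its own."""
    return [tuple([image] * temporal_patch_size) for image in images]


frames = list(range(9))          # stand-ins for 9 input frames
video_groups = group_video_frames(frames)
image_groups = group_images(frames)
print(len(video_groups), len(image_groups))
```

With nine inputs, the video path yields five merged groups, while the image path yields nine groups, one per image.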

rkazants

Please share the CVS number in JIRA so it can be assessed from an architecture perspective.

Thanks.


Copilot

Pull Request Overview

Enables video processing for QWen VL models by adding video input support throughout the VLM pipeline. The main change allows QWen VL models to handle video input through temporal patch processing, which groups video frames and merges them into combined patches for more efficient processing.

Adds video parameter support to all VLM pipeline interfaces and binding functions
Implements video encoding functionality specifically for QWen2VL models
Updates the generation config to include is_video flag for video-specific processing

Reviewed Changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.


File	Description
src/python/py_vlm_pipeline.cpp	Adds video parameter to Python VLM pipeline bindings
src/python/py_continuous_batching_pipeline.cpp	Updates continuous batching pipeline with video support
src/python/openvino_genai/py_openvino_genai.pyi	Adds is_video property and video parameter to type stubs
src/cpp/src/visual_language/vision_encoder.hpp	Adds virtual encode_video method to base VisionEncoder
src/cpp/src/visual_language/qwen2vl/classes.hpp	Declares video encoding implementation for QWen2VL
src/cpp/src/visual_language/qwen2vl/classes.cpp	Implements video preprocessing and encoding logic
src/cpp/src/visual_language/pipeline_base.hpp	Updates base pipeline interface to include video parameter
src/cpp/src/visual_language/pipeline.cpp	Updates main pipeline implementation with video support
src/cpp/src/visual_language/llava_next/classes.hpp	Updates LLaVANext interface with video parameter
src/cpp/src/visual_language/llava_next/classes.cpp	Adds video warning for unsupported models
src/cpp/src/visual_language/llava/classes.hpp	Updates LLaVA interface with video parameter
src/cpp/src/visual_language/llava/classes.cpp	Adds video warning for unsupported models
src/cpp/src/visual_language/inputs_embedder.hpp	Updates inputs embedder interface for video support
src/cpp/src/visual_language/inputs_embedder.cpp	Implements video encoding routing logic
src/cpp/src/visual_language/continuous_batching_adapter.hpp	Updates adapter interface with video parameter
src/cpp/src/visual_language/clip.cpp	Optimizes bicubic resize with early exit for same-size images
src/cpp/src/continuous_batching/pipeline_impl.cpp	Updates implementation to handle video parameters
src/cpp/src/continuous_batching/pipeline_base.hpp	Updates base interface with video support
src/cpp/src/continuous_batching/pipeline_base.cpp	Implements video parameter handling in pipeline
src/cpp/src/continuous_batching/pipeline.cpp	Updates main pipeline with video parameter support
src/cpp/include/openvino/genai/visual_language/pipeline.hpp	Adds video property and parameter to public interface
src/cpp/include/openvino/genai/continuous_batching_pipeline.hpp	Updates public interface with video support
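The early exit noted for clip.cpp in the table above can be sketched as follows. This is only an illustration of the idea: `Image`, `resize_if_needed`, and `fake_bicubic_resize` are hypothetical names, and the stand-in resize function does not perform real bicubic interpolation.

```python
from dataclasses import dataclass


@dataclass
class Image:
    width: int
    height: int
    pixels: bytes


def resize_if_needed(image, target_width, target_height, resize_fn):
    """Return the input unchanged when it already has the target size."""
    if image.width == target_width and image.height == target_height:
        return image  # early exit: no resampling work is done
    return resize_fn(image, target_width, target_height)


def fake_bicubic_resize(image, width, height):
    # Stand-in for a real bicubic resize; it only allocates a blank buffer.
    return Image(width, height, bytes(width * height))


source = Image(4, 4, bytes(16))
same = resize_if_needed(source, 4, 4, fake_bicubic_resize)
smaller = resize_if_needed(source, 2, 2, fake_bicubic_resize)
print(same is source, (smaller.width, smaller.height))
```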


# Conflicts:
#   src/cpp/src/continuous_batching/pipeline_base.cpp
#   src/cpp/src/visual_language/inputs_embedder.cpp
#   src/cpp/src/visual_language/inputs_embedder.hpp
#   src/cpp/src/visual_language/qwen2vl/classes.cpp
#   src/cpp/src/visual_language/qwen2vl/classes.hpp


get_inputs_embeds(const std::string& prompt, const std::vector<ov::Tensor>& images ...
get_inputs_embeds_with_token_type_ids(const std::string& prompt, const std::vector<ov::Tensor>& images, ...
Because:
1: They are never called in the current code.
2: To get embedding features, we usually need to apply a chat template.
I think keeping only the interfaces below is enough:
get_inputs_embeds(const std::string& prompt, const std::vector<EncodedImage>& images...
get_inputs_embeds_with_token_type_ids(const std::string& prompt, const std::vector<EncodedImage>& images...
Signed-off-by: xipingya <[email protected]>


github-actions bot added the category: visual language, category: continuous batching, category: sampling, and category: CPP API labels Jul 30, 2025

xipingyan added do_not_merge do_not_review labels Jul 30, 2025

xipingyan requested a review from wangleis July 30, 2025 13:19

xipingyan changed the title ~~enable qwen VL video preprocess~~ [Draft] Enable QWen VL video preprocess Jul 30, 2025

rkazants requested changes Jul 31, 2025


rkazants requested a review from Wovchena July 31, 2025 04:32

github-actions bot added the category: Python API label Jul 31, 2025

xipingyan changed the title ~~[Draft] Enable QWen VL video preprocess~~ Enable QWen VL video preprocess Aug 5, 2025

xipingyan removed do_not_merge do_not_review labels Aug 5, 2025

xipingyan requested a review from rkazants August 5, 2025 03:30

xipingyan marked this pull request as ready for review August 5, 2025 03:30

wangleis reviewed Aug 6, 2025

src/cpp/src/visual_language/llava/classes.cpp (outdated, resolved)

github-actions bot removed the category: sampling label Aug 10, 2025

xipingyan added 7 commits August 11, 2025 09:31

Avoid resizing images that already have the target width and height.

6c49dc8

Signed-off-by: xipingya <[email protected]>

Enable video process for qwen*-vl

c7d9932

Signed-off-by: xipingya <[email protected]>

Add python interface: generate config: is_video, default false.

2ee043f

Signed-off-by: xipingya <[email protected]>

Fall back from video_encode to image encode in the base class.

29c74fd

Signed-off-by: xipingya <[email protected]>

Update calc target image size.

78dac29

Only calc once for video process. Signed-off-by: xipingya <[email protected]>

Reduce shared code; fall back to image processing by returning an empty vector.

7b2c115

Signed-off-by: xipingya <[email protected]>

1: remove is_video,

10d8e8d

2: add ov::Property::video Signed-off-by: xipingya <[email protected]>

xipingyan force-pushed the xp/enable_qwen_vl_video_preprocess branch from edd75d8 to 10d8e8d Compare August 11, 2025 01:31

Update src/cpp/src/visual_language/llava/classes.cpp

a3000d4

Co-authored-by: Wanglei Shen <[email protected]>

Copilot AI review requested due to automatic review settings September 11, 2025 07:25

Copilot AI reviewed Sep 11, 2025

src/cpp/src/visual_language/pipeline.cpp (outdated, resolved)

src/cpp/src/visual_language/qwen2vl/classes.cpp (outdated, resolved)

xipingyan force-pushed the xp/enable_qwen_vl_video_preprocess branch from 7178143 to 6d8f9f1 Compare September 25, 2025 03:02

Add examples to .md

8768795

Signed-off-by: xipingya <[email protected]>

xipingyan force-pushed the xp/enable_qwen_vl_video_preprocess branch from 6d8f9f1 to 8768795 Compare September 25, 2025 03:14

xipingyan added 4 commits September 25, 2025 14:42

Fix test video error, and input multiple images.

be57bf2

Signed-off-by: xipingya <[email protected]>

Update test based on 4D video.

d96c5dd

Signed-off-by: xipingya <[email protected]>

Add vlm test dependency: opencv-python

aaf20b0

Signed-off-by: xipingya <[email protected]>

Merge remote-tracking branch 'origin/master' into xp/enable_qwen_vl_v…

a2ad61b

…ideo_preprocess

github-actions bot added category: tokenizers Tokenizer class or submodule update category: tests dependencies labels Sep 27, 2025

Enable mix video and image input.

6f5189b

Signed-off-by: xipingya <[email protected]>

xipingyan force-pushed the xp/enable_qwen_vl_video_preprocess branch from dfa276d to 6f5189b Compare September 28, 2025 00:55

xipingyan added 3 commits September 28, 2025 09:58

split encode_images into encode_images and encode_video

c0829a3

Signed-off-by: xipingya <[email protected]>

1: Add <video_pad> placeholder,

72c621b

2: Enable video for get_input_embeds Signed-off-by: xipingya <[email protected]>

popovaan reviewed Sep 29, 2025

src/cpp/include/openvino/genai/visual_language/pipeline.hpp (resolved)

Update position_ids after enable video.

132b228

Signed-off-by: xipingya <[email protected]>

popovaan reviewed Sep 29, 2025

src/cpp/src/visual_language/vision_encoder.hpp (outdated, resolved)

peterchen-intel reviewed Sep 29, 2025

src/cpp/include/openvino/genai/visual_language/pipeline.hpp (outdated, resolved)

peterchen-intel reviewed Sep 29, 2025

src/cpp/include/openvino/genai/visual_language/pipeline.hpp (outdated, resolved)

xipingyan and others added 4 commits September 30, 2025 10:18

Add video history id.

8c0e13d

Signed-off-by: xipingya <[email protected]>

Update src/cpp/include/openvino/genai/visual_language/pipeline.hpp

64ba684

Co-authored-by: Chen Peter <[email protected]>

Merge branch 'xp/enable_qwen_vl_video_preprocess' of https://github.c…

bbbef65

…om/xipingyan/openvino.genai into xp/enable_qwen_vl_video_preprocess

Rename video to videos, reducing confusion.

6e33dcf

std::vector<ov::Tensor> videos: std::vector means multiple videos; each ov::Tensor has layout [N,H,W,C], where N is the number of frames in one video. Signed-off-by: xipingya <[email protected]>
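To make the layout in this commit message concrete, here is a small sketch that checks a list of video shapes against the [N,H,W,C] convention. It works on plain shape tuples rather than real tensors, and `describe_videos` is a hypothetical helper, not an API introduced by this PR.

```python
def describe_videos(video_shapes):
    """Summarize a list of video shapes, one [N, H, W, C] shape per video."""
    summaries = []
    for index, shape in enumerate(video_shapes):
        if len(shape) != 4:
            raise ValueError(
                f"video {index}: expected 4 dimensions [N, H, W, C], got {len(shape)}"
            )
        frames, height, width, channels = shape
        summaries.append(
            {"frames": frames, "height": height, "width": width, "channels": channels}
        )
    return summaries


# Two videos: 9 frames of 224x224 RGB and 4 frames of 480x640 RGB.
videos = [(9, 224, 224, 3), (4, 480, 640, 3)]
summary = describe_videos(videos)
print(summary[0]["frames"], summary[1]["width"])
```

A 3D shape such as (224, 224, 3) would be rejected here, since under this convention a single still frame passed as a video still needs a leading frame dimension.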

xipingyan force-pushed the xp/enable_qwen_vl_video_preprocess branch from 6320103 to 6e33dcf Compare September 30, 2025 03:17

xipingyan added 2 commits September 30, 2025 11:33

Remove useless header.

6bf63de

Update video-> videos in Readme

eb4faea

Signed-off-by: xipingya <[email protected]>

xipingyan force-pushed the xp/enable_qwen_vl_video_preprocess branch from ac965a1 to eb4faea Compare September 30, 2025 03:51

all video -> videos

123221b

Signed-off-by: xiping.yan <[email protected]>

xipingyan force-pushed the xp/enable_qwen_vl_video_preprocess branch from e521645 to 515c911 Compare September 30, 2025 13:36

Call the image path when the model does not implement video processing.

515c911

Signed-off-by: xiping.yan <[email protected]>
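The fallback behaviour described in the commits above (a base class whose video encoding returns an empty result, so the caller uses the image path instead) might look roughly like the sketch below. All class and function names are hypothetical, and the string return values merely stand in for encoded outputs; the real C++ interfaces in this PR may differ.

```python
class VisionEncoder:
    """Base encoder: supports images only."""

    def encode_image(self, frame):
        return f"image:{frame}"

    def encode_video(self, frames):
        # An empty list signals that video encoding is not implemented.
        return []


class VideoCapableEncoder(VisionEncoder):
    """Encoder for a model that implements its own video path."""

    def encode_video(self, frames):
        return [f"video:{len(frames)} frames"]


def encode_visual_input(encoder, frames):
    """Try the video path first; fall back to per-frame image encoding."""
    encoded = encoder.encode_video(frames)
    if encoded:
        return encoded
    return [encoder.encode_image(frame) for frame in frames]


frames = ["f0", "f1", "f2"]
fallback_result = encode_visual_input(VisionEncoder(), frames)
video_result = encode_visual_input(VideoCapableEncoder(), frames)
print(fallback_result, video_result)
```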
