@xipingyan xipingyan commented Jul 30, 2025

Tickets: CVS-173219
1: Enable video preprocessing for the Qwen VL model.
Add: ov::Property<std::vector<ov::Tensor>> video{"video"};

2: The main updates:
For video: with 2-in-1 merging, every two frames are merged into one, so if 9 frames are input, only 5 are actually processed.
For image: with 2-in-1 merging, each image is simply duplicated, so if 9 images are input, 9 images are actually processed.
Introduce an "If" node to merge video and image preprocessing into one OV subgroup.
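The counting rule described above can be sketched in a few lines. This is only an illustration, not code from the PR: the helper name `temporal_groups` and the constant `kTemporalPatchSize` are hypothetical, and the sketch assumes that an odd trailing video frame still occupies a group of its own (which is what makes 9 frames yield 5).

```cpp
#include <cassert>
#include <cstddef>

// Assumed "2-in-1" merge factor; the name is hypothetical.
constexpr std::size_t kTemporalPatchSize = 2;

// Hypothetical helper: how many merged units are processed for `count`
// input frames (video) or input images (image).
std::size_t temporal_groups(std::size_t count, bool is_video) {
    if (is_video) {
        // Consecutive frames are paired; round up so an odd trailing
        // frame still forms a group (assumption, see lead-in).
        return (count + kTemporalPatchSize - 1) / kTemporalPatchSize;
    }
    // Each image is duplicated to fill its own pair, so the number of
    // processed units equals the number of input images.
    return count;
}
```

With this rule, `temporal_groups(9, true)` gives 5 and `temporal_groups(9, false)` gives 9, matching the numbers in the description. The `is_video` branch plays the role that the "If" node plays in the merged preprocessing graph.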

@github-actions github-actions bot added category: visual language Visual language pipeline category: continuous batching Continuous batching category: sampling Sampling / Decoding algorithms category: CPP API Changes in GenAI C++ public headers labels Jul 30, 2025
@xipingyan xipingyan requested a review from wangleis July 30, 2025 13:19
@xipingyan xipingyan changed the title enable qwen VL video preprocess [Draft] Enable QWen VL video preprocess Jul 30, 2025
Collaborator

@rkazants rkazants left a comment

Please share the CVS ticket number in JIRA so this can be assessed from an architecture perspective.

Thanks.

@rkazants rkazants requested a review from Wovchena July 31, 2025 04:32
@github-actions github-actions bot added the category: Python API Python API for GenAI label Jul 31, 2025
@xipingyan xipingyan changed the title [Draft] Enable QWen VL video preprocess Enable QWen VL video preprocess Aug 5, 2025
@xipingyan xipingyan requested a review from rkazants August 5, 2025 03:30
@xipingyan xipingyan marked this pull request as ready for review August 5, 2025 03:30
@github-actions github-actions bot removed the category: sampling Sampling / Decoding algorithms label Aug 10, 2025
@xipingyan xipingyan force-pushed the xp/enable_qwen_vl_video_preprocess branch from edd75d8 to 10d8e8d Compare August 11, 2025 01:31
@Copilot Copilot AI review requested due to automatic review settings September 11, 2025 07:25
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

Enables video processing for QWen VL models by adding video input support throughout the VLM pipeline. The main change allows QWen VL models to handle video input through temporal patch processing, which groups video frames and merges them into combined patches for more efficient processing.

  • Adds video parameter support to all VLM pipeline interfaces and binding functions
  • Implements video encoding functionality specifically for QWen2VL models
  • Updates the generation config to include is_video flag for video-specific processing

Reviewed Changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/python/py_vlm_pipeline.cpp | Adds video parameter to Python VLM pipeline bindings |
| src/python/py_continuous_batching_pipeline.cpp | Updates continuous batching pipeline with video support |
| src/python/openvino_genai/py_openvino_genai.pyi | Adds is_video property and video parameter to type stubs |
| src/cpp/src/visual_language/vision_encoder.hpp | Adds virtual encode_video method to base VisionEncoder |
| src/cpp/src/visual_language/qwen2vl/classes.hpp | Declares video encoding implementation for QWen2VL |
| src/cpp/src/visual_language/qwen2vl/classes.cpp | Implements video preprocessing and encoding logic |
| src/cpp/src/visual_language/pipeline_base.hpp | Updates base pipeline interface to include video parameter |
| src/cpp/src/visual_language/pipeline.cpp | Updates main pipeline implementation with video support |
| src/cpp/src/visual_language/llava_next/classes.hpp | Updates LLaVANext interface with video parameter |
| src/cpp/src/visual_language/llava_next/classes.cpp | Adds video warning for unsupported models |
| src/cpp/src/visual_language/llava/classes.hpp | Updates LLaVA interface with video parameter |
| src/cpp/src/visual_language/llava/classes.cpp | Adds video warning for unsupported models |
| src/cpp/src/visual_language/inputs_embedder.hpp | Updates inputs embedder interface for video support |
| src/cpp/src/visual_language/inputs_embedder.cpp | Implements video encoding routing logic |
| src/cpp/src/visual_language/continuous_batching_adapter.hpp | Updates adapter interface with video parameter |
| src/cpp/src/visual_language/clip.cpp | Optimizes bicubic resize with early exit for same-size images |
| src/cpp/src/continuous_batching/pipeline_impl.cpp | Updates implementation to handle video parameters |
| src/cpp/src/continuous_batching/pipeline_base.hpp | Updates base interface with video support |
| src/cpp/src/continuous_batching/pipeline_base.cpp | Implements video parameter handling in pipeline |
| src/cpp/src/continuous_batching/pipeline.cpp | Updates main pipeline with video parameter support |
| src/cpp/include/openvino/genai/visual_language/pipeline.hpp | Adds video property and parameter to public interface |
| src/cpp/include/openvino/genai/continuous_batching_pipeline.hpp | Updates public interface with video support |

# Conflicts:
#	src/cpp/src/continuous_batching/pipeline_base.cpp
#	src/cpp/src/visual_language/inputs_embedder.cpp
#	src/cpp/src/visual_language/inputs_embedder.hpp
#	src/cpp/src/visual_language/qwen2vl/classes.cpp
#	src/cpp/src/visual_language/qwen2vl/classes.hpp
@xipingyan xipingyan force-pushed the xp/enable_qwen_vl_video_preprocess branch from 7178143 to 6d8f9f1 Compare September 25, 2025 03:02
Signed-off-by: xipingya <[email protected]>
@xipingyan xipingyan force-pushed the xp/enable_qwen_vl_video_preprocess branch from 6d8f9f1 to 8768795 Compare September 25, 2025 03:14
@github-actions github-actions bot added category: tokenizers Tokenizer class or submodule update category: tests dependencies labels Sep 27, 2025
@xipingyan xipingyan force-pushed the xp/enable_qwen_vl_video_preprocess branch from dfa276d to 6f5189b Compare September 28, 2025 00:55
	get_inputs_embeds(const std::string& prompt, const std::vector<ov::Tensor>& images ...
	get_inputs_embeds_with_token_type_ids(const std::string& prompt, const std::vector<ov::Tensor>& images, ...
Because:
1: They are never called in the current code.
2: To get the embeddings, we usually need to apply a chat template first. I think keeping only the interfaces below is enough.

	get_inputs_embeds(const std::string& prompt, const std::vector<EncodedImage>& images...
	get_inputs_embeds_with_token_type_ids(const std::string& prompt, const std::vector<EncodedImage>& images...


Signed-off-by: xipingya <[email protected]>
2: Enable video for get_input_embeds

Signed-off-by: xipingya <[email protected]>
xipingyan and others added 4 commits September 30, 2025 10:18
Signed-off-by: xipingya <[email protected]>
std::vector<ov::Tensor> videos
The std::vector holds multiple videos.
Each ov::Tensor has shape [N,H,W,C], where N is the number of frames in one video.
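To make the layout concrete, here is a minimal sketch that uses a plain struct in place of `ov::Tensor`. The names `Video`, `make_video`, `nhwc_index`, and `total_frames` are hypothetical and not part of the PR or of OpenVINO, and the sketch assumes contiguous row-major storage in [N,H,W,C] order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one video tensor of shape [N, H, W, C].
struct Video {
    std::size_t n, h, w, c;          // frames, height, width, channels
    std::vector<std::uint8_t> data;  // assumed contiguous NHWC storage
};

// Allocate a zero-filled video of the given shape.
Video make_video(std::size_t n, std::size_t h, std::size_t w, std::size_t c) {
    return Video{n, h, w, c, std::vector<std::uint8_t>(n * h * w * c)};
}

// Flat index of element (frame, y, x, ch) in NHWC order.
std::size_t nhwc_index(const Video& v, std::size_t frame, std::size_t y,
                       std::size_t x, std::size_t ch) {
    assert(frame < v.n && y < v.h && x < v.w && ch < v.c);
    return ((frame * v.h + y) * v.w + x) * v.c + ch;
}

// The outer vector holds several videos; each contributes its own N frames.
std::size_t total_frames(const std::vector<Video>& videos) {
    std::size_t sum = 0;
    for (const Video& v : videos) sum += v.n;
    return sum;
}
```

For a video of shape [3,2,4,3], frame 1 starts at flat index 24 (one full frame of 2*4*3 elements). Two videos with 3 and 5 frames give `total_frames` of 8, although each video is still preprocessed as its own unit.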

Signed-off-by: xipingya <[email protected]>
@xipingyan xipingyan force-pushed the xp/enable_qwen_vl_video_preprocess branch from 6320103 to 6e33dcf Compare September 30, 2025 03:17
@xipingyan xipingyan force-pushed the xp/enable_qwen_vl_video_preprocess branch from ac965a1 to eb4faea Compare September 30, 2025 03:51
Signed-off-by: xiping.yan <[email protected]>
@xipingyan xipingyan force-pushed the xp/enable_qwen_vl_video_preprocess branch from e521645 to 515c911 Compare September 30, 2025 13:36
Labels
category: continuous batching Continuous batching category: CPP API Changes in GenAI C++ public headers category: GGUF GGUF file reader category: Python API Python API for GenAI category: tests dependencies category: tokenizers Tokenizer class or submodule update category: visual language Visual language pipeline no-match-files
7 participants