CB: Hetero pipeline parallel support #2227
Conversation
Hi @Wovchena, I checked the failed test case in CI. Can you help to check whether the CI works well? Thanks!
CI is broken. I don't know which component to blame yet.
Hi @Wovchena, I re-ran the failed item in the merge queue, but it seems it cannot be merged again after failing.
Pull Request Overview
This PR extends continuous batching to support multi-GPU execution by updating device assertions and block sizing logic.
- Relax the assertion to allow single CPU, single GPU, or multiple GPUs.
- Introduce `all_gpu_device` to drive block size and context initialization.
- Replace the per-GPU flag with `all_gpu_device` checks in the cache manager (see the sketch below).
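To make the overview concrete, here is a minimal sketch of how an all-GPU check could drive block size selection. The `get_block_size` helper and the block size constants are illustrative assumptions, not the PR's actual cache-manager code; only the `all_gpu_device` and `execution_devices` names come from the diff.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative constants: the real per-device KV-cache block sizes are
// defined in the cache manager, not here.
constexpr std::size_t kCpuBlockSize = 32;
constexpr std::size_t kGpuBlockSize = 16;

// Hypothetical helper: choose a block size depending on whether every
// execution device is a GPU (e.g. {"GPU.0", "GPU.1"}), mirroring the
// all_gpu_device flag introduced by the PR.
std::size_t get_block_size(const std::vector<std::string>& execution_devices) {
    const bool all_gpu_device =
        !execution_devices.empty() &&
        std::all_of(execution_devices.begin(), execution_devices.end(),
                    [](const std::string& device) {
                        return device.find("GPU") != std::string::npos;
                    });
    return all_gpu_device ? kGpuBlockSize : kCpuBlockSize;
}
```

Note the explicit non-empty check in the sketch, which also anticipates the review comments further down.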
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/cpp/src/continuous_batching/pipeline_impl.cpp | Relax device assertion for heterogeneous pipelines and detect all-GPU deployments. |
| src/cpp/src/continuous_batching/cache_manager.hpp | Use `all_gpu_device` for block size selection and context setup, replacing the `is_gpu` logic. |
Comments suppressed due to low confidence (1)
src/cpp/src/continuous_batching/pipeline_impl.cpp:107
- [nitpick] Consider renaming `all_gpu_device` to `all_gpu_devices` to better reflect that it checks a collection of devices.
const bool all_gpu_device =
    std::all_of(execution_devices.begin(), execution_devices.end(), [&](const std::string& device) {
        return device.find("GPU") != std::string::npos;
    });
OPENVINO_ASSERT(all_gpu_device || execution_devices.size() == 1,
The assertion allows an empty `execution_devices` (since `all_of` on an empty vector is true). Add a check to ensure `execution_devices` is non-empty before accessing index 0.
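A sketch of one way to address this, assuming the surrounding code later indexes `execution_devices[0]`; the assertion message texts are illustrative assumptions, not the PR's actual wording.

```cpp
// Guard against an empty device list before relying on std::all_of or index 0.
OPENVINO_ASSERT(!execution_devices.empty(), "execution_devices must not be empty");
const bool all_gpu_device =
    std::all_of(execution_devices.begin(), execution_devices.end(),
                [](const std::string& device) {
                    return device.find("GPU") != std::string::npos;
                });
OPENVINO_ASSERT(all_gpu_device || execution_devices.size() == 1,
                "Continuous batching expects a single device or an all-GPU device set");
```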
    std::all_of(execution_devices.begin(), execution_devices.end(), [&](const std::string& device) {
        return device.find("GPU") != std::string::npos;
    });
OPENVINO_ASSERT(all_gpu_device || execution_devices.size() == 1,
As above, `all_gpu_device` will be true for an empty vector. Ensure `execution_devices` is not empty before using element 0, or combine this check into the assertion.
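Alternatively, a sketch of folding the non-empty check directly into the existing assertion, as the comment suggests; the message text is again an assumption.

```cpp
OPENVINO_ASSERT(!execution_devices.empty() &&
                    (all_gpu_device || execution_devices.size() == 1),
                "execution_devices must contain a single device or only GPU devices");
```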
Depends on: openvinotoolkit/openvino#30371
Tickets: CVS-164805