
Fix inadvertently silenced PP tests for mp, add DeepSeek V2/V3 model family to PP tests #20831


Merged

merged 5 commits into vllm-project:main from test-deepseek-v2-pp on Jul 16, 2025

Conversation

eicherseiji
Contributor

@eicherseiji eicherseiji commented Jul 11, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

Test Plan

Add a parameter to the existing test: tests/distributed/test_pipeline_parallel.py.
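Roughly, the change appends the new model id to the list of PP test models (a sketch below; the list name is a placeholder and the pytest selection expression is an assumption — the actual hunk is quoted in the review thread):

```python
# Sketch of the kind of entry added in tests/distributed/test_pipeline_parallel.py.
# The list name here is a placeholder; the real hunk is shown later in this thread.
PP_TEST_MODELS = [
    "meta-llama/Llama-3.2-1B-Instruct",
    "ArthurZ/Ilama-3.2-1B",
    "ibm/PowerLM-3b",
    "deepseek-ai/DeepSeek-V2-Lite-Chat",  # new: DeepSeek V2/V3 family coverage
]

# Run locally with something like (selection expression is an assumption):
#   pytest -v -s tests/distributed/test_pipeline_parallel.py -k DeepSeek
```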

Test Result

N/A, test passes

(Optional) Documentation Update

@eicherseiji eicherseiji requested a review from youkaichao as a code owner July 11, 2025 16:59

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to catch errors quickly. You can run other CI tests on top of it by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀


@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @eicherseiji, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing the test suite's coverage by integrating the DeepSeek V2/V3 model family into the existing pipeline parallel tests. This ensures that the system's compatibility and performance with these new models are validated, contributing to a more robust and comprehensive testing framework.

Highlights

  • Test Coverage Expansion: I've expanded the pipeline parallel test coverage to include the DeepSeek V2/V3 model family. This was achieved by adding deepseek-ai/DeepSeek-V2-Lite-Chat to the list of models used in the existing test_pipeline_parallel.py suite.

@mergify mergify bot added the deepseek Related to DeepSeek models label Jul 11, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds deepseek-ai/DeepSeek-V2-Lite-Chat to the pipeline parallelism tests, increasing test coverage for the DeepSeek model family. My review focuses on ensuring the changes align with the stated goals of the PR. I've pointed out that while the PR aims to add coverage for both DeepSeek V2 and V3, only a V2 model has been included. I've recommended adding a V3 model to fully address the PR's objective.

@eicherseiji eicherseiji marked this pull request as draft July 11, 2025 17:00
@@ -248,6 +248,7 @@ def iter_params(self, model_id: str):
"meta-llama/Llama-3.2-1B-Instruct",
"ArthurZ/Ilama-3.2-1B",
"ibm/PowerLM-3b",
"deepseek-ai/DeepSeek-V2-Lite-Chat",
Collaborator
By default L4 is used, which cannot hold this model. You need to change gpu to a100 (that's the only other option) in test-pipeline.yaml.

@simon-mo @youkaichao considering the popularity of deepseek + PP, maybe it's worth adding this test with a100?
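For reference, a rough sketch of the kind of Buildkite change being suggested; the key names in .buildkite/test-pipeline.yaml below are assumptions from memory, not verified against the actual file:

```yaml
# Hypothetical sketch of the pipeline-parallel test step; the real step likely
# differs in label, key names, and commands.
- label: "Pipeline Parallelism Test"
  gpu: a100            # default is L4, which cannot hold DeepSeek-V2-Lite-Chat
  num_gpus: 4
  commands:
    - pytest -v -s distributed/test_pipeline_parallel.py
```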

Collaborator

@ruisearch42 ruisearch42 Jul 11, 2025

btw, I recall that A100 does not support a specific fp8 format that vanilla DeepSeek V3 requires, from when I tested it (not sure about DeepSeek V2), so A100 might not be enough (Hopper works for sure)

Contributor Author

Need to check whether the exact configuration in this test_pipeline_parallel.py will work, but a sanity check on Anyscale was successful for this model:
[Screenshot: Anyscale sanity check, 2025-07-11 at 12:41 PM]

@eicherseiji eicherseiji marked this pull request as ready for review July 11, 2025 19:42
@ruisearch42 ruisearch42 added the ready ONLY add when PR is ready to merge/full CI is needed label Jul 14, 2025
@eicherseiji eicherseiji marked this pull request as draft July 14, 2025 20:14
@eicherseiji
Contributor Author

OOM but test passes. Following up to fix.

[2025-07-14T16:52:34Z] INFO 07-14 09:52:34 [model_runner.py:1171] Starting to load model deepseek-ai/DeepSeek-V2-Lite-Chat...
[2025-07-14T16:52:35Z] DEBUG 07-14 09:52:35 [decorators.py:110] Inferred dynamic dimensions for forward method of <class 'vllm.model_executor.models.deepseek_v2.DeepseekV2Model'>: ['input_ids', 'positions', 'intermediate_tensors', 'inputs_embeds']
[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458] CUDA out of memory. Tried to allocate 704.00 MiB. GPU 0 has a total capacity of 22.05 GiB of which 508.12 MiB is free. Including non-PyTorch memory, this process has 21.54 GiB memory in use. Of the allocated memory 21.30 GiB is allocated by PyTorch, and 28.75 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458] Traceback (most recent call last):

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 446, in run_mp_engine

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     engine = MQLLMEngine.from_vllm_config(

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 133, in from_vllm_config

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     return cls(

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]            ^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 87, in __init__

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     self.engine = LLMEngine(*args, **kwargs)

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 265, in __init__

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     self.model_executor = executor_class(vllm_config=vllm_config)

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 287, in __init__

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     super().__init__(*args, **kwargs)

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 53, in __init__

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     self._init_executor()

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/mp_distributed_executor.py", line 126, in _init_executor

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     self._run_workers("load_model",

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/mp_distributed_executor.py", line 186, in _run_workers

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     driver_worker_output = run_method(self.driver_worker, sent_method,

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 2943, in run_method

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     return func(*args, **kwargs)

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]            ^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 210, in load_model

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     self.model_runner.load_model()

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1174, in load_model

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     self.model = get_model(vllm_config=self.vllm_config)

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 59, in get_model

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     return loader.load_model(vllm_config=vllm_config,

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 38, in load_model

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     model = initialize_model(vllm_config=vllm_config,

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 64, in initialize_model

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     return model_class(vllm_config=vllm_config, prefix=prefix)

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 723, in __init__

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     self.model = DeepseekV2Model(vllm_config=vllm_config,

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 152, in __init__

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 661, in __init__

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     self.start_layer, self.end_layer, self.layers = make_layers(

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]                                                     ^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 640, in make_layers

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 663, in <lambda>

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     lambda prefix: DeepseekV2DecoderLayer(

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]                    ^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 570, in __init__

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     self.mlp = DeepseekV2MoE(

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]                ^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 148, in __init__

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     self.experts = FusedMoE(

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]                    ^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 772, in __init__

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     self.quant_method.create_weights(layer=self, **moe_quant_params)

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 263, in create_weights

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     w13_weight = torch.nn.Parameter(torch.empty(

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]                                     ^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_device.py", line 104, in __torch_function__

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     return func(*args, **kwargs)

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]            ^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458] torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 704.00 MiB. GPU 0 has a total capacity of 22.05 GiB of which 508.12 MiB is free. Including non-PyTorch memory, this process has 21.54 GiB memory in use. Of the allocated memory 21.30 GiB is allocated by PyTorch, and 28.75 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
[2025-07-14T16:52:35Z] Process SpawnProcess-1:
[2025-07-14T16:52:35Z] Traceback (most recent call last):
[2025-07-14T16:52:35Z]   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
[2025-07-14T16:52:35Z]     self.run()
[2025-07-14T16:52:35Z]   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
[2025-07-14T16:52:35Z]     self._target(*self._args, **self._kwargs)
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 460, in run_mp_engine
[2025-07-14T16:52:35Z]     raise e from None
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 446, in run_mp_engine
[2025-07-14T16:52:35Z]     engine = MQLLMEngine.from_vllm_config(
[2025-07-14T16:52:35Z]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 133, in from_vllm_config
[2025-07-14T16:52:35Z]     return cls(
[2025-07-14T16:52:35Z]            ^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 87, in __init__
[2025-07-14T16:52:35Z]     self.engine = LLMEngine(*args, **kwargs)
[2025-07-14T16:52:35Z]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 265, in __init__
[2025-07-14T16:52:35Z]     self.model_executor = executor_class(vllm_config=vllm_config)
[2025-07-14T16:52:35Z]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 287, in __init__
[2025-07-14T16:52:35Z]     super().__init__(*args, **kwargs)
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 53, in __init__
[2025-07-14T16:52:35Z]     self._init_executor()
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/mp_distributed_executor.py", line 126, in _init_executor
[2025-07-14T16:52:35Z]     self._run_workers("load_model",
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/mp_distributed_executor.py", line 186, in _run_workers
[2025-07-14T16:52:35Z]     driver_worker_output = run_method(self.driver_worker, sent_method,
[2025-07-14T16:52:35Z]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 2943, in run_method
[2025-07-14T16:52:35Z]     return func(*args, **kwargs)
[2025-07-14T16:52:35Z]            ^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 210, in load_model
[2025-07-14T16:52:35Z]     self.model_runner.load_model()
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1174, in load_model
[2025-07-14T16:52:35Z]     self.model = get_model(vllm_config=self.vllm_config)
[2025-07-14T16:52:35Z]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 59, in get_model
[2025-07-14T16:52:35Z]     return loader.load_model(vllm_config=vllm_config,
[2025-07-14T16:52:35Z]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 38, in load_model
[2025-07-14T16:52:35Z]     model = initialize_model(vllm_config=vllm_config,
[2025-07-14T16:52:35Z]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 64, in initialize_model
[2025-07-14T16:52:35Z]     return model_class(vllm_config=vllm_config, prefix=prefix)
[2025-07-14T16:52:35Z]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 723, in __init__
[2025-07-14T16:52:35Z]     self.model = DeepseekV2Model(vllm_config=vllm_config,
[2025-07-14T16:52:35Z]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 152, in __init__
[2025-07-14T16:52:35Z]     old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 661, in __init__
[2025-07-14T16:52:35Z]     self.start_layer, self.end_layer, self.layers = make_layers(
[2025-07-14T16:52:35Z]                                                     ^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 640, in make_layers
[2025-07-14T16:52:35Z]     maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
[2025-07-14T16:52:35Z]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 663, in <lambda>
[2025-07-14T16:52:35Z]     lambda prefix: DeepseekV2DecoderLayer(
[2025-07-14T16:52:35Z]                    ^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 570, in __init__
[2025-07-14T16:52:35Z]     self.mlp = DeepseekV2MoE(
[2025-07-14T16:52:35Z]                ^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 148, in __init__
[2025-07-14T16:52:35Z]     self.experts = FusedMoE(
[2025-07-14T16:52:35Z]                    ^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 772, in __init__
[2025-07-14T16:52:35Z]     self.quant_method.create_weights(layer=self, **moe_quant_params)
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 263, in create_weights
[2025-07-14T16:52:35Z]     w13_weight = torch.nn.Parameter(torch.empty(
[2025-07-14T16:52:35Z]                                     ^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_device.py", line 104, in __torch_function__
[2025-07-14T16:52:35Z]     return func(*args, **kwargs)
[2025-07-14T16:52:35Z]            ^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z] torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 704.00 MiB. GPU 0 has a total capacity of 22.05 GiB of which 508.12 MiB is free. Including non-PyTorch memory, this process has 21.54 GiB memory in use. Of the allocated memory 21.30 GiB is allocated by PyTorch, and 28.75 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
[2025-07-14T16:52:36Z] [rank0]:[W714 09:52:36.281349739 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[2025-07-14T16:52:37Z] DEBUG 07-14 09:52:37 [client.py:261] Shutting down MQLLMEngineClient output handler.
[2025-07-14T16:52:37Z] Traceback (most recent call last):
[2025-07-14T16:52:37Z]   File "/usr/local/bin/vllm", line 10, in <module>
[2025-07-14T16:52:37Z]     sys.exit(main())
[2025-07-14T16:52:37Z]              ^^^^^^
[2025-07-14T16:52:37Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 65, in main
[2025-07-14T16:52:37Z]     args.dispatch_function(args)
[2025-07-14T16:52:37Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 57, in cmd
[2025-07-14T16:52:37Z]     uvloop.run(run_server(args))
[2025-07-14T16:52:37Z]   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
[2025-07-14T16:52:37Z]     return __asyncio.run(
[2025-07-14T16:52:37Z]            ^^^^^^^^^^^^^^
[2025-07-14T16:52:37Z]   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
[2025-07-14T16:52:37Z]     return runner.run(main)
[2025-07-14T16:52:37Z]            ^^^^^^^^^^^^^^^^
[2025-07-14T16:52:37Z]   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
[2025-07-14T16:52:37Z]     return self._loop.run_until_complete(task)
[2025-07-14T16:52:37Z]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:37Z]   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
[2025-07-14T16:52:37Z]   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
[2025-07-14T16:52:37Z]     return await main
[2025-07-14T16:52:37Z]            ^^^^^^^^^^
[2025-07-14T16:52:37Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1675, in run_server
[2025-07-14T16:52:37Z]     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
[2025-07-14T16:52:37Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1695, in run_server_worker
[2025-07-14T16:52:37Z]     async with build_async_engine_client(args, client_config) as engine_client:
[2025-07-14T16:52:37Z]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:37Z]   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
[2025-07-14T16:52:37Z]     return await anext(self.gen)
[2025-07-14T16:52:37Z]            ^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:37Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 158, in build_async_engine_client
[2025-07-14T16:52:37Z]     async with build_async_engine_client_from_engine_args(
[2025-07-14T16:52:37Z]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:37Z]   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
[2025-07-14T16:52:37Z]     return await anext(self.gen)
[2025-07-14T16:52:37Z]            ^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:37Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 291, in build_async_engine_client_from_engine_args
[2025-07-14T16:52:37Z]     raise RuntimeError(
[2025-07-14T16:52:37Z] RuntimeError: Engine process failed to start. See stack trace for the root cause.
[2025-07-14T16:52:39Z] PASSED

@eicherseiji
Contributor Author

eicherseiji commented Jul 14, 2025

The PP run was successful, but the baseline (which defaults to TP=1, PP=1) failed with OOM (of course). It seems #14219 inadvertently silenced all test_pipeline_parallel.py test failures with the mp backend. Should be fixed now. Waiting for tests to run.
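To illustrate the failure mode (a minimal sketch, not the actual tests/utils.py logic): if a test helper launches the server in a subprocess and only logs a non-zero exit instead of re-raising it, pytest still reports PASSED — which matches the log above, where the engine OOMs yet the run ends with PASSED.

```python
import subprocess
import sys

def run_server_and_compare(cmd: list[str]) -> None:
    # Hypothetical helper showing how a failure gets silenced: the server
    # process exits non-zero (e.g. CUDA OOM), but the caller never checks.
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        # Silenced: logging instead of raising means the test still "passes".
        print(proc.stderr, file=sys.stderr)
        return
    # ... query both server configurations and compare outputs ...

def run_server_and_compare_fixed(cmd: list[str]) -> None:
    # Fixed variant: propagate the failure so pytest marks the test as failed.
    subprocess.run(cmd, capture_output=True, text=True, check=True)
```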

@eicherseiji eicherseiji marked this pull request as ready for review July 14, 2025 22:45
@eicherseiji
Contributor Author

Additional test failure surfaced after test bugfix: https://buildkite.com/vllm/ci/builds/23938#01980b35-a0f5-41c8-b364-731727cca107

[2025-07-15T01:08:37Z] Traceback (most recent call last):
[2025-07-15T01:08:37Z]   File "/vllm-workspace/tests/utils.py", line 741, in wrapper
[2025-07-15T01:08:37Z]     f(*args, **kwargs)
[2025-07-15T01:08:37Z]   File "/vllm-workspace/tests/distributed/test_pipeline_parallel.py", line 455, in test_tp_language_embedding
[2025-07-15T01:08:37Z]     _compare_tp(model_id,
[2025-07-15T01:08:37Z]   File "/vllm-workspace/tests/distributed/test_pipeline_parallel.py", line 393, in _compare_tp
[2025-07-15T01:08:37Z]     compare_two_settings(model_id,
[2025-07-15T01:08:37Z]   File "/vllm-workspace/tests/utils.py", line 467, in compare_two_settings
[2025-07-15T01:08:37Z]     compare_all_settings(
[2025-07-15T01:08:37Z]   File "/vllm-workspace/tests/utils.py", line 531, in compare_all_settings
[2025-07-15T01:08:37Z]     with RemoteOpenAIServer(model,
[2025-07-15T01:08:37Z]          ^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T01:08:37Z]   File "/vllm-workspace/tests/utils.py", line 116, in __init__
[2025-07-15T01:08:37Z]     model_config = engine_args.create_model_config()
[2025-07-15T01:08:37Z]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T01:08:37Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 976, in create_model_config
[2025-07-15T01:08:37Z]     return ModelConfig(
[2025-07-15T01:08:37Z]            ^^^^^^^^^^^^
[2025-07-15T01:08:37Z]   File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 120, in __init__
[2025-07-15T01:08:37Z]     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
[2025-07-15T01:08:37Z] pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
[2025-07-15T01:08:37Z]   Value error, The model type 'gemma2' does not support float16. Reason: Numerical instability. Please use bfloat16 or float32 instead. [type=value_error, input_value=ArgsKwargs((), {'model': ...attention_dtype': None}), input_type=ArgsKwargs]
[2025-07-15T01:08:37Z]     For further information visit https://errors.pydantic.dev/2.11/v/value_error
[2025-07-15T01:08:38Z] FAILED
[...]
[2025-07-15T01:11:42Z] =========================== short test summary info ============================
[2025-07-15T01:11:42Z] FAILED distributed/test_pipeline_parallel.py::test_tp_language_embedding[BAAI/bge-multilingual-gemma2-parallel_setup1-mp-0-auto-test_options1] - AssertionError: function <function test_tp_language_embedding at 0x7fc76b74ed40> failed when called with args () and kwargs {'model_id': 'BAAI/bge-multilingual-gemma2', 'parallel_setup': ParallelSetup(tp_size=1, pp_size=2, eager_mode=True, chunked_prefill=False), 'distributed_backend': 'mp', 'vllm_major_version': '0', 'task': 'auto', 'test_options': PPTestOptions(multi_node_only=False, load_format=None), 'num_gpus_available': 4}

@eicherseiji
Contributor Author

TPU test failures due to:

>   from vllm.model_executor.layers.fused_moe import fused_experts
E   ImportError: cannot import name 'fused_experts' from 'vllm.model_executor.layers.fused_moe' (/workspace/vllm/vllm/model_executor/layers/fused_moe/__init__.py)

Distributed test failure due to:

[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586] EngineCore failed to start.
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586] Traceback (most recent call last):
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 575, in run_engine_core
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     engine_core = DPEngineCoreProc(*args, **kwargs)
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 835, in __init__
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     super().__init__(vllm_config, local_client, handshake_address,
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 404, in __init__
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     super().__init__(vllm_config, executor_class, log_stats,
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 75, in __init__
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     self.model_executor = executor_class(vllm_config)
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 53, in __init__
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     self._init_executor()
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     self.collective_rpc("init_device")
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     answer = run_method(self.driver_worker, method, args, kwargs)
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 2943, in run_method
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     return func(*args, **kwargs)
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]            ^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 606, in init_device
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     self.worker.init_device()  # type: ignore
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     ^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 164, in init_device
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     init_worker_distributed_environment(self.vllm_config, self.rank,
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 413, in init_worker_distributed_environment
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     init_distributed_environment(parallel_config.world_size, rank,
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/parallel_state.py", line 952, in init_distributed_environment
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     torch.distributed.init_process_group(
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/torch/distributed/c10d_logger.py", line 81, in wrapper
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     return func(*args, **kwargs)
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]            ^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/torch/distributed/c10d_logger.py", line 95, in wrapper
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     func_return = func(*args, **kwargs)
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]                   ^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/torch/distributed/distributed_c10d.py", line 1710, in init_process_group
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     store, rank, world_size = next(rendezvous_iterator)
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]                               ^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/torch/distributed/rendezvous.py", line 230, in _tcp_rendezvous_handler
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     store = _create_c10d_store(
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]             ^^^^^^^^^^^^^^^^^^^
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/torch/distributed/rendezvous.py", line 198, in _create_c10d_store
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     return TCPStore(
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]            ^^^^^^^^^
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586] torch.distributed.DistNetworkError: The server socket has failed to listen on any local network address. port: 46620, useIpv6: false, code: -98, name: EADDRINUSE, message: address already in use

These don't seem related to the test addition/fix.
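For context, EADDRINUSE is the usual symptom of two processes racing for the same rendezvous port. A common mitigation — shown here only as a sketch, not necessarily what vLLM's tests do — is to let the OS hand out a free port:

```python
import socket

def get_free_port() -> int:
    # Bind to port 0 so the OS assigns an unused port, then release it.
    # Note there is still a small race window between closing this socket
    # and the caller binding the returned port again.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```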

@eicherseiji eicherseiji force-pushed the test-deepseek-v2-pp branch from 80600bb to 480beba Compare July 15, 2025 17:14
@eicherseiji
Contributor Author

Looks like #20771 changed the resolution of the auto model runner type:

# vllm/vllm/config.py:928
        suffix_to_preferred_runner: list[tuple[str, RunnerType]] = [
            ("ForCausalLM", "generate"),
            ("ForConditionalGeneration", "generate"),
            ("ChatModel", "generate"),
            ("LMHeadModel", "generate"),
            ("ForSequenceClassification", "pooling"),
            ("EmbeddingModel", "pooling"),
            ("RewardModel", "pooling"),
        ]

Since intfloat/e5-mistral-7b-instruct has the LlamaForCausalLM architecture, embedding requests will fail with 'The model does not support Embeddings API' if the task is not specified explicitly.
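A minimal sketch of how the suffix-based resolution quoted above behaves (simplified; the helper name and the fallback are assumptions, not the actual vllm/config.py code):

```python
RunnerType = str

SUFFIX_TO_PREFERRED_RUNNER: list[tuple[str, RunnerType]] = [
    ("ForCausalLM", "generate"),
    ("ForConditionalGeneration", "generate"),
    ("ChatModel", "generate"),
    ("LMHeadModel", "generate"),
    ("ForSequenceClassification", "pooling"),
    ("EmbeddingModel", "pooling"),
    ("RewardModel", "pooling"),
]

def resolve_runner(architecture: str) -> RunnerType:
    # First matching suffix wins, so "LlamaForCausalLM" resolves to "generate"
    # even when the checkpoint is intended to be served as an embedding model.
    for suffix, runner in SUFFIX_TO_PREFERRED_RUNNER:
        if architecture.endswith(suffix):
            return runner
    return "generate"  # fallback is a guess, for illustration only

# resolve_runner("LlamaForCausalLM") -> "generate", hence task="embed" is needed.
```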

"intfloat/e5-mistral-7b-instruct": PPTestSettings.fast(),
"BAAI/bge-multilingual-gemma2": PPTestSettings.fast(),
"Qwen/Qwen2.5-Math-RM-72B": PPTestSettings.fast(load_format="dummy"),
"intfloat/e5-mistral-7b-instruct": PPTestSettings.fast(task="embed"),
Contributor Author

Needed due to #20771


dtype = "float16"
if hf_config.model_type in _FLOAT16_NOT_SUPPORTED_MODELS:
dtype = "bfloat16"
Contributor Author

Needed since BAAI/bge-multilingual-gemma2 doesn't support float16

@eicherseiji
Contributor Author

eicherseiji commented Jul 15, 2025

The TPU v1 test is also failing on main.

@ruisearch42 can you take a look when you get a chance? A few additional test fixes were needed to recover since the test was disabled. Thanks for your help so far!

@eicherseiji eicherseiji changed the title Add DeepSeek V2/V3 model family to PP tests PP tests for mp were disabled, add DeepSeek V2/V3 model family to PP tests Jul 15, 2025
@eicherseiji eicherseiji changed the title PP tests for mp were disabled, add DeepSeek V2/V3 model family to PP tests Fix inadvertently silenced PP tests for mp, add DeepSeek V2/V3 model family to PP tests Jul 16, 2025
Member

@DarkLight1337 DarkLight1337 left a comment

LGTM, thanks for fixing!

@vllm-bot vllm-bot merged commit d0dc4cf into vllm-project:main Jul 16, 2025
47 of 49 checks passed
nadathurv pushed a commit to nadathurv/vllm that referenced this pull request Jul 16, 2025
hj-mistral pushed a commit to hj-mistral/vllm that referenced this pull request Jul 19, 2025
…l family to PP tests (vllm-project#20831)

Signed-off-by: Seiji Eicher <[email protected]>
Signed-off-by: Himanshu Jaju <[email protected]>
LyrisZhong pushed a commit to LyrisZhong/vllm that referenced this pull request Jul 23, 2025
avigny pushed a commit to avigny/vllm that referenced this pull request Jul 31, 2025
x22x22 pushed a commit to x22x22/vllm that referenced this pull request Aug 5, 2025
Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Aug 6, 2025
npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
Labels
deepseek Related to DeepSeek models ready ONLY add when PR is ready to merge/full CI is needed
4 participants