
Fix inadvertently silenced PP tests for mp, add DeepSeek V2/V3 model family to PP tests #20831


Merged

merged 5 commits into vllm-project:main from test-deepseek-v2-pp on Jul 16, 2025

Conversation

eicherseiji
Contributor

@eicherseiji eicherseiji commented Jul 11, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

Test Plan

Add a parameter to the existing test: tests/distributed/test_pipeline_parallel.py.
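Roughly, the change appends the new model id to the list of PP test models (a sketch below; the list name is a placeholder and the pytest selection expression is an assumption — the actual hunk is quoted in the review thread):

```python
# Sketch of the kind of entry added in tests/distributed/test_pipeline_parallel.py.
# The list name here is a placeholder; the real hunk is shown later in this thread.
PP_TEST_MODELS = [
    "meta-llama/Llama-3.2-1B-Instruct",
    "ArthurZ/Ilama-3.2-1B",
    "ibm/PowerLM-3b",
    "deepseek-ai/DeepSeek-V2-Lite-Chat",  # new: DeepSeek V2/V3 family coverage
]

# Run locally with something like (selection expression is an assumption):
#   pytest -v -s tests/distributed/test_pipeline_parallel.py -k DeepSeek
```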

Test Result

N/A, test passes

(Optional) Documentation Update

@eicherseiji eicherseiji requested a review from youkaichao as a code owner July 11, 2025 16:59

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to catch errors quickly. You can run other CI tests on top of it by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀


@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @eicherseiji, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing the test suite's coverage by integrating the DeepSeek V2/V3 model family into the existing pipeline parallel tests. This ensures that the system's compatibility and performance with these new models are validated, contributing to a more robust and comprehensive testing framework.

Highlights

  • Test Coverage Expansion: I've expanded the pipeline parallel test coverage to include the DeepSeek V2/V3 model family. This was achieved by adding deepseek-ai/DeepSeek-V2-Lite-Chat to the list of models used in the existing test_pipeline_parallel.py suite.

@mergify mergify bot added the deepseek Related to DeepSeek models label Jul 11, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds deepseek-ai/DeepSeek-V2-Lite-Chat to the pipeline parallelism tests, increasing test coverage for the DeepSeek model family. My review focuses on ensuring the changes align with the stated goals of the PR. I've pointed out that while the PR aims to add coverage for both DeepSeek V2 and V3, only a V2 model has been included. I've recommended adding a V3 model to fully address the PR's objective.

@eicherseiji eicherseiji marked this pull request as draft July 11, 2025 17:00
@@ -248,6 +248,7 @@ def iter_params(self, model_id: str):
"meta-llama/Llama-3.2-1B-Instruct",
"ArthurZ/Ilama-3.2-1B",
"ibm/PowerLM-3b",
"deepseek-ai/DeepSeek-V2-Lite-Chat",
Collaborator
By default L4 is used, which cannot hold this model. You need to change gpu to a100 (that's the only other option) in test-pipeline.yaml.

@simon-mo @youkaichao considering the popularity of deepseek + PP, maybe it's worth adding this test with a100?
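For reference, a rough sketch of the kind of Buildkite change being suggested; the key names in .buildkite/test-pipeline.yaml below are assumptions from memory, not verified against the actual file:

```yaml
# Hypothetical sketch of the pipeline-parallel test step; the real step likely
# differs in label, key names, and commands.
- label: "Pipeline Parallelism Test"
  gpu: a100            # default is L4, which cannot hold DeepSeek-V2-Lite-Chat
  num_gpus: 4
  commands:
    - pytest -v -s distributed/test_pipeline_parallel.py
```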

Collaborator

@ruisearch42 ruisearch42 Jul 11, 2025

btw, I recall that A100 does not support a specific fp8 format that vanilla DeepSeek V3 requires, from when I tested it (not sure about DeepSeek V2), so A100 might not be enough (Hopper works for sure)

Contributor Author

Need to check whether the exact configuration in this test_pipeline_parallel.py will work, but a sanity check on Anyscale was successful for this model:
[Screenshot: Anyscale sanity check, 2025-07-11 at 12:41 PM]

@eicherseiji eicherseiji marked this pull request as ready for review July 11, 2025 19:42
@ruisearch42 ruisearch42 added the ready ONLY add when PR is ready to merge/full CI is needed label Jul 14, 2025
@eicherseiji eicherseiji marked this pull request as draft July 14, 2025 20:14
@eicherseiji
Contributor Author

OOM but test passes. Following up to fix.

[2025-07-14T16:52:34Z] INFO 07-14 09:52:34 [model_runner.py:1171] Starting to load model deepseek-ai/DeepSeek-V2-Lite-Chat...
[2025-07-14T16:52:35Z] DEBUG 07-14 09:52:35 [decorators.py:110] Inferred dynamic dimensions for forward method of <class 'vllm.model_executor.models.deepseek_v2.DeepseekV2Model'>: ['input_ids', 'positions', 'intermediate_tensors', 'inputs_embeds']
[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458] CUDA out of memory. Tried to allocate 704.00 MiB. GPU 0 has a total capacity of 22.05 GiB of which 508.12 MiB is free. Including non-PyTorch memory, this process has 21.54 GiB memory in use. Of the allocated memory 21.30 GiB is allocated by PyTorch, and 28.75 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458] Traceback (most recent call last):

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 446, in run_mp_engine

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     engine = MQLLMEngine.from_vllm_config(

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 133, in from_vllm_config

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     return cls(

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]            ^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 87, in __init__

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     self.engine = LLMEngine(*args, **kwargs)

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 265, in __init__

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     self.model_executor = executor_class(vllm_config=vllm_config)

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 287, in __init__

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     super().__init__(*args, **kwargs)

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 53, in __init__

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     self._init_executor()

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/mp_distributed_executor.py", line 126, in _init_executor

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     self._run_workers("load_model",

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/mp_distributed_executor.py", line 186, in _run_workers

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     driver_worker_output = run_method(self.driver_worker, sent_method,

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 2943, in run_method

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     return func(*args, **kwargs)

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]            ^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 210, in load_model

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     self.model_runner.load_model()

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1174, in load_model

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     self.model = get_model(vllm_config=self.vllm_config)

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 59, in get_model

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     return loader.load_model(vllm_config=vllm_config,

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 38, in load_model

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     model = initialize_model(vllm_config=vllm_config,

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 64, in initialize_model

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     return model_class(vllm_config=vllm_config, prefix=prefix)

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 723, in __init__

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     self.model = DeepseekV2Model(vllm_config=vllm_config,

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 152, in __init__

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 661, in __init__

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     self.start_layer, self.end_layer, self.layers = make_layers(

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]                                                     ^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 640, in make_layers

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 663, in <lambda>

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     lambda prefix: DeepseekV2DecoderLayer(

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]                    ^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 570, in __init__

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     self.mlp = DeepseekV2MoE(

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]                ^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 148, in __init__

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     self.experts = FusedMoE(

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]                    ^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 772, in __init__

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     self.quant_method.create_weights(layer=self, **moe_quant_params)

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 263, in create_weights

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     w13_weight = torch.nn.Parameter(torch.empty(

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]                                     ^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_device.py", line 104, in __torch_function__

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]     return func(*args, **kwargs)

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458]            ^^^^^^^^^^^^^^^^^^^^^

[2025-07-14T16:52:35Z] ERROR 07-14 09:52:35 [engine.py:458] torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 704.00 MiB. GPU 0 has a total capacity of 22.05 GiB of which 508.12 MiB is free. Including non-PyTorch memory, this process has 21.54 GiB memory in use. Of the allocated memory 21.30 GiB is allocated by PyTorch, and 28.75 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
[2025-07-14T16:52:35Z] Process SpawnProcess-1:
[2025-07-14T16:52:35Z] Traceback (most recent call last):
[2025-07-14T16:52:35Z]   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
[2025-07-14T16:52:35Z]     self.run()
[2025-07-14T16:52:35Z]   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
[2025-07-14T16:52:35Z]     self._target(*self._args, **self._kwargs)
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 460, in run_mp_engine
[2025-07-14T16:52:35Z]     raise e from None
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 446, in run_mp_engine
[2025-07-14T16:52:35Z]     engine = MQLLMEngine.from_vllm_config(
[2025-07-14T16:52:35Z]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 133, in from_vllm_config
[2025-07-14T16:52:35Z]     return cls(
[2025-07-14T16:52:35Z]            ^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 87, in __init__
[2025-07-14T16:52:35Z]     self.engine = LLMEngine(*args, **kwargs)
[2025-07-14T16:52:35Z]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 265, in __init__
[2025-07-14T16:52:35Z]     self.model_executor = executor_class(vllm_config=vllm_config)
[2025-07-14T16:52:35Z]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 287, in __init__
[2025-07-14T16:52:35Z]     super().__init__(*args, **kwargs)
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 53, in __init__
[2025-07-14T16:52:35Z]     self._init_executor()
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/mp_distributed_executor.py", line 126, in _init_executor
[2025-07-14T16:52:35Z]     self._run_workers("load_model",
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/mp_distributed_executor.py", line 186, in _run_workers
[2025-07-14T16:52:35Z]     driver_worker_output = run_method(self.driver_worker, sent_method,
[2025-07-14T16:52:35Z]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 2943, in run_method
[2025-07-14T16:52:35Z]     return func(*args, **kwargs)
[2025-07-14T16:52:35Z]            ^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 210, in load_model
[2025-07-14T16:52:35Z]     self.model_runner.load_model()
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1174, in load_model
[2025-07-14T16:52:35Z]     self.model = get_model(vllm_config=self.vllm_config)
[2025-07-14T16:52:35Z]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 59, in get_model
[2025-07-14T16:52:35Z]     return loader.load_model(vllm_config=vllm_config,
[2025-07-14T16:52:35Z]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 38, in load_model
[2025-07-14T16:52:35Z]     model = initialize_model(vllm_config=vllm_config,
[2025-07-14T16:52:35Z]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 64, in initialize_model
[2025-07-14T16:52:35Z]     return model_class(vllm_config=vllm_config, prefix=prefix)
[2025-07-14T16:52:35Z]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 723, in __init__
[2025-07-14T16:52:35Z]     self.model = DeepseekV2Model(vllm_config=vllm_config,
[2025-07-14T16:52:35Z]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 152, in __init__
[2025-07-14T16:52:35Z]     old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 661, in __init__
[2025-07-14T16:52:35Z]     self.start_layer, self.end_layer, self.layers = make_layers(
[2025-07-14T16:52:35Z]                                                     ^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 640, in make_layers
[2025-07-14T16:52:35Z]     maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
[2025-07-14T16:52:35Z]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 663, in <lambda>
[2025-07-14T16:52:35Z]     lambda prefix: DeepseekV2DecoderLayer(
[2025-07-14T16:52:35Z]                    ^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 570, in __init__
[2025-07-14T16:52:35Z]     self.mlp = DeepseekV2MoE(
[2025-07-14T16:52:35Z]                ^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 148, in __init__
[2025-07-14T16:52:35Z]     self.experts = FusedMoE(
[2025-07-14T16:52:35Z]                    ^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 772, in __init__
[2025-07-14T16:52:35Z]     self.quant_method.create_weights(layer=self, **moe_quant_params)
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 263, in create_weights
[2025-07-14T16:52:35Z]     w13_weight = torch.nn.Parameter(torch.empty(
[2025-07-14T16:52:35Z]                                     ^^^^^^^^^^^^
[2025-07-14T16:52:35Z]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_device.py", line 104, in __torch_function__
[2025-07-14T16:52:35Z]     return func(*args, **kwargs)
[2025-07-14T16:52:35Z]            ^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:35Z] torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 704.00 MiB. GPU 0 has a total capacity of 22.05 GiB of which 508.12 MiB is free. Including non-PyTorch memory, this process has 21.54 GiB memory in use. Of the allocated memory 21.30 GiB is allocated by PyTorch, and 28.75 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
[2025-07-14T16:52:36Z] [rank0]:[W714 09:52:36.281349739 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[2025-07-14T16:52:37Z] DEBUG 07-14 09:52:37 [client.py:261] Shutting down MQLLMEngineClient output handler.
[2025-07-14T16:52:37Z] Traceback (most recent call last):
[2025-07-14T16:52:37Z]   File "/usr/local/bin/vllm", line 10, in <module>
[2025-07-14T16:52:37Z]     sys.exit(main())
[2025-07-14T16:52:37Z]              ^^^^^^
[2025-07-14T16:52:37Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 65, in main
[2025-07-14T16:52:37Z]     args.dispatch_function(args)
[2025-07-14T16:52:37Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 57, in cmd
[2025-07-14T16:52:37Z]     uvloop.run(run_server(args))
[2025-07-14T16:52:37Z]   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
[2025-07-14T16:52:37Z]     return __asyncio.run(
[2025-07-14T16:52:37Z]            ^^^^^^^^^^^^^^
[2025-07-14T16:52:37Z]   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
[2025-07-14T16:52:37Z]     return runner.run(main)
[2025-07-14T16:52:37Z]            ^^^^^^^^^^^^^^^^
[2025-07-14T16:52:37Z]   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
[2025-07-14T16:52:37Z]     return self._loop.run_until_complete(task)
[2025-07-14T16:52:37Z]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:37Z]   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
[2025-07-14T16:52:37Z]   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
[2025-07-14T16:52:37Z]     return await main
[2025-07-14T16:52:37Z]            ^^^^^^^^^^
[2025-07-14T16:52:37Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1675, in run_server
[2025-07-14T16:52:37Z]     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
[2025-07-14T16:52:37Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1695, in run_server_worker
[2025-07-14T16:52:37Z]     async with build_async_engine_client(args, client_config) as engine_client:
[2025-07-14T16:52:37Z]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:37Z]   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
[2025-07-14T16:52:37Z]     return await anext(self.gen)
[2025-07-14T16:52:37Z]            ^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:37Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 158, in build_async_engine_client
[2025-07-14T16:52:37Z]     async with build_async_engine_client_from_engine_args(
[2025-07-14T16:52:37Z]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:37Z]   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
[2025-07-14T16:52:37Z]     return await anext(self.gen)
[2025-07-14T16:52:37Z]            ^^^^^^^^^^^^^^^^^^^^^
[2025-07-14T16:52:37Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 291, in build_async_engine_client_from_engine_args
[2025-07-14T16:52:37Z]     raise RuntimeError(
[2025-07-14T16:52:37Z] RuntimeError: Engine process failed to start. See stack trace for the root cause.
[2025-07-14T16:52:39Z] PASSED

@eicherseiji
Contributor Author

eicherseiji commented Jul 14, 2025

The PP run was successful, but the baseline (which defaults to TP=1, PP=1) failed with OOM (of course). It seems #14219 inadvertently silenced all test_pipeline_parallel.py test failures with the mp backend. Should be fixed now. Waiting for tests to run.
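To illustrate the failure mode (a minimal sketch, not the actual tests/utils.py logic): if a test helper launches the server in a subprocess and only logs a non-zero exit instead of re-raising it, pytest still reports PASSED — which matches the log above, where the engine OOMs yet the run ends with PASSED.

```python
import subprocess
import sys

def run_server_and_compare(cmd: list[str]) -> None:
    # Hypothetical helper showing how a failure gets silenced: the server
    # process exits non-zero (e.g. CUDA OOM), but the caller never checks.
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        # Silenced: logging instead of raising means the test still "passes".
        print(proc.stderr, file=sys.stderr)
        return
    # ... query both server configurations and compare outputs ...

def run_server_and_compare_fixed(cmd: list[str]) -> None:
    # Fixed variant: propagate the failure so pytest marks the test as failed.
    subprocess.run(cmd, capture_output=True, text=True, check=True)
```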

@eicherseiji eicherseiji marked this pull request as ready for review July 14, 2025 22:45
@eicherseiji
Contributor Author

Additional test failure surfaced after test bugfix: https://buildkite.com/vllm/ci/builds/23938#01980b35-a0f5-41c8-b364-731727cca107

[2025-07-15T01:08:37Z] Traceback (most recent call last):
[2025-07-15T01:08:37Z]   File "/vllm-workspace/tests/utils.py", line 741, in wrapper
[2025-07-15T01:08:37Z]     f(*args, **kwargs)
[2025-07-15T01:08:37Z]   File "/vllm-workspace/tests/distributed/test_pipeline_parallel.py", line 455, in test_tp_language_embedding
[2025-07-15T01:08:37Z]     _compare_tp(model_id,
[2025-07-15T01:08:37Z]   File "/vllm-workspace/tests/distributed/test_pipeline_parallel.py", line 393, in _compare_tp
[2025-07-15T01:08:37Z]     compare_two_settings(model_id,
[2025-07-15T01:08:37Z]   File "/vllm-workspace/tests/utils.py", line 467, in compare_two_settings
[2025-07-15T01:08:37Z]     compare_all_settings(
[2025-07-15T01:08:37Z]   File "/vllm-workspace/tests/utils.py", line 531, in compare_all_settings
[2025-07-15T01:08:37Z]     with RemoteOpenAIServer(model,
[2025-07-15T01:08:37Z]          ^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T01:08:37Z]   File "/vllm-workspace/tests/utils.py", line 116, in __init__
[2025-07-15T01:08:37Z]     model_config = engine_args.create_model_config()
[2025-07-15T01:08:37Z]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T01:08:37Z]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 976, in create_model_config
[2025-07-15T01:08:37Z]     return ModelConfig(
[2025-07-15T01:08:37Z]            ^^^^^^^^^^^^
[2025-07-15T01:08:37Z]   File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 120, in __init__
[2025-07-15T01:08:37Z]     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
[2025-07-15T01:08:37Z] pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
[2025-07-15T01:08:37Z]   Value error, The model type 'gemma2' does not support float16. Reason: Numerical instability. Please use bfloat16 or float32 instead. [type=value_error, input_value=ArgsKwargs((), {'model': ...attention_dtype': None}), input_type=ArgsKwargs]
[2025-07-15T01:08:37Z]     For further information visit https://errors.pydantic.dev/2.11/v/value_error
[2025-07-15T01:08:38Z] FAILED
[...]
[2025-07-15T01:11:42Z] =========================== short test summary info ============================
[2025-07-15T01:11:42Z] FAILED distributed/test_pipeline_parallel.py::test_tp_language_embedding[BAAI/bge-multilingual-gemma2-parallel_setup1-mp-0-auto-test_options1] - AssertionError: function <function test_tp_language_embedding at 0x7fc76b74ed40> failed when called with args () and kwargs {'model_id': 'BAAI/bge-multilingual-gemma2', 'parallel_setup': ParallelSetup(tp_size=1, pp_size=2, eager_mode=True, chunked_prefill=False), 'distributed_backend': 'mp', 'vllm_major_version': '0', 'task': 'auto', 'test_options': PPTestOptions(multi_node_only=False, load_format=None), 'num_gpus_available': 4}

@eicherseiji
Contributor Author

TPU test failures due to:

>   from vllm.model_executor.layers.fused_moe import fused_experts
E   ImportError: cannot import name 'fused_experts' from 'vllm.model_executor.layers.fused_moe' (/workspace/vllm/vllm/model_executor/layers/fused_moe/__init__.py)

Distributed test failure due to:

[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586] EngineCore failed to start.
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586] Traceback (most recent call last):
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 575, in run_engine_core
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     engine_core = DPEngineCoreProc(*args, **kwargs)
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 835, in __init__
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     super().__init__(vllm_config, local_client, handshake_address,
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 404, in __init__
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     super().__init__(vllm_config, executor_class, log_stats,
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 75, in __init__
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     self.model_executor = executor_class(vllm_config)
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 53, in __init__
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     self._init_executor()
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     self.collective_rpc("init_device")
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     answer = run_method(self.driver_worker, method, args, kwargs)
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 2943, in run_method
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     return func(*args, **kwargs)
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]            ^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 606, in init_device
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     self.worker.init_device()  # type: ignore
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     ^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 164, in init_device
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     init_worker_distributed_environment(self.vllm_config, self.rank,
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 413, in init_worker_distributed_environment
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     init_distributed_environment(parallel_config.world_size, rank,
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/parallel_state.py", line 952, in init_distributed_environment
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     torch.distributed.init_process_group(
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/torch/distributed/c10d_logger.py", line 81, in wrapper
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     return func(*args, **kwargs)
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]            ^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/torch/distributed/c10d_logger.py", line 95, in wrapper
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     func_return = func(*args, **kwargs)
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]                   ^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/torch/distributed/distributed_c10d.py", line 1710, in init_process_group
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     store, rank, world_size = next(rendezvous_iterator)
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]                               ^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/torch/distributed/rendezvous.py", line 230, in _tcp_rendezvous_handler
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     store = _create_c10d_store(
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]             ^^^^^^^^^^^^^^^^^^^
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]   File "/usr/local/lib/python3.12/dist-packages/torch/distributed/rendezvous.py", line 198, in _create_c10d_store
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]     return TCPStore(
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586]            ^^^^^^^^^
[2025-07-15T00:29:48Z] (EngineCore_0 pid=7123) ERROR 07-14 17:29:48 [core.py:586] torch.distributed.DistNetworkError: The server socket has failed to listen on any local network address. port: 46620, useIpv6: false, code: -98, name: EADDRINUSE, message: address already in use

These don't seem related to the test addition/fix.
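For context, EADDRINUSE is the usual symptom of two processes racing for the same rendezvous port. A common mitigation — shown here only as a sketch, not necessarily what vLLM's tests do — is to let the OS hand out a free port:

```python
import socket

def get_free_port() -> int:
    # Bind to port 0 so the OS assigns an unused port, then release it.
    # Note there is still a small race window between closing this socket
    # and the caller binding the returned port again.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```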

@eicherseiji eicherseiji force-pushed the test-deepseek-v2-pp branch from 80600bb to 480beba Compare July 15, 2025 17:14
@eicherseiji
Contributor Author

Looks like #20771 changed the resolution of the auto model runner type:

# vllm/vllm/config.py:928
        suffix_to_preferred_runner: list[tuple[str, RunnerType]] = [
            ("ForCausalLM", "generate"),
            ("ForConditionalGeneration", "generate"),
            ("ChatModel", "generate"),
            ("LMHeadModel", "generate"),
            ("ForSequenceClassification", "pooling"),
            ("EmbeddingModel", "pooling"),
            ("RewardModel", "pooling"),
        ]

Since intfloat/e5-mistral-7b-instruct has the LlamaForCausalLM architecture, embedding requests will fail with 'The model does not support Embeddings API' if the task is not specified explicitly.
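A minimal sketch of how the suffix-based resolution quoted above behaves (simplified; the helper name and the fallback are assumptions, not the actual vllm/config.py code):

```python
RunnerType = str

SUFFIX_TO_PREFERRED_RUNNER: list[tuple[str, RunnerType]] = [
    ("ForCausalLM", "generate"),
    ("ForConditionalGeneration", "generate"),
    ("ChatModel", "generate"),
    ("LMHeadModel", "generate"),
    ("ForSequenceClassification", "pooling"),
    ("EmbeddingModel", "pooling"),
    ("RewardModel", "pooling"),
]

def resolve_runner(architecture: str) -> RunnerType:
    # First matching suffix wins, so "LlamaForCausalLM" resolves to "generate"
    # even when the checkpoint is intended to be served as an embedding model.
    for suffix, runner in SUFFIX_TO_PREFERRED_RUNNER:
        if architecture.endswith(suffix):
            return runner
    return "generate"  # fallback is a guess, for illustration only

# resolve_runner("LlamaForCausalLM") -> "generate", hence task="embed" is needed.
```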

"intfloat/e5-mistral-7b-instruct": PPTestSettings.fast(),
"BAAI/bge-multilingual-gemma2": PPTestSettings.fast(),
"Qwen/Qwen2.5-Math-RM-72B": PPTestSettings.fast(load_format="dummy"),
"intfloat/e5-mistral-7b-instruct": PPTestSettings.fast(task="embed"),
Contributor Author

Needed due to #20771


dtype = "float16"
if hf_config.model_type in _FLOAT16_NOT_SUPPORTED_MODELS:
dtype = "bfloat16"
Contributor Author

Needed since BAAI/bge-multilingual-gemma2 doesn't support float16

@eicherseiji
Contributor Author

eicherseiji commented Jul 15, 2025

The TPU v1 test is also failing on main.

@ruisearch42 can you take a look when you get a chance? A few additional test fixes were needed to recover since the test was disabled. Thanks for your help so far!

@eicherseiji eicherseiji changed the title Add DeepSeek V2/V3 model family to PP tests PP tests for mp were disabled, add DeepSeek V2/V3 model family to PP tests Jul 15, 2025
@eicherseiji eicherseiji changed the title PP tests for mp were disabled, add DeepSeek V2/V3 model family to PP tests Fix inadvertently silenced PP tests for mp, add DeepSeek V2/V3 model family to PP tests Jul 16, 2025
Member

@DarkLight1337 DarkLight1337 left a comment

LGTM, thanks for fixing!

@vllm-bot vllm-bot merged commit d0dc4cf into vllm-project:main Jul 16, 2025
47 of 49 checks passed
nadathurv pushed a commit to nadathurv/vllm that referenced this pull request Jul 16, 2025
hj-mistral pushed a commit to hj-mistral/vllm that referenced this pull request Jul 19, 2025
…l family to PP tests (vllm-project#20831)

Signed-off-by: Seiji Eicher <[email protected]>
Signed-off-by: Himanshu Jaju <[email protected]>
LyrisZhong pushed a commit to LyrisZhong/vllm that referenced this pull request Jul 23, 2025
avigny pushed a commit to avigny/vllm that referenced this pull request Jul 31, 2025
x22x22 pushed a commit to x22x22/vllm that referenced this pull request Aug 5, 2025
Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Aug 6, 2025
npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
Labels
deepseek Related to DeepSeek models ready ONLY add when PR is ready to merge/full CI is needed
4 participants