🙋 Add Optional Eager Execution Mode for vLLM Serving #3335
Merged
Conversation
Enable the user to set eager execution instead of building CUDA graphs, to save memory.
Update vllm_serve.py
qgallouedec reviewed Apr 21, 2025
qgallouedec reviewed Apr 21, 2025
Co-authored-by: Quentin Gallouédec <[email protected]>
Co-authored-by: Quentin Gallouédec <[email protected]>
qgallouedec reviewed Apr 21, 2025
Co-authored-by: Quentin Gallouédec <[email protected]>
qgallouedec approved these changes Apr 21, 2025
LGTM! Thanks
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Overview
This pull request introduces a new optional flag, enforce_eager, to the vLLM serving script. It gives users control over the execution mode of the model, allowing them to choose between pure eager execution and the default hybrid CUDA graph approach based on their specific needs. See https://docs.vllm.ai/en/stable/api/offline_inference/llm.html#vllm.LLM
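For context, eager mode in vLLM is controlled by the enforce_eager argument of the LLM class linked above. A minimal, self-contained illustration of that argument (not part of this PR's diff; the model name is only a placeholder):

```python
# enforce_eager=True skips CUDA graph capture and runs the model eagerly,
# which lowers GPU memory overhead; the default captures CUDA graphs for decoding.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m", enforce_eager=True)  # placeholder model
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```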
Changes Made
- Added a new enforce_eager_flag field to the ScriptArguments class in trl/scripts/vllm_serve.py
- Updated the main function to pass this flag as enforce_eager to the model configuration
- Defaults to False to maintain backward compatibility with existing behavior
Benefits
Running in eager mode skips CUDA graph construction, which saves GPU memory.
Technical Details
The changes primarily affect:
- trl/scripts/vllm_serve.py: Added the new flag to the argument parser and passed it to the model configuration (see the sketch below)
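A minimal sketch of how this wiring might look, assuming a dataclass-based argument parser; names other than the new flag are illustrative rather than the exact diff:

```python
# Sketch of the change described above (illustrative, not the exact diff).
from dataclasses import dataclass, field

from vllm import LLM


@dataclass
class ScriptArguments:
    model: str = field(metadata={"help": "Model name or path to serve."})
    enforce_eager_flag: bool = field(
        default=False,  # False keeps the existing CUDA graph behavior
        metadata={"help": "Run vLLM in eager mode instead of capturing CUDA graphs."},
    )


def main(script_args: ScriptArguments):
    # Forward the script flag to vLLM as enforce_eager.
    llm = LLM(model=script_args.model, enforce_eager=script_args.enforce_eager_flag)
    ...
```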
Testing
Verified the flag works correctly in both states (a rough manual check is sketched after the list):
- enforce_eager_flag=True: the model uses pure eager execution
- enforce_eager_flag=False (default): the model uses hybrid CUDA graph execution
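One hedged way to exercise both states outside the project's test suite, assuming a CUDA GPU; the model name and the memory readout are only illustrative:

```python
# Rough manual check: run once with and once without --enforce_eager_flag and
# compare device memory in use after engine construction and a short generation.
import argparse

import torch
from vllm import LLM, SamplingParams

parser = argparse.ArgumentParser()
parser.add_argument("--model", default="facebook/opt-125m")  # placeholder model
parser.add_argument("--enforce_eager_flag", action="store_true")
args = parser.parse_args()

llm = LLM(model=args.model, enforce_eager=args.enforce_eager_flag)
llm.generate(["Hello"], SamplingParams(max_tokens=8))

free, total = torch.cuda.mem_get_info()  # device-wide free/total memory, in bytes
print(f"enforce_eager={args.enforce_eager_flag}: "
      f"{(total - free) / 1e9:.2f} GB of device memory in use")
```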
Related Issues
Fixes #XXXX
Checklist Before Submitting
Who Can Review?
Anyone in the community is welcome to review once tests have passed. Team members with expertise in model serving or CUDA optimization would be particularly valuable reviewers.