Breaking change: perf: Enable scheduling overlap by default #4174
/bot run --disable-fail-fast |
Pull Request Overview
This PR reverses the overlap scheduler flag by replacing the old “enable_overlap_scheduler” parameter with “disable_overlap_scheduler” (defaulted to False) across the codebase to enable scheduling overlap by default.
- Inverted the flag logic in all relevant modules (worker, executor, pyexecutor, model_engine, decoder, etc.).
- Updated CLI examples, documentation, and configuration files to reflect the new “disable_overlap_scheduler” naming and semantics.
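As a minimal sketch of what the inverted flag logic means at a call site: `SketchConfig` and `overlap_enabled` below are hypothetical stand-ins written for illustration, not the actual `PyTorchConfig` class or any function in the codebase. Only the field name `disable_overlap_scheduler` and its default of False come from this PR.

```python
from dataclasses import dataclass


@dataclass
class SketchConfig:
    # Hypothetical stand-in for the real config class; only the renamed
    # flag is modeled. The default of False is the one stated in this PR.
    disable_overlap_scheduler: bool = False


def overlap_enabled(config: SketchConfig) -> bool:
    # Call sites that used to test `enable_overlap_scheduler` now test the
    # negation of `disable_overlap_scheduler`.
    return not config.disable_overlap_scheduler
```

With no arguments the sketch reports overlap as enabled, and passing `disable_overlap_scheduler=True` turns it off, which is the opt-out behavior the overview describes.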
Reviewed Changes
Copilot reviewed 47 out of 47 changed files in this pull request and generated no comments.
File | Description |
---|---|
tensorrt_llm/scaffolding/worker.py | Parameter renamed and updated in worker initialization. |
tensorrt_llm/executor/worker.py | Flag condition inverted for determining scheduling overlap. |
tensorrt_llm/commands/* | Removed legacy flag usage in serve and eval commands. |
tensorrt_llm/_torch/pyexecutor/* | Multiple files updated to invert flag logic consistently. |
examples/* and docs/* | CLI/example scripts and documentation now reference the new flag. |
PR_Github #4651 [ run ] triggered by Bot |
PR_Github #4651 [ run ] completed with state |
8e4b83f to dfd073f
/bot run --disable-fail-fast |
PR_Github #4692 [ run ] triggered by Bot |
PR_Github #4692 [ run ] completed with state |
dfd073f to 3c8cf62
/bot run --disable-fail-fast |
PR_Github #4767 [ run ] triggered by Bot |
PR_Github #4767 [ run ] completed with state |
/bot run --disable-fail-fast |
PR_Github #4790 [ run ] triggered by Bot |
PR_Github #4790 [ run ] completed with state |
/bot run --disable-fail-fast --add-multi-gpu-test |
PR_Github #5075 [ run ] triggered by Bot |
PR_Github #5075 [ run ] completed with state |
/bot run --stage-list "DGX_H100-4_GPUs-PyTorch-Others-1" |
PR_Github #5127 [ run ] triggered by Bot |
PR_Github #5127 [ run ] completed with state |
fdeeb26 to cc02fae
/bot run --disable-fail-fast --add-multi-gpu-test |
PR_Github #5171 [ run ] triggered by Bot |
...gregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap.yaml
PR_Github #5171 [ run ] completed with state |
Signed-off-by: Kaiyu Xie <[email protected]>
Fix
Signed-off-by: Kaiyu Xie <[email protected]>
Fix test_ptp_quickstart_advanced_eagle3
Signed-off-by: Kaiyu Xie <[email protected]>
Fix
Signed-off-by: Kaiyu Xie <[email protected]>
Fix
Signed-off-by: Kaiyu Xie <[email protected]>
Fix
Signed-off-by: Kaiyu Xie <[email protected]>
Signed-off-by: Kaiyu Xie <[email protected]>
Signed-off-by: Kaiyu Xie <[email protected]>
cc02fae to a0ac332
/bot run --disable-fail-fast --add-multi-gpu-test |
PR_Github #5227 [ run ] triggered by Bot |
The pre-merge pipeline has passed; merging this now in case more conflicts come up. |
/bot skip --comment "pre-merge pipeline has passed" |
PR_Github #5278 [ skip ] triggered by Bot |
PR_Github #5227 [ run ] completed with state |
PR_Github #5278 [ skip ] completed with state |
This PR:
- Renames the enable_overlap_scheduler argument in PyTorchConfig to disable_overlap_scheduler (default False), so scheduling overlap is enabled by default.
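Because this is a breaking change, existing option dictionaries that still carry the old key need translating. The helper below is a rough sketch of one way to do that. `migrate_overlap_flag` and its `preserve_old_default` parameter are hypothetical names invented for this example, and it assumes the old flag defaulted to False, which the PR title implies but this page does not state outright.

```python
def migrate_overlap_flag(options: dict, preserve_old_default: bool = False) -> dict:
    """Return a copy of `options` that uses the new flag name.

    Hypothetical helper, not part of the library.
    """
    migrated = dict(options)
    if "enable_overlap_scheduler" in migrated:
        legacy = bool(migrated.pop("enable_overlap_scheduler"))
        inverted = not legacy
        # Refuse to guess if both keys are present and disagree.
        if migrated.get("disable_overlap_scheduler", inverted) != inverted:
            raise ValueError("enable/disable overlap flags contradict each other")
        migrated["disable_overlap_scheduler"] = inverted
    elif preserve_old_default and "disable_overlap_scheduler" not in migrated:
        # Assumption: the old flag defaulted to False (overlap off), so an
        # explicit opt-out is needed to keep the previous behavior.
        migrated["disable_overlap_scheduler"] = True
    return migrated
```

Configurations that never set the old flag will see overlap turned on after upgrading unless they opt out explicitly, which is what `preserve_old_default=True` models here.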