Conversation

wili-65535
Collaborator

Part 4 of Variable Beam Width Search (VBWS) support.

Scope of this PR:

  • Complete end-to-end support for VBWS.
  • Remove all "useVariableBeamWidthSearch" flags from the runtime, since the setting is now recorded in the DecodeConfig.
  • Replace llmReq->mSamplingConfig.beamWidth with llmReq->getBeamWidthByIter() in the runtime to obtain the beam width of the current generation step (see the first sketch after this list).
  • Adjust the MicroBatchScheduler to schedule requests with the same beam width into the same batch (see the second sketch after this list).
  • Collect the generation step of each request in the decoder for later use by the beam search layer.
  • Simplify members of the beam search layer / kernels (needed for Diverse-Beam-Search, DBWS, but not needed for VBWS).
  • Add related unit tests (C++ and Python).
  • Update the related documentation.
  • Other small refactors of comments and code formatting.
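
A minimal sketch of the per-step lookup described above (the class name, the `mBeamWidthArray` member, and the clamping behavior are illustrative assumptions, not the actual `LlmRequest` implementation; the real `getBeamWidthByIter()` presumably reads the request's own decoding iteration rather than taking it as a parameter):

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Sketch of a per-iteration beam width lookup for VBWS. The request keeps the
// beam width array from the DecodeConfig and returns the width for a given
// generation step, keeping the last configured width once the array runs out.
class LlmRequestSketch
{
public:
    explicit LlmRequestSketch(std::vector<int32_t> beamWidthArray)
        : mBeamWidthArray(std::move(beamWidthArray))
    {
    }

    int32_t getBeamWidthByIter(int32_t decodingIter) const
    {
        if (mBeamWidthArray.empty())
        {
            return 1; // no array configured: fall back to a single beam
        }
        auto const lastIdx = static_cast<int32_t>(mBeamWidthArray.size()) - 1;
        return mBeamWidthArray[std::min(decodingIter, lastIdx)];
    }

private:
    std::vector<int32_t> mBeamWidthArray; // e.g. {6, 8, 128}
};
```

Under this sketch, beam_width_array=[6,8,128] would use beam width 6 at the first generation step, 8 at the second, and 128 from the third step onward.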

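And a minimal sketch of the batching constraint (the `RequestSketch` and `groupByBeamWidth` names are hypothetical; the actual MicroBatchScheduler also applies capacity and token-budget limits, which are omitted here):

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Sketch of grouping active requests by their current beam width so that every
// micro batch runs a single beam width in a given generation step.
struct RequestSketch
{
    int32_t currentBeamWidth; // what llmReq->getBeamWidthByIter() would return for this step
};

std::map<int32_t, std::vector<std::shared_ptr<RequestSketch>>> groupByBeamWidth(
    std::vector<std::shared_ptr<RequestSketch>> const& activeRequests)
{
    std::map<int32_t, std::vector<std::shared_ptr<RequestSketch>>> groups;
    for (auto const& req : activeRequests)
    {
        // Requests with different current beam widths never share a batch.
        groups[req->currentBeamWidth].push_back(req);
    }
    return groups;
}
```
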
@wili-65535 wili-65535 force-pushed the feat/vbws-part4 branch 2 times, most recently from 7e51257 to 866656c on May 1, 2025 05:26
@wili-65535 wili-65535 requested a review from Funatiq May 1, 2025 05:27
@wili-65535 wili-65535 self-assigned this May 1, 2025
@tensorrt-cicd
Collaborator

PR_Github #4039 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #2869 completed with status: 'SUCCESS'

@shiqingzhangCSU

When running with beam_width_array=[6,8,128], the following error is reported:

[TensorRT-LLM][ERROR] Encountered an error in forwardSync function: [TensorRT-LLM][ERROR] CUDA runtime error in ::cudaEventSynchronize(get()): an illegal memory access was encountered (/code/tensorrt_llm/cpp/include/tensorrt_llm/runtime/cudaEvent.h:66)
1 0x7f588642df3b void tensorrt_llm::common::check(cudaError, char const*, char const*, int) + 139
2 0x7f58871053bb tensorrt_llm::batch_manager::TrtGptModelInflightBatching::decoderSync(tensorrt_llm::batch_manager::ScheduledRequests const&, std::optional<tensorrt_llm::runtime::CudaEvent> const&) + 299
3 0x7f588710618c tensorrt_llm::batch_manager::TrtGptModelInflightBatching::forwardSync() + 716
4 0x7f58871e0228 tensorrt_llm::executor::Executor::Impl::forwardSync(std::__cxx11::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >&) + 72
5 0x7f58871eb539 tensorrt_llm::executor::Executor::Impl::executionLoop() + 585
6 0x7f59db6c7db4 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xecdb4) [0x7f59db6c7db4]
7 0x7f5a72c9caa4 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7f5a72c9caa4]
8 0x7f5a72d29c3c /usr/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7f5a72d29c3c]

@wili-65535
Collaborator Author

wili-65535 commented May 6, 2025

> When running with beam_width_array=[6,8,128], the following error is reported: […]

@shiqingzhangCSU I am not able to reproduce the error: running with "--beam_width_array=[6,8,128]" succeeds for me, and my output is attached in the following file.
log-6_8_128.log

I used an output length of 8 to reduce the file size, but an output length of 1024 also runs fine. You can search for "Using Optimization Profile: 0" in the file to check the input shape of the TRT engine at each generation step.

Furthermore, could you provide more detailed information to reproduce the issue (model name and size, build commands, run commands, etc.)?

@tensorrt-cicd
Collaborator

PR_Github #4709 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3396 completed with status: 'SUCCESS'

@wili-65535
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #4818 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #4818 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3488 completed with status: 'SUCCESS'

@Funatiq Funatiq merged commit eba3623 into NVIDIA:main May 12, 2025
3 checks passed
@wili-65535 wili-65535 deleted the feat/vbws-part4 branch May 12, 2025 20:43