Conversation

wili-65535
Collaborator

Part 4 of Variable Beam Width Search (VBWS) support.

Scope of this PR:

  • Complete end-to-end support for VBWS.
  • Remove all "useVariableBeamWidthSearch" flags from the runtime, since the setting is now recorded in the DecodeConfig.
  • Replace llmReq->mSamplingConfig.beamWidth with llmReq->getBeamWidthByIter() in the runtime to obtain the beam width of the current generation step (see the first sketch after this list).
  • Adjust the MicroBatchScheduler to schedule requests with the same beam width into the same batch (see the second sketch after this list).
  • Collect the generation step of each request in the decoder for later use by the beam search layer.
  • Simplify members of the beam search layer / kernels (needed for Diverse-Beam-Search, DBWS, but not needed for VBWS).
  • Add related unit tests (C++ and Python).
  • Update the related documentation.
  • Other small refactors of comments and code formatting.
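
A minimal sketch of the per-step lookup described above (the class name, the `mBeamWidthArray` member, and the clamping behavior are illustrative assumptions, not the actual `LlmRequest` implementation; the real `getBeamWidthByIter()` presumably reads the request's own decoding iteration rather than taking it as a parameter):

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Sketch of a per-iteration beam width lookup for VBWS. The request keeps the
// beam width array from the DecodeConfig and returns the width for a given
// generation step, keeping the last configured width once the array runs out.
class LlmRequestSketch
{
public:
    explicit LlmRequestSketch(std::vector<int32_t> beamWidthArray)
        : mBeamWidthArray(std::move(beamWidthArray))
    {
    }

    int32_t getBeamWidthByIter(int32_t decodingIter) const
    {
        if (mBeamWidthArray.empty())
        {
            return 1; // no array configured: fall back to a single beam
        }
        auto const lastIdx = static_cast<int32_t>(mBeamWidthArray.size()) - 1;
        return mBeamWidthArray[std::min(decodingIter, lastIdx)];
    }

private:
    std::vector<int32_t> mBeamWidthArray; // e.g. {6, 8, 128}
};
```

Under this sketch, beam_width_array=[6,8,128] would use beam width 6 at the first generation step, 8 at the second, and 128 from the third step onward.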

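And a minimal sketch of the batching constraint (the `RequestSketch` and `groupByBeamWidth` names are hypothetical; the actual MicroBatchScheduler also applies capacity and token-budget limits, which are omitted here):

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Sketch of grouping active requests by their current beam width so that every
// micro batch runs a single beam width in a given generation step.
struct RequestSketch
{
    int32_t currentBeamWidth; // what llmReq->getBeamWidthByIter() would return for this step
};

std::map<int32_t, std::vector<std::shared_ptr<RequestSketch>>> groupByBeamWidth(
    std::vector<std::shared_ptr<RequestSketch>> const& activeRequests)
{
    std::map<int32_t, std::vector<std::shared_ptr<RequestSketch>>> groups;
    for (auto const& req : activeRequests)
    {
        // Requests with different current beam widths never share a batch.
        groups[req->currentBeamWidth].push_back(req);
    }
    return groups;
}
```
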
@wili-65535 wili-65535 force-pushed the feat/vbws-part4 branch 2 times, most recently from 7e51257 to 866656c on May 1, 2025 05:26
@wili-65535 wili-65535 requested a review from Funatiq May 1, 2025 05:27
@wili-65535 wili-65535 self-assigned this May 1, 2025
@tensorrt-cicd
Collaborator

PR_Github #4039 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #2869 completed with status: 'SUCCESS'

@shiqingzhangCSU

When running with beam_width_array=[6,8,128], the following error is reported:

[TensorRT-LLM][ERROR] Encountered an error in forwardSync function: [TensorRT-LLM][ERROR] CUDA runtime error in ::cudaEventSynchronize(get()): an illegal memory access was encountered (/code/tensorrt_llm/cpp/include/tensorrt_llm/runtime/cudaEvent.h:66)
1 0x7f588642df3b void tensorrt_llm::common::check(cudaError, char const*, char const*, int) + 139
2 0x7f58871053bb tensorrt_llm::batch_manager::TrtGptModelInflightBatching::decoderSync(tensorrt_llm::batch_manager::ScheduledRequests const&, std::optional<tensorrt_llm::runtime::CudaEvent> const&) + 299
3 0x7f588710618c tensorrt_llm::batch_manager::TrtGptModelInflightBatching::forwardSync() + 716
4 0x7f58871e0228 tensorrt_llm::executor::Executor::Impl::forwardSync(std::__cxx11::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >&) + 72
5 0x7f58871eb539 tensorrt_llm::executor::Executor::Impl::executionLoop() + 585
6 0x7f59db6c7db4 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xecdb4) [0x7f59db6c7db4]
7 0x7f5a72c9caa4 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7f5a72c9caa4]
8 0x7f5a72d29c3c /usr/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7f5a72d29c3c]

@wili-65535
Collaborator Author

wili-65535 commented May 6, 2025

> When running with beam_width_array=[6,8,128], the following error is reported: […]

@shiqingzhangCSU I am not able to reproduce the error: running with "--beam_width_array=[6,8,128]" succeeds for me, and my output is attached in the following file.
log-6_8_128.log

I used an output length of 8 to reduce the file size, but an output length of 1024 also runs fine. You can search for "Using Optimization Profile: 0" in the file to check the input shape of the TRT engine at each generation step.

Furthermore, could you provide more detailed information to reproduce the issue (model name and size, build commands, run commands, etc.)?

@tensorrt-cicd
Collaborator

PR_Github #4709 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3396 completed with status: 'SUCCESS'

@wili-65535
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #4818 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #4818 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3488 completed with status: 'SUCCESS'

@Funatiq Funatiq merged commit eba3623 into NVIDIA:main May 12, 2025
3 checks passed
@wili-65535 wili-65535 deleted the feat/vbws-part4 branch May 12, 2025 20:43