Feat: Variable-Beam-Width-Search (VBWS) part4 #3979
7e51257 to 866656c
cpp/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.cpp (review thread, outdated, resolved)
866656c to 17f32d0
PR_Github #4039 [ run ] completed with state
c39197d to 3925848
3925848 to 51cd2a6
Running with beam_width_array=[6,8,128] reports an error:
[TensorRT-LLM][ERROR] Encountered an error in forwardSync function: [TensorRT-LLM][ERROR] CUDA runtime error in ::cudaEventSynchronize(get()): an illegal memory access was encountered (/code/tensorrt_llm/cpp/include/tensorrt_llm/runtime/cudaEvent.h:66)
@shiqingzhangCSU I cannot reproduce the error: running with "--beam_width_array=[6,8,128]" succeeds for me, and my output is in the following file. I used output length 8 to keep the file small, but output length 1024 also runs fine. You can search for "Using Optimization Profile: 0" in the file to check the input shape of the TRT engine at each generation step. Could you also provide more details to reproduce the issue (model name and size, build commands, run commands, etc.)?
51cd2a6 to 4b44bea
d9a7e24 to acb2fa2
2e4714c to a377cc1
13483e0 to eb849eb
eb849eb to 12a9558
PR_Github #4709 [ run ] completed with state
Signed-off-by: wili-65535 <[email protected]>
…ngth Signed-off-by: wili-65535 <[email protected]>
12a9558 to 49068da
/bot run
PR_Github #4818 [ run ] triggered by Bot
PR_Github #4818 [ run ] completed with state
Part 4 of VBWS support.
The progress of this PR:
- DecodeConfig.
- Replace llmReq->mSamplingConfig.beamWidth with llmReq->getBeamWidthByIter() in the runtime to get the beam width of the current generation step.
- MicroBatchScheduler: schedule requests with the same beam width into a batch.
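
To make the two runtime changes concrete, here is a minimal, self-contained sketch of the idea. It is not the code from this PR. `SketchRequest`, `beamWidthForStep`, and `groupByBeamWidth` are hypothetical names used only for illustration. The indexing rule (step i uses entry i of the beam width array, and the last entry is reused once the array is exhausted) is an assumption, and the real `getBeamWidthByIter()` and `MicroBatchScheduler` may behave differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for a request; NOT the real LlmRequest class.
struct SketchRequest
{
    int id;
    std::vector<int> beamWidthArray; // e.g. {6, 8, 128}
    std::size_t decodingIter;        // generation steps completed so far
};

// Assumption: step i uses beamWidthArray[i]; once the array is exhausted,
// the last entry is reused for all remaining steps.
int beamWidthForStep(SketchRequest const& req)
{
    assert(!req.beamWidthArray.empty());
    std::size_t const idx = std::min(req.decodingIter, req.beamWidthArray.size() - 1);
    return req.beamWidthArray[idx];
}

// Group request ids by their current beam width, so that every group
// could be run as a batch with a single, uniform beam width.
std::map<int, std::vector<int>> groupByBeamWidth(std::vector<SketchRequest> const& reqs)
{
    std::map<int, std::vector<int>> batches;
    for (auto const& req : reqs)
    {
        batches[beamWidthForStep(req)].push_back(req.id);
    }
    return batches;
}
```

With `beamWidthArray = {6, 8, 128}`, a request at step 0 falls into the group for beam width 6, a request at step 1 into the group for 8, and any request at step 2 or later into the group for 128. A real scheduler would pick one such group per micro batch rather than materialize all of them, so treat the grouping here only as an illustration of the "same beam width per batch" constraint.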