-
Notifications
You must be signed in to change notification settings - Fork 1.8k
feat: Variable-Beam-Width-Search (VBWS) part1 #3082
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Variable-Beam-Width-Search (VBWS) part1 #3082
Conversation
be63ca6
to
9ec352c
Compare
/bot run |
PR_Github #481 [ run ] triggered by Bot |
PR_Github #481 [ run ] completed with state |
/bot run |
PR_Github #487 [ run ] triggered by Bot |
PR_Github #487 [ run ] completed with state |
c310a46
to
15bfd98
Compare
/bot run |
PR_Github #519 [ run ] triggered by Bot |
PR_Github #519 [ run ] completed with state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
15bfd98
to
21d41d3
Compare
/bot reuse-pipeline |
PR_Github #551 [ reuse-pipeline ] triggered by Bot |
PR_Github #551 [ reuse-pipeline ] completed with state |
6c12987
to
e6b738a
Compare
e6b738a
to
d70b74d
Compare
/bot reuse-pipeline "the last commit only update the author message" |
PR_Github #563 Bot args parsing error! |
/bot reuse-pipeline --comment "the last commit only update the author message" |
d70b74d
to
d9eb586
Compare
/bot run |
PR_Github #572 [ run ] triggered by Bot |
PR_Github #572 [ run ] completed with state |
d9eb586
to
3716983
Compare
/bot run |
PR_Github #588 [ run ] triggered by Bot |
PR_Github #588 [ run ] completed with state |
Signed-off-by: wili <[email protected]>
Signed-off-by: wili-65535 <[email protected]>
3716983
to
88b9bd8
Compare
/bot reuse-pipeline |
PR_Github #600 [ reuse-pipeline ] triggered by Bot |
PR_Github #600 [ reuse-pipeline ] completed with state |
early_stopping (int, optional): Controls whether the generation process finishes once beamWidth sentences are generated (ends with end_token). None means using C++ runtime default 1. Defaults to None. | ||
no_repeat_ngram_size (int, optional): Controls how many repeat ngram size are acceptable. None means using C++ runtime default 1 << 30. Defaults to None. | ||
min_p (float, optional): scale the most likely token to determine the minimum token probability. None means using C++ runtime default 0.0. Defaults to None. | ||
beam_width_array (List[int], optional): The array of beam width using in Variable-Beam-Width-Search. Defaults to None. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @wili-65535 ,
Do you have plans to refine this PR?
- The docstring is vague, users are unlikely to figure out how to use
beam_width_array
; Please elaborate a bit on this argument. - There is no LLM API test to verify this feature.
- This argument is added to "committed" APIs; is it indeed committed? To reduce risks, I would suggest moving it to uncommitted references.
cc @Superjomn
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @syuoni !
We plan to add the feature VBWS in four PRs (this PR is just the first part). The workflow will be enable to work (including document, unit tests and examples) after the final PR is merged. Before that time, we are adding the utils for the feature step by step.
Here beam_width_array
is used in SamplingConfig and related tests, but not exposed to higher level APIs.
Thank you for your suggestion, I will move the argument as uncommitted references in the following PR.
min_p: | ||
annotation: Optional[float] | ||
default: null | ||
beam_width_array: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wili-65535 The references_committed
directory is for APIs committed for 1.0, we may need to move this new API into references
directory instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will fix it in the Part2 of this PR here (#3133).
Signed-off-by: wili <[email protected]>
Background
In current TRT-LLM, we regard the
beam_width
of the runtime as a scalar, which means:same beam_width
must be used for a request along all generation steps (time axis).same beam_width
must be used for requests batched together (space axis).Final target
beam_width_array
for beam search. For example,--beam_width_array=[20,40,60]
means usingbeam_width=20
for the first step, 40 for the second step, 60 for all following steps (we call it Variable-Beam-Width-Search, VBWS).Target of this PR
We plan to implement the final target in 4 PRs, and this PR is the first part, where we achieve:
beamWidthArray
and related methods for classSamplingConfig
.SamplingConfig
.cpp/tests/executor/SamplingConfigTest.cpp
,cpp/tests/runtime/SamplingConfigTest.cpp
.tests/unittest/api_stability/test_llm_api.py
,tests/bindings/test_bindings_ut.py
.