wili-65535
Collaborator

Background

TRT-LLM currently treats the runtime beam_width as a scalar, which means:

  1. The same beam_width must be used for a request along all generation steps (time axis).
  2. The same beam_width must be used for requests batched together (space axis).

Final target

  • Loosen the constraints above as follows:
  1. Each request owns a beam_width_array for beam search. For example, --beam_width_array=[20,40,60] means using beam_width=20 for the first step, 40 for the second step, and 60 for all following steps (we call this Variable-Beam-Width-Search, VBWS).
  2. Requests with different beam widths can be batched together for generation (we call this Diverse-Beam-Width-Search, DBWS).
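
As a rough illustration of the semantics described above, the per-step lookup could be sketched as follows. The helper names beam_width_at_step and batch_beam_widths are hypothetical and are not part of TRT-LLM; the sketch only assumes that steps are counted from 0 and that the last entry of the array is reused once the array is exhausted, as in the [20,40,60] example.

```python
from typing import List, Sequence


def beam_width_at_step(beam_width_array: Sequence[int], step: int) -> int:
    """Return the beam width for a 0-based generation step (hypothetical helper).

    Once the array is exhausted, the last entry is reused for all later steps.
    """
    if not beam_width_array:
        raise ValueError("beam_width_array must not be empty")
    if step < 0:
        raise ValueError("step must be non-negative")
    return beam_width_array[min(step, len(beam_width_array) - 1)]


def batch_beam_widths(arrays: Sequence[Sequence[int]], step: int) -> List[int]:
    """Per-request beam widths for one step of a mixed batch (the DBWS case)."""
    return [beam_width_at_step(array, step) for array in arrays]


schedule = [20, 40, 60]
# One request over five steps: the width grows, then stays at the last entry.
print([beam_width_at_step(schedule, s) for s in range(5)])
# Two requests with different schedules batched at the same step.
print(batch_beam_widths([schedule, [4]], step=1))
```

A plain scalar beam width corresponds to a one-element array under this assumption, so the existing behavior would remain a special case.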

Target of this PR

We plan to implement the final target in four PRs. This PR is the first part, in which we:

  1. Add the member beamWidthArray and related methods to class SamplingConfig.
  2. Rewrite the C++/Python unit tests for class SamplingConfig.
    • C++ unit tests: cpp/tests/executor/SamplingConfigTest.cpp, cpp/tests/runtime/SamplingConfigTest.cpp.
    • Python unit tests: tests/unittest/api_stability/test_llm_api.py, tests/bindings/test_bindings_ut.py.

wili-65535 force-pushed the feat/Variable-Beam-Width-Search-Part1 branch from be63ca6 to 9ec352c on March 26, 2025 01:04
@wili-65535
Collaborator Author

/bot run

wili-65535 changed the title from "v1.0" to "[feat] Variable-Beam-Width-Search (VBWS) part1" on Mar 26, 2025
wili-65535 changed the title from "[feat] Variable-Beam-Width-Search (VBWS) part1" to "feat: Variable-Beam-Width-Search (VBWS) part1" on Mar 26, 2025
@niukuo
Collaborator

niukuo commented Mar 26, 2025

PR_Github #481 [ run ] triggered by Bot

@niukuo
Collaborator

niukuo commented Mar 26, 2025

PR_Github #481 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #414 completed with status: 'FAILURE'

@wili-65535
Collaborator Author

/bot run

@niukuo
Collaborator

niukuo commented Mar 26, 2025

PR_Github #487 [ run ] triggered by Bot

@niukuo
Collaborator

niukuo commented Mar 26, 2025

PR_Github #487 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #419 completed with status: 'FAILURE'

wili-65535 force-pushed the feat/Variable-Beam-Width-Search-Part1 branch from c310a46 to 15bfd98 on March 26, 2025 05:39
@wili-65535
Collaborator Author

/bot run

@niukuo
Collaborator

niukuo commented Mar 26, 2025

PR_Github #519 [ run ] triggered by Bot

@niukuo
Collaborator

niukuo commented Mar 26, 2025

PR_Github #519 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #444 completed with status: 'SUCCESS'

Collaborator

@byshiue byshiue left a comment

LGTM.

byshiue force-pushed the feat/Variable-Beam-Width-Search-Part1 branch from 15bfd98 to 21d41d3 on March 26, 2025 08:47
@byshiue
Collaborator

byshiue commented Mar 26, 2025

/bot reuse-pipeline

@niukuo
Collaborator

niukuo commented Mar 26, 2025

PR_Github #551 [ reuse-pipeline ] triggered by Bot

@niukuo
Collaborator

niukuo commented Mar 26, 2025

PR_Github #551 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #519 for commit 21d41d3

byshiue force-pushed the feat/Variable-Beam-Width-Search-Part1 branch from 6c12987 to e6b738a on March 26, 2025 09:30
wili-65535 force-pushed the feat/Variable-Beam-Width-Search-Part1 branch from e6b738a to d70b74d on March 26, 2025 09:42
@byshiue
Collaborator

byshiue commented Mar 26, 2025

/bot reuse-pipeline "the last commit only update the author message"

@byshiue byshiue enabled auto-merge (squash) March 26, 2025 09:43
@niukuo
Collaborator

niukuo commented Mar 26, 2025

PR_Github #563 Bot args parsing error!

@byshiue
Collaborator

byshiue commented Mar 26, 2025

/bot reuse-pipeline --comment "the last commit only update the author message"

wili-65535 force-pushed the feat/Variable-Beam-Width-Search-Part1 branch from d70b74d to d9eb586 on March 26, 2025 10:41
@wili-65535
Collaborator Author

/bot run

@niukuo
Collaborator

niukuo commented Mar 26, 2025

PR_Github #572 [ run ] triggered by Bot

@niukuo
Collaborator

niukuo commented Mar 26, 2025

PR_Github #572 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #486 completed with status: 'FAILURE'

wili-65535 force-pushed the feat/Variable-Beam-Width-Search-Part1 branch from d9eb586 to 3716983 on March 26, 2025 12:50
@wili-65535
Collaborator Author

/bot run

@niukuo
Collaborator

niukuo commented Mar 26, 2025

PR_Github #588 [ run ] triggered by Bot

@niukuo
Collaborator

niukuo commented Mar 26, 2025

PR_Github #588 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #500 completed with status: 'SUCCESS'

wili-65535 and others added 2 commits March 26, 2025 16:19
Signed-off-by: wili <[email protected]>
Signed-off-by: wili-65535 <[email protected]>
Funatiq force-pushed the feat/Variable-Beam-Width-Search-Part1 branch from 3716983 to 88b9bd8 on March 26, 2025 15:19
@Funatiq
Collaborator

Funatiq commented Mar 26, 2025

/bot reuse-pipeline

@niukuo
Collaborator

niukuo commented Mar 26, 2025

PR_Github #600 [ reuse-pipeline ] triggered by Bot

@niukuo
Collaborator

niukuo commented Mar 26, 2025

PR_Github #600 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #588 for commit 88b9bd8

@byshiue byshiue merged commit 3e035f2 into NVIDIA:main Mar 26, 2025
2 checks passed
early_stopping (int, optional): Controls whether the generation process finishes once beamWidth sentences are generated (ends with end_token). None means using C++ runtime default 1. Defaults to None.
no_repeat_ngram_size (int, optional): Controls how many repeat ngram size are acceptable. None means using C++ runtime default 1 << 30. Defaults to None.
min_p (float, optional): scale the most likely token to determine the minimum token probability. None means using C++ runtime default 0.0. Defaults to None.
beam_width_array (List[int], optional): The array of beam width using in Variable-Beam-Width-Search. Defaults to None.
Collaborator

Hi @wili-65535,

Do you have plans to refine this PR?

  • The docstring is vague; users are unlikely to figure out how to use beam_width_array. Please elaborate a bit on this argument.
  • There is no LLM API test to verify this feature.
  • This argument is added to the "committed" APIs; is it indeed committed? To reduce risk, I would suggest moving it to the uncommitted references.

cc @Superjomn
Thanks!

Collaborator Author

Hi @syuoni!
We plan to add the VBWS feature in four PRs (this PR is just the first part). The workflow will be fully usable (including documentation, unit tests, and examples) after the final PR is merged. Until then, we are adding the utilities for the feature step by step.

Here beam_width_array is used in SamplingConfig and related tests, but it is not exposed to higher-level APIs.

Thank you for your suggestion. I will move the argument to the uncommitted references in the following PR.

min_p:
  annotation: Optional[float]
  default: null
beam_width_array:
Collaborator

@wili-65535 The references_committed directory is for APIs committed for 1.0; we may need to move this new API into the references directory instead.

Collaborator Author

I will fix it in Part 2 of this PR (#3133).

@wili-65535 wili-65535 deleted the feat/Variable-Beam-Width-Search-Part1 branch April 9, 2025 09:59
wu1du2 pushed a commit to wu1du2/TensorRT-LLM that referenced this pull request May 11, 2025
6 participants