Conversation

@mikeiovine (Collaborator) commented Jul 16, 2025

Description

This PR adds chunked prefill support to the 2-model spec decode flow. In this design, prefill chunks are sent to the draft model immediately after they are processed by the target.

One consequence of this design is that the draft model must also be loaded on prefill workers in disaggregated (disagg) serving scenarios.
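
At a high level, the flow looks like the following sketch (hypothetical names only: target, draft, and prefill stand in for the real engine APIs):

def prefill_with_draft(target, draft, chunks):
    # chunks: the prompt split according to the max_num_tokens budget.
    for chunk in chunks:
        target.prefill(chunk)  # the target processes each chunk first...
        draft.prefill(chunk)   # ...then the same chunk is sent to the draft
    # Draft-token proposal only starts once the entire prompt is prefilled.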

Test Coverage

Added new unit tests covering both the one-model and two-model flows. Manually verified that the acceptance rate (AR) is unchanged on a set of long prompts after enabling chunked prefill.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.
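
For example, a typical invocation that runs a single test stage with fail-fast disabled:

/bot run --stage-list "A10-1" --disable-fail-fast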

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since skipping validation without due care can break the top of tree.

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since reusing a stale pipeline without due care can break the top of tree.

Summary by CodeRabbit

  • New Features
    • Improved handling of chunked prefill processing for draft models, allowing more efficient and synchronized processing of input chunks.
  • Enhancements
    • Added tracking of the last processed context chunk for each request, ensuring better management of draft requests and token generation during chunked prefill scenarios.
  • Tests
    • Extended tests to cover chunked prefill scenarios with varied prompt inputs and token limits.

@coderabbitai bot (Contributor) commented Jul 16, 2025

Walkthrough

The changes introduce a new attribute, py_last_context_chunk, to track context chunk boundaries in the LlmRequest class and update it during request state transitions. The speculative draft model logic is extended to handle chunked prefill scenarios, ensuring correct draft request management and synchronization between target and draft models. The test suite is also enhanced to cover chunked prefill cases.

Changes

Changed files and summaries:

  • tensorrt_llm/_torch/pyexecutor/llm_request.py — Added a py_last_context_chunk attribute (initialized as (None, None)) to the LlmRequest class constructor.
  • tensorrt_llm/_torch/pyexecutor/py_executor.py — Updated _update_request_states_tp to set py_last_context_chunk for each context request before advancing the chunk.
  • tensorrt_llm/_torch/speculative/model_drafter.py — Extended the draft model logic to handle chunked prefill, using py_last_context_chunk for chunk tracking and sync; renamed _create_chunked_context_request to _create_accepted_tokens_request.
  • tests/unittest/_torch/speculative/test_eagle3.py — Extended test_llama_eagle3 to parameterize and test chunked prefill scenarios with adjusted prompts and configs.
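
A minimal self-contained sketch of this bookkeeping (the tuple semantics — (context_current_position, context_chunk_size) — and the move_to_next_context_chunk call are inferred from the summaries above, not copied from the source):

class LlmRequestSketch:
    def __init__(self, chunk_size: int):
        self.context_chunk_size = chunk_size
        self.context_current_position = 0
        self.py_last_context_chunk = (None, None)  # nothing processed yet

    def move_to_next_context_chunk(self):
        self.context_current_position += self.context_chunk_size

def update_request_states(context_requests):
    # Mirrors the described _update_request_states_tp change: record the chunk
    # boundaries *before* advancing, so the drafter can replay that chunk.
    for req in context_requests:
        req.py_last_context_chunk = (req.context_current_position,
                                     req.context_chunk_size)
        req.move_to_next_context_chunk()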

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant PyExecutor
    participant LlmRequest
    participant ModelDrafter

    User->>PyExecutor: Submit request
    PyExecutor->>LlmRequest: Initialize (py_last_context_chunk = (None, None))
    loop For each context chunk
        PyExecutor->>LlmRequest: Update py_last_context_chunk (start, end)
        PyExecutor->>ModelDrafter: Prepare draft batch (with chunk info)
        ModelDrafter->>LlmRequest: Create/Update context request with chunk info
        ModelDrafter->>ModelDrafter: Process decoded tokens (synchronize with target)
    end

Estimated code review effort

2 (~20 minutes)

Suggested labels

Community want to contribute

Suggested reviewers

  • HuiGao-NV
  • yilin-void
  • qiaoxj07

Poem

In the warren of code, a chunk hops anew,
Tracking its journey, from start point to through.
Drafts now aligned, in perfect prefill,
Synchrony hopping, with rabbit-like skill.
Each chunk accounted, no tokens astray—
The LLM’s request hops smarter today! 🐇


📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2381142 and 875fba9.

📒 Files selected for processing (1)
  • tensorrt_llm/_torch/pyexecutor/py_executor.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tensorrt_llm/_torch/pyexecutor/py_executor.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@mikeiovine changed the title from "[feat] Support chunked prefill on spec decode 2 model" to "[TRTLLM-6453][feat] Support chunked prefill on spec decode 2 model" on Jul 16, 2025
@mikeiovine force-pushed the chunked-prefill-spec-dec branch 5 times, most recently from 840fb61 to 95ab84c, on July 17, 2025 16:48
@mikeiovine requested a review from ziyixiong-nv on July 17, 2025 16:50
@mikeiovine marked this pull request as ready for review on July 17, 2025 16:51
@mikeiovine requested review from a team as code owners on July 17, 2025 16:51
@mikeiovine (Collaborator, Author) commented:

/bot run

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/unittest/_torch/speculative/test_eagle3.py (1)

78-90: Address line length violation and improve prompt readability.

The long prompt string on line 81 exceeds the 120 character limit flagged by static analysis.

-        prompts = [
-            "The capital of France is a city of romance, art, fashion, and cuisine. Paris is a must-visit destination for anyone who loves history, architecture, and culture. From the iconic Eiffel Tower to the world-famous Louvre Museum, Paris has something to offer for every interest and age.\nThe city is divided into 20 arrondissements, each with its own unique character and charm. The Latin Quarter is a popular area for students and young travelers, while the Champs-Élysées is a hub for shopping and dining. The Montmartre neighborhood is famous for its bohemian vibe and stunning views of the city.\nParis is also known for its beautiful parks and gardens, such as the Luxembourg Gardens and the Tuileries Garden. The city has a rich history, with landmarks like the Notre-Dame Cathedral and the Arc de Triomphe. Visitors can also explore the city's many museums, including the Musée d'Orsay and the Musée Rodin.\nIn addition to its cultural and historical attractions, Paris is also a great destination for foodies. The city is famous for its cuisine, including croissants, baguettes, and cheese. Visitors can sample the city's famous dishes at one of the many restaurants, cafes, and "
-        ]
+        prompts = [
+            ("The capital of France is a city of romance, art, fashion, and cuisine. "
+             "Paris is a must-visit destination for anyone who loves history, architecture, and culture. "
+             "From the iconic Eiffel Tower to the world-famous Louvre Museum, Paris has something to offer "
+             "for every interest and age.\nThe city is divided into 20 arrondissements, each with its own "
+             "unique character and charm. The Latin Quarter is a popular area for students and young travelers, "
+             "while the Champs-Élysées is a hub for shopping and dining. The Montmartre neighborhood is famous "
+             "for its bohemian vibe and stunning views of the city.\nParis is also known for its beautiful "
+             "parks and gardens, such as the Luxembourg Gardens and the Tuileries Garden. The city has a rich "
+             "history, with landmarks like the Notre-Dame Cathedral and the Arc de Triomphe. Visitors can also "
+             "explore the city's many museums, including the Musée d'Orsay and the Musée Rodin.\nIn addition "
+             "to its cultural and historical attractions, Paris is also a great destination for foodies. The "
+             "city is famous for its cuisine, including croissants, baguettes, and cheese. Visitors can sample "
+             "the city's famous dishes at one of the many restaurants, cafes, and ")
+        ]
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 840fb61 and 95ab84c.

📒 Files selected for processing (4)
  • tensorrt_llm/_torch/pyexecutor/llm_request.py (1 hunks)
  • tensorrt_llm/_torch/pyexecutor/py_executor.py (1 hunks)
  • tensorrt_llm/_torch/speculative/model_drafter.py (5 hunks)
  • tests/unittest/_torch/speculative/test_eagle3.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tensorrt_llm/_torch/pyexecutor/llm_request.py
  • tensorrt_llm/_torch/pyexecutor/py_executor.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/unittest/_torch/speculative/test_eagle3.py (2)
tensorrt_llm/llmapi/llm.py (2)
  • tokenizer (657-661)
  • tokenizer (664-665)
tests/unittest/llmapi/test_llm.py (1)
  • encode (308-309)
🪛 Ruff (0.12.2)
tests/unittest/_torch/speculative/test_eagle3.py

81-81: Line too long (1197 > 120)

(E501)

🔇 Additional comments (7)
tests/unittest/_torch/speculative/test_eagle3.py (2)

17-31: Test coverage for chunked prefill looks comprehensive.

The parametrize decorator appropriately adds the enable_chunked_prefill parameter, with test cases covering both chunked and non-chunked scenarios across different configurations.


62-66: Configuration for chunked prefill is correctly implemented.

The conditional logic properly enables chunked prefill and reduces max_num_tokens to 64 to trigger the chunked prefill code path, which aligns with the test objectives.
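
In the llmapi, that configuration looks roughly like this (a sketch, not the verbatim test; model_dir and spec_config are placeholders, and the kwarg names are assumed from the llmapi surface referenced above):

from tensorrt_llm import LLM

llm = LLM(
    model=model_dir,
    speculative_config=spec_config,
    enable_chunked_prefill=True,
    # Small token budget so the long test prompt is split across several chunks.
    max_num_tokens=64,
)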

tensorrt_llm/_torch/speculative/model_drafter.py (5)

79-88: Context request creation properly handles chunked prefill boundaries.

The method correctly extracts chunk boundaries from py_last_context_chunk and sets context_current_position and context_chunk_size appropriately for chunked prefill scenarios.
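
Sketched in isolation, the boundary handling reads roughly like this (an approximation of the described method, not the verbatim source; create_draft_copy is a stand-in for however the draft-side request is actually built):

def create_context_request(request, input_tokens):
    new_request = create_draft_copy(request, input_tokens)
    begin, size = request.py_last_context_chunk
    if begin is not None:  # (None, None) means the request was never chunked
        new_request.context_current_position = begin
        new_request.context_chunk_size = size
    return new_request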


103-118: Method rename improves clarity and maintains correct logic.

The rename from _create_chunked_context_request to _create_accepted_tokens_request better describes the method's purpose. The logic for handling accepted tokens in chunked context remains correct.


180-194: Chunked prefill handling in draft batch preparation is well-implemented.

The logic correctly:

  • Skips requests with context_current_position == 0 (still need target model processing)
  • Handles chunked prefill by reconstructing input tokens and creating context requests
  • Properly integrates with the existing draft batch workflow
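
In sketch form (get_tokens(0) is assumed to return the request's beam-0 token list; the real _prepare_draft_batch carries more state than this):

def collect_draft_context_requests(context_requests):
    draft_batch = []
    for request in context_requests:
        if request.context_current_position == 0:
            # The target has not prefilled anything yet; nothing to replay.
            continue
        # Replay only the prompt prefix the target has already processed.
        input_tokens = request.get_tokens(0)[:request.context_current_position]
        draft_batch.append(create_context_request(request, input_tokens))
    return draft_batch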

285-289: Token processing correctly defers draft token addition for chunked prefill.

The logic appropriately checks if the target model request is not in GENERATION_IN_PROGRESS state and defers adding draft tokens until the entire prompt is processed, while properly freeing resources.
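
Roughly, the deferral check looks like this (state and attribute names follow the review text; free_draft_resources and py_draft_tokens are assumed names for the release step and the token attachment point):

def attach_or_defer(target_request, draft_request, draft_tokens):
    if target_request.state != LlmRequestState.GENERATION_IN_PROGRESS:
        # Still mid-prefill: discard the drafted tokens for now and free the
        # draft request's resources; drafting resumes after the final chunk.
        free_draft_resources(draft_request)
    else:
        target_request.py_draft_tokens = draft_tokens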


142-143: Method call update aligns with the renamed method.

The call to _create_accepted_tokens_request correctly reflects the method rename and maintains the same parameters.

@tensorrt-cicd (Collaborator) commented:

PR_Github #12222 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator) commented:

PR_Github #12222 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #9076 completed with status: 'FAILURE'

@mikeiovine (Collaborator, Author) commented:

/bot run

@tensorrt-cicd (Collaborator) commented:

PR_Github #12227 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator) commented:

PR_Github #12227 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #9081 completed with status: 'FAILURE'

@mikeiovine requested a review from lfr-0531 on July 18, 2025 15:43
@mikeiovine force-pushed the chunked-prefill-spec-dec branch from a8c5d0b to 92b4a83 on July 18, 2025 16:17
@mikeiovine (Collaborator, Author) commented:

/bot run

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/unittest/_torch/speculative/test_eagle3.py (1)

78-90: Fix the line length violation for better readability.

The prompt selection logic is well-implemented and appropriate for testing chunked prefill functionality. However, the long prompt string on line 81 exceeds the 120-character limit.

Consider breaking the long prompt into multiple lines for better readability:

-            "The capital of France is a city of romance, art, fashion, and cuisine. Paris is a must-visit destination for anyone who loves history, architecture, and culture. From the iconic Eiffel Tower to the world-famous Louvre Museum, Paris has something to offer for every interest and age.\nThe city is divided into 20 arrondissements, each with its own unique character and charm. The Latin Quarter is a popular area for students and young travelers, while the Champs-Élysées is a hub for shopping and dining. The Montmartre neighborhood is famous for its bohemian vibe and stunning views of the city.\nParis is also known for its beautiful parks and gardens, such as the Luxembourg Gardens and the Tuileries Garden. The city has a rich history, with landmarks like the Notre-Dame Cathedral and the Arc de Triomphe. Visitors can also explore the city's many museums, including the Musée d'Orsay and the Musée Rodin.\nIn addition to its cultural and historical attractions, Paris is also a great destination for foodies. The city is famous for its cuisine, including croissants, baguettes, and cheese. Visitors can sample the city's famous dishes at one of the many restaurants, cafes, and "
+            ("The capital of France is a city of romance, art, fashion, and cuisine. "
+             "Paris is a must-visit destination for anyone who loves history, architecture, and culture. "
+             "From the iconic Eiffel Tower to the world-famous Louvre Museum, Paris has something to offer for every interest and age.\n"
+             "The city is divided into 20 arrondissements, each with its own unique character and charm. "
+             "The Latin Quarter is a popular area for students and young travelers, while the Champs-Élysées is a hub for shopping and dining. "
+             "The Montmartre neighborhood is famous for its bohemian vibe and stunning views of the city.\n"
+             "Paris is also known for its beautiful parks and gardens, such as the Luxembourg Gardens and the Tuileries Garden. "
+             "The city has a rich history, with landmarks like the Notre-Dame Cathedral and the Arc de Triomphe. "
+             "Visitors can also explore the city's many museums, including the Musée d'Orsay and the Musée Rodin.\n"
+             "In addition to its cultural and historical attractions, Paris is also a great destination for foodies. "
+             "The city is famous for its cuisine, including croissants, baguettes, and cheese. "
+             "Visitors can sample the city's famous dishes at one of the many restaurants, cafes, and ")
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a8c5d0b and 92b4a83.

📒 Files selected for processing (4)
  • tensorrt_llm/_torch/pyexecutor/llm_request.py (1 hunks)
  • tensorrt_llm/_torch/pyexecutor/py_executor.py (1 hunks)
  • tensorrt_llm/_torch/speculative/model_drafter.py (5 hunks)
  • tests/unittest/_torch/speculative/test_eagle3.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • tensorrt_llm/_torch/pyexecutor/llm_request.py
  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • tensorrt_llm/_torch/speculative/model_drafter.py
🧰 Additional context used
🪛 Ruff (0.12.2)
tests/unittest/_torch/speculative/test_eagle3.py

81-81: Line too long (1197 > 120)

(E501)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (3)
tests/unittest/_torch/speculative/test_eagle3.py (3)

16-27: LGTM! Comprehensive test coverage for chunked prefill feature.

The parameterization correctly adds the new enable_chunked_prefill parameter and includes test cases for both single-model and two-model scenarios with chunked prefill enabled. The existing test cases are preserved to maintain backward compatibility.


31-31: Function signature properly updated.

The function signature correctly includes the new enable_chunked_prefill parameter with proper type annotation.


62-66: Well-implemented chunked prefill configuration.

The configuration correctly enables chunked prefill and sets max_num_tokens to 64 to ensure the chunked prefill code path is exercised during testing. The comment provides clear context for this choice.

@tensorrt-cicd (Collaborator) commented:

PR_Github #12330 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator) commented:

PR_Github #12330 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9160 completed with status: 'FAILURE'

@mikeiovine (Collaborator, Author) commented:

/bot run

@tensorrt-cicd (Collaborator) commented:

PR_Github #12345 [ run ] triggered by Bot

@mikeiovine (Collaborator, Author) commented:

/bot run

@tensorrt-cicd (Collaborator) commented:

PR_Github #12719 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator) commented:

PR_Github #12719 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9466 completed with status: 'FAILURE'

@mikeiovine (Collaborator, Author) commented:

/bot run

@tensorrt-cicd (Collaborator) commented:

PR_Github #12740 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator) commented:

PR_Github #12740 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9485 completed with status: 'FAILURE'

@mikeiovine (Collaborator, Author) commented:

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator) commented:

PR_Github #12866 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator) commented:

PR_Github #12866 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9589 completed with status: 'FAILURE'

@mikeiovine (Collaborator, Author) commented:

/bot run

@tensorrt-cicd (Collaborator) commented:

PR_Github #12888 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator) commented:

PR_Github #12888 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9608 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@mikeiovine merged commit 0f2f11f into NVIDIA:main on Jul 25, 2025 (3 checks passed)
@mikeiovine deleted the chunked-prefill-spec-dec branch on July 25, 2025 01:50
NVShreyas pushed a commit to NVShreyas/TensorRT-LLM that referenced this pull request Jul 28, 2025
Ransiki pushed a commit to Ransiki/TensorRT-LLM that referenced this pull request Jul 29, 2025
lancelly pushed a commit to lancelly/TensorRT-LLM that referenced this pull request Aug 6, 2025