mikeiovine
Collaborator

@mikeiovine mikeiovine commented May 7, 2025

[feat] Enable chunked context for flashinfer

Description

A subtle bug was preventing us from enabling chunked context for flashinfer: the computation of the last_page_len argument passed to flashinfer was wrong in chunked prefill cases.

This will help unblock the chunked attention PR for Llama 4.
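
For context, flashinfer's paged KV-cache interface takes, for each request, the number of valid entries in that request's last page. The description above does not say what exactly the faulty computation was, so the following is only a minimal sketch of how that value can be derived when a prompt is prefilled in chunks. It assumes the length must be based on the total KV length (tokens already cached by earlier chunks plus the current chunk). The function and parameter names are hypothetical and are not taken from the TensorRT-LLM code.

```python
def last_page_len(num_cached_tokens: int, num_new_tokens: int, page_size: int) -> int:
    """Number of occupied slots in the final KV-cache page of one request.

    Hypothetical helper for illustration only.
    num_cached_tokens: tokens written to the KV cache by earlier chunks.
    num_new_tokens:    tokens in the chunk being processed now.
    """
    kv_len = num_cached_tokens + num_new_tokens
    if kv_len <= 0:
        raise ValueError("a request must hold at least one token")
    # A completely full last page yields page_size rather than 0.
    return (kv_len - 1) % page_size + 1


def num_pages(num_cached_tokens: int, num_new_tokens: int, page_size: int) -> int:
    """Pages needed to hold the total KV length (ceiling division)."""
    kv_len = num_cached_tokens + num_new_tokens
    return -(-kv_len // page_size)


if __name__ == "__main__":
    page_size = 16
    # A 40-token prompt processed as two chunks of 20 tokens each.
    print(last_page_len(0, 20, page_size))   # first chunk
    print(last_page_len(20, 20, page_size))  # second chunk, includes cached tokens
```

In this sketch, a computation that looked only at the current chunk's length would return the same value for both chunks, whereas the second chunk actually ends at a different offset within its last page because 20 tokens are already cached.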

Test Coverage

Added a new Llama 3 8B accuracy test that runs with chunked context enabled.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@mikeiovine mikeiovine requested review from schetlur-nv and syuoni May 7, 2025 18:35
@mikeiovine
Collaborator Author

Hey @syuoni, can you verify that I added the tests in the correct way? Thanks

@mikeiovine
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #4417 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #4417 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #3179 completed with status: 'FAILURE'

@mikeiovine mikeiovine force-pushed the chunked-context-flashinfer branch from f869c79 to 614581c Compare May 7, 2025 20:49
@mikeiovine
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #4423 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #4423 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #3185 completed with status: 'FAILURE'

@mikeiovine mikeiovine force-pushed the chunked-context-flashinfer branch from 614581c to c29560d Compare May 7, 2025 22:52
@mikeiovine mikeiovine requested a review from yuxianq May 7, 2025 23:17
@mikeiovine
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #4432 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #4432 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #3192 completed with status: 'FAILURE'

@mikeiovine
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #4574 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #4574 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #3290 completed with status: 'FAILURE'

@mikeiovine mikeiovine force-pushed the chunked-context-flashinfer branch from 5b17618 to 7a94f21 Compare May 9, 2025 15:12
@mikeiovine
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #4725 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #4725 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3409 completed with status: 'FAILURE'

@mikeiovine
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #4735 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #4735 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3417 completed with status: 'FAILURE'

@mikeiovine
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #4884 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #4884 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3538 completed with status: 'FAILURE'

@mikeiovine mikeiovine force-pushed the chunked-context-flashinfer branch from dee96dc to 0024ef1 Compare May 13, 2025 14:43
@mikeiovine
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #5034 [ run ] triggered by Bot

@mikeiovine
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #5046 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #5034 [ run ] completed with state ABORTED

@tensorrt-cicd
Collaborator

PR_Github #5046 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3674 completed with status: 'FAILURE'

@mikeiovine
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #5066 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #5066 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3687 completed with status: 'FAILURE'

@mikeiovine
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #5190 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #5190 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3787 completed with status: 'FAILURE'

@mikeiovine
Collaborator Author

/bot skip --comment "The last DGX test timed out at 90% complete (https://prod.blsm.nvidia.com/sw-tensorrt-top-1/blue/organizations/jenkins/LLM%2Fmain%2FL0_Test-x86_64/detail/L0_Test-x86_64/10947/pipeline/7052). Should be safe to merge given that the vast majority of tests have been run and the remaining tests are unrelated to the code this PR touches."

@mikeiovine mikeiovine enabled auto-merge (squash) May 15, 2025 02:33
@tensorrt-cicd
Collaborator

PR_Github #5250 [ skip ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #5250 [ skip ] completed with state SUCCESS
Skipping testing for commit f76075a

@mikeiovine mikeiovine merged commit f9adac3 into NVIDIA:main May 15, 2025
3 checks passed
@mikeiovine mikeiovine deleted the chunked-context-flashinfer branch May 15, 2025 03:00
4 participants