feat: Prefetch safetensors files before loading them #4140
Conversation
/bot run
PR_Github #4469 [ run ] triggered by Bot
PR_Github #4469 [ run ] completed with state
Force-pushed from 8e825a1 to fdeee42
/bot run
PR_Github #4525 [ run ] triggered by Bot
PR_Github #4525 [ run ] completed with state
Force-pushed from fdeee42 to 0616016
/bot run
PR_Github #4611 [ run ] triggered by Bot
PR_Github #4611 [ run ] completed with state
/bot reuse-build
GitHub Bot Help
Provide a user-friendly way for developers to interact with a Jenkins server. See details below for each supported subcommand.

run
Launch build/test pipelines. All previously running jobs will be killed.

kill
Kill all running builds associated with pull request.

skip
Skip testing for latest commit on pull request.

reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
/bot reuse-pipeline
PR_Github #4688 [ reuse-pipeline ] triggered by Bot
/bot kill
Force-pushed from c670ae9 to 3b80ccb
/bot run
PR_Github #4689 [ kill ] triggered by Bot
PR_Github #4688 [ reuse-pipeline ] completed with state
PR_Github #4689 [ kill ] completed with state
PR_Github #4690 [ run ] triggered by Bot
PR_Github #4690 [ run ] completed with state
/bot run
PR_Github #4802 [ run ] triggered by Bot
PR_Github #4802 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #4837 [ run ] triggered by Bot
PR_Github #4837 [ run ] completed with state
Prefetch safetensors files so that they are stored in the system file cache. This significantly speeds up model weight loading on the very first run after entering the Docker container.
This helps because model weights are loaded layer by layer, which means reading the safetensors files chunk by chunk; small chunked reads cannot utilize the network bandwidth well when the files live on a network drive. Reading the whole files in bulk instead achieves much higher bandwidth utilization.
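A minimal sketch of the bulk-read idea (illustrative only, not the code in this PR; the function name and block size are made up):

```python
import glob
import os


def prefetch_safetensors(checkpoint_dir: str, block_size: int = 16 * 1024 * 1024) -> None:
    """Read every *.safetensors file front to back in large blocks.

    The bytes are discarded; the goal is only that the sequential bulk
    reads populate the OS file cache, so the later chunk-by-chunk reads
    during layer-by-layer weight loading hit the cache instead of the
    network drive.
    """
    for path in sorted(glob.glob(os.path.join(checkpoint_dir, "*.safetensors"))):
        with open(path, "rb") as f:
            while f.read(block_size):
                pass
```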
When running with world_size > 1, all ranks prefetch these files collaboratively.
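One plausible way to split that work (a hypothetical sketch; the PR's actual coordination scheme may differ) is to stripe the files across ranks round-robin:

```python
def prefetch_collaboratively(paths: list[str], rank: int, world_size: int,
                             block_size: int = 16 * 1024 * 1024) -> None:
    """Stripe files across ranks round-robin: rank r prefetches files
    r, r + world_size, r + 2 * world_size, ... Each rank reads roughly
    1/world_size of the total bytes, and since ranks on the same node
    share the OS file cache, every rank benefits from the pages the
    others pulled in.
    """
    for i, path in enumerate(sorted(paths)):
        if i % world_size == rank:
            with open(path, "rb") as f:
                while f.read(block_size):
                    pass
```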
In theory, a heuristic should decide whether to prefetch the files at all, but that is beyond the scope of this commit. For example, when CPU memory is small, prefetching may cause file-cache thrashing and actually slow down weight loading.
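Such a heuristic might compare the checkpoint size against available physical memory (a hypothetical sketch; the helper name and headroom factor are made up, and the sysconf keys are Linux-only):

```python
import os


def should_prefetch(paths: list[str]) -> bool:
    """Only prefetch when the checkpoint fits comfortably in free RAM,
    to avoid evicting useful pages (file-cache thrashing)."""
    total_bytes = sum(os.path.getsize(p) for p in paths)
    # Available physical memory via POSIX sysconf (Linux-only keys).
    avail_bytes = os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    # 0.5 is an arbitrary headroom factor, leaving room for the model itself.
    return total_bytes < 0.5 * avail_bytes
```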