
Conversation

@nvpohanh (Collaborator) commented May 8, 2025

Prefetch safetensors files so that they are stored in the system file cache. This significantly speeds up model weight loading for the very first run after entering the Docker container.

This is beneficial because model weight loading is done layer by layer, which means reading the safetensors files chunk by chunk; when the files live on a network drive, that access pattern cannot utilize the network bandwidth well. Reading the whole files in bulk instead achieves much higher bandwidth utilization.
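The bulk-read idea can be sketched in a few lines of Python. The helper name and chunk size below are illustrative assumptions, not code from this PR:

```python
# Sketch of the bulk-prefetch idea (illustrative; not the PR's actual code):
# reading each safetensors file sequentially in large chunks pulls its
# contents into the OS page cache, so later chunk-wise reads hit the cache.
def prefetch_file(path: str, chunk_size: int = 64 * 1024 * 1024) -> int:
    """Read `path` front to back in large chunks; returns bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

The data read is discarded; the only goal is the side effect of warming the file cache before the layer-by-layer loader runs.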

When running with world_size > 1, all ranks collaboratively prefetch these files.
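One natural way for ranks to collaborate is a round-robin split of the file list. The helper below is a hypothetical sketch of that policy, not the PR's implementation:

```python
# Hypothetical round-robin split: each rank prefetches every world_size-th
# file, so the full set is read exactly once across all ranks.
def files_for_rank(files: list[str], rank: int, world_size: int) -> list[str]:
    """Return the subset of `files` that `rank` should prefetch."""
    assert 0 <= rank < world_size
    return sorted(files)[rank::world_size]
```

For example, with world_size = 2, rank 0 takes the 1st, 3rd, 5th, ... files (after sorting) and rank 1 takes the rest.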

In theory, we should add heuristics to decide whether or not to prefetch the files, but that is beyond the scope of this commit.

For example, when CPU memory is small, prefetching may cause file cache thrashing, leading to slower weight loading.
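Such a heuristic could, for instance, compare the total file size against free physical memory. The threshold and the use of `os.sysconf` below are assumptions for illustration only:

```python
import os

# Hypothetical heuristic (not in this PR): prefetch only if the files fit
# within a fraction of the currently free physical RAM, to avoid thrashing.
def should_prefetch(total_file_bytes: int, safety_factor: float = 0.5) -> bool:
    try:
        free = os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (ValueError, OSError):
        return False  # cannot determine free memory; skip prefetching
    return total_file_bytes <= safety_factor * free
```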

@nvpohanh (Collaborator, Author) commented May 8, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #4469 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #4469 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #3206 completed with status: 'FAILURE'

@nvpohanh force-pushed the user/pohanh/prefetch-safetensors branch from 8e825a1 to fdeee42 on May 8, 2025 08:29
@nvpohanh (Collaborator, Author) commented May 8, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #4525 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #4525 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3253 completed with status: 'FAILURE'

@nvpohanh force-pushed the user/pohanh/prefetch-safetensors branch from fdeee42 to 0616016 on May 9, 2025 01:00
@nvpohanh (Collaborator, Author) commented May 9, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #4611 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #4611 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3323 completed with status: 'SUCCESS'

@nvpohanh nvpohanh enabled auto-merge (squash) May 9, 2025 08:22
@nvpohanh nvpohanh disabled auto-merge May 9, 2025 08:22
@nvpohanh (Collaborator, Author) commented May 9, 2025

/bot reuse-build

@github-actions (bot) commented May 9, 2025

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@nvpohanh (Collaborator, Author) commented May 9, 2025

/bot reuse-pipeline

@tensorrt-cicd (Collaborator)

PR_Github #4688 [ reuse-pipeline ] triggered by Bot

@nvpohanh (Collaborator, Author) commented May 9, 2025

/bot kill

Prefetch safetensors files so that they are stored in the system file
cache. This significantly speeds up model weight loading for the very
first run after entering the Docker container.

This is beneficial because model weight loading is done layer by layer,
which means reading the safetensors files chunk by chunk; when the files
live on a network drive, that access pattern cannot utilize the network
bandwidth well. Reading the whole files in bulk instead achieves much
higher bandwidth utilization.

When running with world_size > 1, all ranks collaboratively prefetch
these files.

In theory, we should add heuristics to decide whether or not to prefetch
the files, but that is beyond the scope of this commit.

For example, when CPU memory is small, prefetching may cause file cache
thrashing, leading to slower weight loading.

Signed-off-by: Po-Han Huang <[email protected]>
@nvpohanh force-pushed the user/pohanh/prefetch-safetensors branch from c670ae9 to 3b80ccb on May 9, 2025 08:30
@nvpohanh (Collaborator, Author) commented May 9, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #4689 [ kill ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #4688 [ reuse-pipeline ] completed with state ABORTED
Can't reuse PR_Github #4611 with status: SUCCESS

@tensorrt-cicd (Collaborator)

PR_Github #4689 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit 3b80ccb

@tensorrt-cicd (Collaborator)

PR_Github #4690 [ run ] triggered by Bot

@nvpohanh nvpohanh enabled auto-merge (squash) May 9, 2025 09:28
@tensorrt-cicd (Collaborator)

PR_Github #4690 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #3380 completed with status: 'FAILURE'

@nvpohanh (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #4802 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #4802 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3474 completed with status: 'FAILURE'

@nvpohanh (Collaborator, Author)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #4837 [ run ] triggered by Bot

@Funatiq Funatiq removed request for Funatiq and dcampora May 12, 2025 07:05
@tensorrt-cicd (Collaborator)

PR_Github #4837 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3503 completed with status: 'SUCCESS'

@nvpohanh nvpohanh disabled auto-merge May 13, 2025 05:35
@nvpohanh nvpohanh merged commit 13c8e5a into NVIDIA:main May 13, 2025
3 checks passed