
tshmilnvidia
Collaborator

@tshmilnvidia tshmilnvidia commented Jun 25, 2025

Nixl support for GDS

Commit 1 ("Pass mode & directory") is a dependency; it is also submitted as a separate PR: #5983

Summary by CodeRabbit

  • New Features

    • Pluggable loopback transfer path with loadable backends; KV-cache APIs now accept transfer mode and directory. Retention configs and requests expose directory as a string.
    • File-backed descriptors and loopback agent interfaces for file↔memory transfers; Nixl loopback backend added.
  • Tests

    • Unit tests extended to cover DRAM and filesystem-backed transfer flows with temp directories and integrity checks.
  • Chores

    • Build config updated to enable additional storage plugins (GDS, GDS_MT).
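To make the file↔memory loopback idea concrete, here is a host-only sketch. It is not the PR's NixlLoopbackAgent. It assumes hypothetical `MemoryDesc`/`FileDesc` structs and a hypothetical `PosixLoopback` class that move bytes between a host buffer and a file with POSIX `pwrite`/`pread`. The real backend routes transfers through NIXL and can use GDS to reach device memory. The `isOffload` flag mirrors the direction flag shown for `submitLoopbackRequests` in the sequence diagram.

```cpp
#include <unistd.h>

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified descriptors (not the PR's FileDesc/MemoryDesc types).
struct MemoryDesc
{
    std::uint8_t* addr; // host buffer (the real path targets GPU memory)
    std::size_t len;    // bytes to transfer
};

struct FileDesc
{
    int fd;          // already-open file descriptor
    off_t offset;    // byte offset inside the file
    std::size_t len; // bytes to transfer
};

// Host-only stand-in for a loopback agent: offload copies memory -> file,
// onboard copies file -> memory, one descriptor pair at a time.
class PosixLoopback
{
public:
    void submit(std::vector<MemoryDesc> const& mem, std::vector<FileDesc> const& files, bool isOffload) const
    {
        if (mem.size() != files.size())
        {
            throw std::invalid_argument("descriptor count mismatch");
        }
        for (std::size_t i = 0; i < mem.size(); ++i)
        {
            if (mem[i].len != files[i].len)
            {
                throw std::invalid_argument("descriptor length mismatch");
            }
            ssize_t const done = isOffload ? ::pwrite(files[i].fd, mem[i].addr, mem[i].len, files[i].offset)
                                           : ::pread(files[i].fd, mem[i].addr, mem[i].len, files[i].offset);
            if (done != static_cast<ssize_t>(mem[i].len))
            {
                throw std::runtime_error("short or failed I/O");
            }
        }
    }
};

// Round trip similar in spirit to the PR's integrity tests: write a pattern,
// offload it to a temp file, clear the buffer, onboard it, and compare.
bool roundTripMatches(std::size_t bytes)
{
    char path[] = "/tmp/kv_loopback_XXXXXX";
    int const fd = ::mkstemp(path);
    if (fd < 0)
    {
        return false;
    }
    std::vector<std::uint8_t> buf(bytes);
    for (std::size_t i = 0; i < bytes; ++i)
    {
        buf[i] = static_cast<std::uint8_t>(i * 7);
    }
    std::vector<std::uint8_t> const expected = buf;

    PosixLoopback agent;
    std::vector<MemoryDesc> const mem{MemoryDesc{buf.data(), buf.size()}};
    std::vector<FileDesc> const files{FileDesc{fd, 0, buf.size()}};
    bool ok = true;
    try
    {
        agent.submit(mem, files, /*isOffload=*/true);
        std::fill(buf.begin(), buf.end(), std::uint8_t{0});
        agent.submit(mem, files, /*isOffload=*/false);
        ok = (buf == expected);
    }
    catch (std::exception const&)
    {
        ok = false;
    }
    ::close(fd);
    ::unlink(path);
    return ok;
}
```

Pairing each memory descriptor with one file descriptor of equal length keeps the sketch simple. A real backend may batch, split, or align requests differently.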

@tshmilnvidia tshmilnvidia marked this pull request as ready for review June 26, 2025 07:02
@tshmilnvidia
Collaborator Author

/bot run

@tshmilnvidia
Collaborator Author

/bot run

@svc-trtllm-gh-bot svc-trtllm-gh-bot added the Community want to contribute PRs initiated from Community label Jul 3, 2025
@tshmilnvidia tshmilnvidia force-pushed the nixl_agent branch 4 times, most recently from 7210604 to 96a28f6 Compare July 6, 2025 11:55
@schetlur-nv schetlur-nv requested a review from Tabrizian July 9, 2025 22:48
Member

@Tabrizian Tabrizian left a comment

Is the goal of this PR to use NIXL for loading/unloading KVBlocks from disk, and to use GDS when doing this?

@tshmilnvidia
Collaborator Author

Rebased, and opened 2 separate PRs for the first 2 commits, which are bug fixes. It would be better to merge them first:
#5982
#5983

@tshmilnvidia tshmilnvidia force-pushed the nixl_agent branch 3 times, most recently from d5ddede to 57f2533 Compare July 15, 2025 07:42
@Shixiaowei02
Collaborator

Shixiaowei02 commented Jul 15, 2025

Thank you for your contribution. This looks well modularized.

May I ask where the related tests are located? Is it possible to reuse any existing tests? Also, I didn't see changes to the Python runtime; may I ask how this part is being handled?

@glevnv glevnv requested review from a team as code owners August 13, 2025 08:00
@glevnv glevnv requested a review from poweiw August 13, 2025 08:00
Contributor

coderabbitai bot commented Aug 13, 2025

📝 Walkthrough


Wires a loopback agent into KV-cache transfer flows, adds file-backed descriptors and a BaseLoopbackAgent/NixlLoopbackAgent, threads explicit transfer mode and a non-optional directory string through KV-cache APIs/constructors, and updates tests and build flags to exercise DRAM and GDS paths.

Changes

Cohort / File(s) | Summary of Changes
KV cache public header
cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
Adds #include for transferAgent.h, namespace kvc = tensorrt_llm::executor::kv_cache;, exposes GenerationRequest::getTransferMode() and getDirectory(), extends public methods/constructors to accept executor::KvCacheTransferMode and std::string const& directory, and adds loopback-agent members.
KV cache implementation
cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp
Propagates loopback agent and agentConfig into BlockManager/WindowBlockManager/KVCacheTransferManager; threads transfer mode and directory through onboarding/offload/allocation; enforces DRAM fallback when directory missing.
Transfer manager header & impl
cpp/include/tensorrt_llm/batch_manager/kvCacheTransferManager.h, cpp/tensorrt_llm/batch_manager/kvCacheTransferManager.cpp
Constructor accepts std::shared_ptr<kvc::BaseLoopbackAgent>; stores mLoopbackAgent/mDeviceId; signatures use std::string const& directory; copyBlock made private; I/O routed via the loopback agent (registerFiles/registerMemory, submitLoopbackRequests, deregister).
TransferAgent interface
cpp/include/tensorrt_llm/executor/transferAgent.h
Adds FileDesc/FileDescs, aliases (TransferDescs, RegisterDescs, SyncMessage, ConnectionInfoType), BaseAgentConfig.multiThread, BaseLoopbackAgent interface, and makeLoopbackAgent helper for dynamic loopback agent creation.
Nixl loopback backend
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.h, .../transferAgent.cpp
Adds FileDescs conversion helpers, NixlLoopbackAgent implementing BaseLoopbackAgent (register/deregister memory/files, submitLoopbackRequests) and C factory createNixlLoopbackAgent; supports GDS/GDS_MT and multi-threaded config.
Retention config / serialization / bindings
cpp/include/tensorrt_llm/executor/executor.h, cpp/tensorrt_llm/executor/kvCacheRetentionConfig.cpp, cpp/tensorrt_llm/executor/serialization.cpp, cpp/tensorrt_llm/pybind/executor/request.cpp, cpp/tensorrt_llm/nanobind/executor/request.cpp
KvCacheRetentionConfig now stores std::string mDirectory (non-optional) and exposes std::string const& getDirectory() const; the constructor, serialization, and Python bindings are updated to use a string (though the nanobind binding still exposes a Python default of None).
KV cache transfer wiring & constructors
cpp/tensorrt_llm/batch_manager/kvCacheManager.*, .../kvCacheTransferManager.*
BlockManager/WindowBlockManager constructors and public transfer methods extended to accept/propagate transfer mode and directory; BlockManager may accept agentConfig and wires loopback agent; KVCacheManager stores unique_ptr loopback agent.
Unit tests
cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp, cpp/tests/unit_tests/executor/transferAgentTest.cpp
Tests parameterized over KvCacheTransferMode (DRAM/GDS), add filesystem-backed GDS temp-dir helpers and pattern writers, update offload/onboard calls to pass transferMode+directory, and add loopback file/memory transfer tests.
Build config
docker/common/install_nixl.sh, jenkins/Build.groovy
Adds GDS and GDS_MT to Meson -Dstatic_plugins and appends NIXL_ROOT=/opt/nvidia/nvda_nixl to certain build configs.
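The table notes two related behaviours. The retention config's directory is now a plain, non-optional `std::string`, and the KV-cache implementation falls back to DRAM when no directory is supplied. A minimal sketch of how an empty string can stand in for "no directory" follows. `TransferMode`, `OffloadSettings` and `resolveTransferMode` are hypothetical simplifications, not `executor::KvCacheTransferMode` or the PR's actual code.

```cpp
#include <cassert>
#include <string>

// Simplified, hypothetical stand-in for executor::KvCacheTransferMode; only the
// two modes exercised by this PR's tests are modelled.
enum class TransferMode
{
    DRAM,
    GDS
};

// Hypothetical request-level settings: the directory is a plain string, and an
// empty string means "no directory was configured".
struct OffloadSettings
{
    TransferMode mode = TransferMode::DRAM;
    std::string directory;
};

// Falls back to DRAM when a file-backed mode is requested without a directory.
TransferMode resolveTransferMode(OffloadSettings const& settings)
{
    if (settings.mode != TransferMode::DRAM && settings.directory.empty())
    {
        return TransferMode::DRAM;
    }
    return settings.mode;
}
```

Using an empty string rather than an optional keeps the value trivially serializable. The trade-off is that "empty" becomes a sentinel that every consumer has to interpret the same way.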

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor App as Application
  participant Exec as Executor
  participant KVCM as KVCacheManager
  participant BM as BlockManager
  participant WBM as WindowBlockManager
  participant TM as KVCacheTransferManager
  participant LBA as BaseLoopbackAgent
  participant FS as Filesystem
  participant VRAM as DeviceMemory

  App->>Exec: generate(request)
  Exec->>KVCM: addSequence(request (mode, directory))
  KVCM->>BM: loadOrAllocateBlocks(..., mode, directory)
  BM->>WBM: getFreeBlock(..., mode, directory)
  alt offload needed
    WBM->>TM: offloadBlock(block, mode, directory)
    TM->>LBA: registerFiles(fileDescs) / registerMemory(memoryDescs)
    TM->>LBA: submitLoopbackRequests(memoryDescs, fileDescs, isOffload=true)
    LBA-->>TM: TransferStatus
    TM->>LBA: deregisterFiles/Memory
    TM-->>WBM: offload complete
  else onboard needed
    WBM->>TM: onboardBlock(offloadBlock, mode, directory)
    TM->>LBA: registerFiles(fileDescs) / registerMemory(memoryDescs)
    TM->>LBA: submitLoopbackRequests(memoryDescs, fileDescs, isOffload=false)
    LBA-->>TM: TransferStatus
    TM->>LBA: deregisterFiles/Memory
    TM-->>WBM: onboard complete
  end
  WBM-->>BM: block ready
  BM-->>KVCM: allocation result
  KVCM-->>Exec: sequence ready
  Exec-->>App: produce tokens
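In the sequence diagram, every offload and onboard is bracketed by register and deregister calls on the loopback agent. One way to guarantee that pairing, even when a transfer throws, is an RAII guard. The sketch below uses a hypothetical `CountingAgent` in place of BaseLoopbackAgent. The method names echo the diagram, but the argument-free signatures are simplifications rather than the PR's interface, and the PR may handle cleanup differently.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical agent that only tracks how many registrations are live.
struct CountingAgent
{
    int liveRegistrations = 0;
    void registerMemory() { ++liveRegistrations; }
    void registerFiles() { ++liveRegistrations; }
    void deregisterMemory() { --liveRegistrations; }
    void deregisterFiles() { --liveRegistrations; }
    void submitLoopbackRequests(bool fail)
    {
        if (fail)
        {
            throw std::runtime_error("transfer failed");
        }
    }
};

// Registers memory and files on construction and deregisters both on destruction.
// In this sketch registration cannot fail; a real guard would also undo
// registerMemory if registerFiles threw.
class RegistrationGuard
{
public:
    explicit RegistrationGuard(CountingAgent& agent)
        : mAgent(agent)
    {
        mAgent.registerMemory();
        mAgent.registerFiles();
    }

    ~RegistrationGuard()
    {
        mAgent.deregisterFiles();
        mAgent.deregisterMemory();
    }

    RegistrationGuard(RegistrationGuard const&) = delete;
    RegistrationGuard& operator=(RegistrationGuard const&) = delete;

private:
    CountingAgent& mAgent;
};

// One offload/onboard step: register, submit, then deregister on scope exit,
// including when the submit throws.
void transferBlock(CountingAgent& agent, bool simulateFailure)
{
    RegistrationGuard guard(agent);
    agent.submitLoopbackRequests(simulateFailure);
}
```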

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested labels

KV-Cache Management

Suggested reviewers

  • xinhe-nv
  • Shixiaowei02
  • chuangz0

@tshmilnvidia tshmilnvidia changed the title Nixl agent [None][feat] Nixl agent Aug 13, 2025
@tshmilnvidia tshmilnvidia changed the title [None][feat] Nixl agent [None][feat] Nixl support for GDS Aug 13, 2025
@glevnv
Contributor

glevnv commented Aug 13, 2025

In reply to @Shixiaowei02:

"Thank you for your contribution. This looks well modularized.
May I ask where the related tests are located? Is it possible to reuse any existing tests? Also, I didn't see changes to the Python runtime; may I ask how this part is being handled?"

Tests were added in the final 2 commits.
The nixl_wrapper library is copied to a directory that dlopen searches (this is done in the commit before those).
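Loading a backend through dlopen typically means exporting an extern "C" factory (the walkthrough table lists createNixlLoopbackAgent as one) and resolving it with dlsym at run time. The library has to sit somewhere the dynamic loader searches, which is why its install directory matters. A minimal sketch of that pattern follows. It assumes a hypothetical `void* ()` factory signature and a hypothetical `loadFactory` helper; it is not the PR's makeLoopbackAgent.

```cpp
#include <dlfcn.h>

#include <cassert>
#include <string>

// Hypothetical factory signature exported with C linkage by a plugin library.
using FactoryFn = void* (*) ();

// Outcome of trying to load a plugin factory.
struct LoadResult
{
    void* handle = nullptr;      // handle from dlopen, or nullptr on failure
    FactoryFn factory = nullptr; // resolved factory, or nullptr on failure
    std::string error;           // dlerror() text when loading fails
};

// When libName contains no slash, dlopen searches the caller's RPATH/RUNPATH,
// LD_LIBRARY_PATH, the ld.so cache and default directories such as /usr/lib.
LoadResult loadFactory(std::string const& libName, std::string const& symbol)
{
    LoadResult result;
    result.handle = ::dlopen(libName.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (result.handle == nullptr)
    {
        char const* err = ::dlerror();
        result.error = err ? err : "dlopen failed";
        return result;
    }
    ::dlerror(); // clear any stale error before dlsym
    void* sym = ::dlsym(result.handle, symbol.c_str());
    if (char const* err = ::dlerror())
    {
        result.error = err;
        ::dlclose(result.handle);
        result.handle = nullptr;
        return result;
    }
    result.factory = reinterpret_cast<FactoryFn>(sym);
    return result;
}
```

On glibc older than 2.34 this needs linking with -ldl. Newer glibc provides dlopen in libc itself.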

@tensorrt-cicd
Collaborator

PR_Github #17675 [ run ] triggered by Bot

Passing mode & directory parameters to relevant onboard & offload
functions.

Signed-off-by: Tomer Shmilovich <[email protected]>
@tensorrt-cicd
Collaborator

PR_Github #17675 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13288 completed with status: 'ABORTED'

@Shixiaowei02
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #17723 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #17723 [ run ] completed with state DISABLED
L0 testing is limited to prioritized users. User Shixiaowei02 is not in the prioritized list. L0 testing cannot be triggered.

@bo-nv
Collaborator

bo-nv commented Sep 7, 2025

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #17917 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #17917 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13424 completed with status: 'FAILURE'

tshmilnvidia and others added 6 commits September 8, 2025 01:44
Implement class LoopbackAgent.

Signed-off-by: Tomer Shmilovich <[email protected]>
Signed-off-by: Tomer Shmilovich <[email protected]>
Signed-off-by: Tomer Shmilovich <[email protected]>
@BatshevaBlack
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #18018 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #18018 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13503 completed with status: 'FAILURE'

@bo-nv
Collaborator

bo-nv commented Sep 8, 2025

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #18073 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #18073 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13546 completed with status: 'SUCCESS'

@bo-nv bo-nv merged commit ecc0e68 into NVIDIA:main Sep 9, 2025
5 checks passed
nv-guomingz added a commit to nv-guomingz/TensorRT-LLM that referenced this pull request Sep 9, 2025
nv-guomingz added a commit to nv-guomingz/TensorRT-LLM that referenced this pull request Sep 9, 2025
nv-guomingz added a commit that referenced this pull request Sep 10, 2025
Wong4j pushed a commit to Wong4j/TensorRT-LLM that referenced this pull request Sep 20, 2025
Signed-off-by: Tomer Shmilovich <[email protected]>
Signed-off-by: Guy Lev <[email protected]>
Co-authored-by: Guy Lev <[email protected]>
Wong4j pushed a commit to Wong4j/TensorRT-LLM that referenced this pull request Sep 20, 2025