
[Bug]: pynccl leads to incorrect data in multi-thread GPU-worker #21252

@simpx

Description

Your current environment


Any CUDA environment reproduces this bug.

🐛 Describe the bug

I built a KV connector based on the v1 KV connector API.
This connector starts a background thread in each worker process. After the main thread calls save_kv_layer, the background thread moves data from GPU memory to host (system) memory using a dedicated CUDA stream (swap_out_stream).

Here’s a simplified version of the logic:

async def run_in_background(self, blk_ids, kv_cache_layer):
    with torch.cuda.stream(self.swap_out_stream):  # use the dedicated CUDA stream
        host_memory = get_available_system_memory()  # find space in system memory
        ops.swaps_out(kv_cache_layer, host_memory, blk_ids)  # async GPU-to-host copy of the KV blocks
        event = torch.cuda.Event()
        event.record()  # recorded on swap_out_stream
    while not event.query():  # poll until the copy has completed
        await asyncio.sleep(0)
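
For reference, here is a minimal, self-contained sketch of the same thread/stream pattern, runnable outside the connector. All helper names (compute_stream, produce_kv, swap_out) are made up for illustration; only the torch calls are standard PyTorch. It records a CUDA event on the stream that produced the KV tensor and makes the dedicated copy stream wait on that event before the device-to-host copy, so the copy is ordered after the kernels that wrote the data:

import threading
import torch

device = torch.device("cuda:0")
compute_stream = torch.cuda.Stream(device=device)
swap_out_stream = torch.cuda.Stream(device=device)

def produce_kv():
    # Stand-in for the model writing a KV-cache layer on the compute stream.
    with torch.cuda.stream(compute_stream):
        kv = torch.randn(4, 1024, device=device)
        ready = torch.cuda.Event()
        ready.record(compute_stream)
    return kv, ready

def swap_out(kv, ready):
    # Background thread: copy the layer to pinned host memory on the dedicated
    # stream, ordered after the producer via the recorded event.
    host = torch.empty(kv.shape, dtype=kv.dtype, device="cpu", pin_memory=True)
    with torch.cuda.stream(swap_out_stream):
        swap_out_stream.wait_event(ready)   # don't start before kv is written
        host.copy_(kv, non_blocking=True)   # async device-to-host copy
        done = torch.cuda.Event()
        done.record(swap_out_stream)
    done.synchronize()                      # blocks only this background thread

kv, ready = produce_kv()
t = threading.Thread(target=swap_out, args=(kv, ready))
t.start()
t.join()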

When running vLLM with tensor parallelism (tp_size=4), the model’s intermediate states sometimes contain invalid values (NaN), which leads to incorrect outputs.
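
One way to flag such invalid values is a check along these lines; hidden_states is a hypothetical placeholder for whatever intermediate tensor is inspected:

import torch

def assert_finite(hidden_states: torch.Tensor) -> None:
    # Raise if the tensor contains NaN or Inf values.
    if not torch.isfinite(hidden_states).all():
        raise RuntimeError("invalid values (NaN/Inf) in intermediate states")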

