Device mismatch error in multi-GPU setup: position embeddings on wrong device in LVU interleaved model #5

@kaijunhan

Description

I'm getting the following error (full stack trace below): "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:3!"

The error is raised in apply_multimodal_rotary_pos_emb at line 683 of the transformers library, at `q_embed = (q * cos) + (rotate_half(q) * sin)`, called from lvu_qwen25_vl_flash_attention_2_forward at qwen25_lvu_interleaved.py:61-63.

In qwen25_lvu_interleaved.py, position_embeddings appear to be created/cached on a different device than the attention layer's query/key states. When the model is sharded across multiple GPUs, the position embeddings end up on cuda:3 while the query states are on cuda:0. Adding .to(query_states.device) makes the error go away, but I'm not sure whether that defeats the purpose of the multi-device setup by forcing everything onto one GPU (?)

Please help.
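For what it's worth, here is a minimal sketch of the workaround described above. `align_pos_emb` is a hypothetical helper (not part of QuickVideo or transformers) that assumes position_embeddings is the usual (cos, sin) tuple and moves only those two tensors onto the query device before the rotary embedding is applied. Since only the small cos/sin tensors are copied, not the sharded layer weights, this should not collapse the multi-GPU placement:

```python
import torch

def align_pos_emb(query_states, position_embeddings):
    """Move the cached (cos, sin) rotary tensors onto the device that
    query_states lives on, copying only when the devices differ.

    Hypothetical helper illustrating the .to(query_states.device)
    workaround: the per-layer weights stay on their assigned GPUs;
    only the small position-embedding tensors follow the activations.
    """
    cos, sin = position_embeddings
    target = query_states.device
    if cos.device != target:
        cos = cos.to(target)
    if sin.device != target:
        sin = sin.to(target)
    return cos, sin

# Example (CPU tensors, but the same logic applies across cuda:N devices):
q = torch.randn(1, 4, 8)
cos, sin = torch.ones(1, 4, 8), torch.zeros(1, 4, 8)
cos, sin = align_pos_emb(q, (cos, sin))
```

The aligned cos/sin would then be passed to apply_multimodal_rotary_pos_emb in place of the cached tuple.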

Loading checkpoint shards: 100%|██████████████████████████████████████████████████| 5/5 [00:03<00:00,  1.28it/s]
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.50, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
Preprocessing time for video: 0.05s
Tokenizer time was: 0.28s
Processing total of 1024 frames of 16 frames each.
Processing video groups:   0%|                                                           | 0/64 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/data2/home/kaijun/QuickVideo/main.py", line 19, in <module>
    output = lvu.generate(question, video_path, max_new_tokens=128, do_sample=False)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/home/kaijun/QuickVideo/lvu/lvu.py", line 49, in generate
    output = self.run_model_func(question, video_path, **generation_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/home/kaijun/QuickVideo/lvu/models/qwen25_lvu_interleaved.py", line 724, in run_lvu_model
    return chat_lvu_model(self, messages, **generation_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/home/kaijun/QuickVideo/lvu/models/qwen25_lvu_interleaved.py", line 871, in chat_lvu_model
    outputs = model(**group_i_inputs)
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/accelerate/hooks.py", line 176, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 1861, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 1207, in forward
    layer_outputs = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/home/kaijun/QuickVideo/lvu/models/qwen25_lvu_interleaved.py", line 177, in lvu_qwen25_vl_decoder_layer_forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
                                                          ^^^^^^^^^^^^^^^
  File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/home/kaijun/QuickVideo/lvu/models/qwen25_lvu_interleaved.py", line 57, in lvu_qwen25_vl_flash_attention_2_forward
    query_states, key_states = apply_multimodal_rotary_pos_emb(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 683, in apply_multimodal_rotary_pos_emb
    q_embed = (q * cos) + (rotate_half(q) * sin)
               ~~^~~~~
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:3!
/home/kaijun/.local/share/uv/python/cpython-3.11.13-linux-x86_64-gnu/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 2 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
