Description
Getting the following error: "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:3!" (Full stack trace can be found below)
The error occurs in apply_multimodal_rotary_pos_emb at line 683 in the transformers library: q_embed = (q * cos) + (rotate_half(q) * sin) Called from lvu_qwen25_vl_flash_attention_2_forward in qwen25_lvu_interleaved.py:61-63.
In qwen25_lvu_interleaved.py, the position_embeddings appear to be created or cached on a different device than the attention layer's query/key states. When the model is sharded across multiple GPUs, the position embeddings end up on cuda:3 while the query states are on cuda:0. Moving the tensors to the same device with .to(query_states.device) makes the error go away, but doesn't that defeat the purpose of the multi-device setup by forcing everything onto one GPU?
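For what it's worth, here is a minimal sketch of the kind of fix I tried. The function names are hypothetical (they only mirror the shapes involved in `apply_multimodal_rotary_pos_emb`, not the repo's actual code); the idea is that only the small cos/sin tables are copied to the query's device, not the activations, so the per-layer GPU placement from accelerate would stay intact:

```python
import torch

def rotate_half(x):
    # Standard RoPE helper: swap and negate the two halves of the last dim.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope_device_safe(q, k, cos, sin):
    # Align only the small cos/sin tables with the query device. Under
    # layer-wise sharding each decoder layer lives on one GPU, so this
    # copies a few KB per layer instead of moving the whole model.
    cos = cos.to(q.device)
    sin = sin.to(q.device)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```

Is this per-layer `.to()` on the embeddings the intended approach, or should the embeddings be created on the right device in the first place?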
Please help.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████| 5/5 [00:03<00:00, 1.28it/s]
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.50, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
Preprocessing time for video: 0.05s
Tokenizer time was: 0.28s
Processing total of 1024 frames of 16 frames each.
Processing video groups: 0%| | 0/64 [00:01<?, ?it/s]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/data2/home/kaijun/QuickVideo/main.py", line 19, in <module>
output = lvu.generate(question, video_path, max_new_tokens=128, do_sample=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/lvu/lvu.py", line 49, in generate
output = self.run_model_func(question, video_path, **generation_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/lvu/models/qwen25_lvu_interleaved.py", line 724, in run_lvu_model
return chat_lvu_model(self, messages, **generation_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/lvu/models/qwen25_lvu_interleaved.py", line 871, in chat_lvu_model
outputs = model(**group_i_inputs)
^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/accelerate/hooks.py", line 176, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 1861, in forward
outputs = self.model(
^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 1207, in forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/lvu/models/qwen25_lvu_interleaved.py", line 177, in lvu_qwen25_vl_decoder_layer_forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/lvu/models/qwen25_lvu_interleaved.py", line 57, in lvu_qwen25_vl_flash_attention_2_forward
query_states, key_states = apply_multimodal_rotary_pos_emb(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 683, in apply_multimodal_rotary_pos_emb
q_embed = (q * cos) + (rotate_half(q) * sin)
~~^~~~~
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:3!
/home/kaijun/.local/share/uv/python/cpython-3.11.13-linux-x86_64-gnu/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 2 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '