Description
Getting the following error: "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:3!" (Full stack trace can be found below)
The error occurs in apply_multimodal_rotary_pos_emb at line 683 in the transformers library: q_embed = (q * cos) + (rotate_half(q) * sin) Called from lvu_qwen25_vl_flash_attention_2_forward in qwen25_lvu_interleaved.py:61-63.
In qwen25_lvu_interleaved.py, the position_embeddings appear to be created or cached on a different device than the attention layer's query/key states. When the model is sharded across multiple GPUs, the position embeddings end up on cuda:3 while the query states are on cuda:0. Moving the tensors to the same device with .to(query_states.device) makes the error go away, but doesn't that defeat the purpose of the multi-device setup by forcing everything onto one GPU?
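For what it's worth, here is a minimal sketch of the kind of fix I tried. The function names are hypothetical (they only mirror the shapes involved in `apply_multimodal_rotary_pos_emb`, not the repo's actual code); the idea is that only the small cos/sin tables are copied to the query's device, not the activations, so the per-layer GPU placement from accelerate would stay intact:

```python
import torch

def rotate_half(x):
    # Standard RoPE helper: swap and negate the two halves of the last dim.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope_device_safe(q, k, cos, sin):
    # Align only the small cos/sin tables with the query device. Under
    # layer-wise sharding each decoder layer lives on one GPU, so this
    # copies a few KB per layer instead of moving the whole model.
    cos = cos.to(q.device)
    sin = sin.to(q.device)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```

Is this per-layer `.to()` on the embeddings the intended approach, or should the embeddings be created on the right device in the first place?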
Please help.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████| 5/5 [00:03<00:00, 1.28it/s]
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.50, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
Preprocessing time for video: 0.05s
Tokenizer time was: 0.28s
Processing total of 1024 frames of 16 frames each.
Processing video groups: 0%| | 0/64 [00:01<?, ?it/s]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/data2/home/kaijun/QuickVideo/main.py", line 19, in <module>
output = lvu.generate(question, video_path, max_new_tokens=128, do_sample=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/lvu/lvu.py", line 49, in generate
output = self.run_model_func(question, video_path, **generation_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/lvu/models/qwen25_lvu_interleaved.py", line 724, in run_lvu_model
return chat_lvu_model(self, messages, **generation_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/lvu/models/qwen25_lvu_interleaved.py", line 871, in chat_lvu_model
outputs = model(**group_i_inputs)
^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/accelerate/hooks.py", line 176, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 1861, in forward
outputs = self.model(
^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 1207, in forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/lvu/models/qwen25_lvu_interleaved.py", line 177, in lvu_qwen25_vl_decoder_layer_forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/lvu/models/qwen25_lvu_interleaved.py", line 57, in lvu_qwen25_vl_flash_attention_2_forward
query_states, key_states = apply_multimodal_rotary_pos_emb(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data2/home/kaijun/QuickVideo/.venv/lib/python3.11/site-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 683, in apply_multimodal_rotary_pos_emb
q_embed = (q * cos) + (rotate_half(q) * sin)
~~^~~~~
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:3!
/home/kaijun/.local/share/uv/python/cpython-3.11.13-linux-x86_64-gnu/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 2 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '