
VideoLLaMA on CPU Server (without GPU or CUDA Support) #169

@harshmoothat

Description


Issue 1: FlashAttention Compatibility

The first issue we encountered was with FlashAttention, which requires CUDA and is not available on CPU-only machines. It can be resolved by disabling FlashAttention explicitly:

Wherever the attention implementation is selected (in Hugging Face transformers this is the attn_implementation argument to from_pretrained), set it to "eager" so the model falls back to the standard attention path on systems where FlashAttention is not supported.
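As a minimal sketch of that fallback (assuming the checkpoint is loaded through a standard transformers from_pretrained call, which accepts attn_implementation; the checkpoint name is illustrative, and VideoLLaMA2's own loader may wrap this differently):

import torch
from transformers import AutoModelForCausalLM

# Load with the standard ("eager") attention path instead of FlashAttention.
model = AutoModelForCausalLM.from_pretrained(
    "DAMO-NLP-SG/VideoLLaMA2-7B",     # illustrative checkpoint name
    attn_implementation="eager",      # skip FlashAttention kernels entirely
    torch_dtype=torch.float32,        # fp32 is the safe dtype on CPU
)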

Changes made in the config.json file:

- Changed "mm_vision_tower": "google/siglip-so400m-patch14-384" to "mm_vision_tower": "openai/clip-vit-base-patch32"
- Set "use_flash_attention": false
- Set "sliding_window": 0
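Taken together, the relevant excerpt of config.json then looks roughly like this (excerpt only; the surrounding keys are unchanged):

{
  "mm_vision_tower": "openai/clip-vit-base-patch32",
  "use_flash_attention": false,
  "sliding_window": 0
}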

Issue 2: No CUDA GPUs Available

Installed the CPU-only versions of PyTorch, TorchVision, and TorchAudio using:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
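To verify that the CPU-only build is active before launching the model, a quick check is:

python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# a CPU-only install prints a version tagged +cpu and False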

Replaced the hardcoded .cuda() calls in videollama2/__init__.py and videollama2/model/__init__.py:

# videollama2/__init__.py: build the tensors on CPU instead of appending .cuda()
input_ids = tokenizer_multimodal_token(prompt, tokenizer, modal_token, return_tensors='pt').unsqueeze(0).long()
attention_masks = input_ids.ne(tokenizer.pad_token_id).long()

# videollama2/model/__init__.py: only set a device_map when a non-CPU device is requested
if device != "cpu":
    kwargs['device_map'] = {"": device}
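More generally, a portable pattern for removing hardcoded .cuda() calls (a general PyTorch idiom, not the repository's exact code) is to resolve the device once and move tensors with .to(device):

import torch

# Fall back to CPU automatically when no CUDA GPU is present.
device = "cuda" if torch.cuda.is_available() else "cpu"

# tensor.cuda() raises on CPU-only machines; .to(device) works everywhere.
input_ids = torch.tensor([[1, 2, 3]]).long().to(device)
attention_masks = input_ids.ne(0).long()   # mask is created on the same device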
