Local inference results for lixin4ever/VideoLLaMA2-AV are inconsistent with the web demo results #167

Hi. I downloaded the model you released on Hugging Face and ran it locally with the config.json from **lixin4ever/VideoLLaMA2-AV**. I sampled **32 frames** and set **do_sample=False** for both the local run and the demo. Why does the local inference result differ from the demo result?

config details:
```json
{
  "_name_or_path": "DAMO-NLP-SG/VideoLLaMA2.1-7B-16F",
  "architectures": [
    "Videollama2Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "freeze_mm_mlp_adapter": false,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "hidden_size_a": 3584,
  "image_aspect_ratio": "pad",
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "mm_audio_tower": "work_dirs/VideoLLaMA2.1-7B-A/audio_tower.bin",
  "mm_hidden_size": 1152,
  "mm_hidden_size_a": 768,
  "mm_projector_a_type": "mlp2x_gelu",
  "mm_projector_lr": null,
  "mm_projector_type": "stc_connector_v35",
  "mm_vision_select_feature": "patch",
  "mm_vision_select_layer": -2,
  "mm_vision_tower": "google/siglip-so400m-patch14-384",
  "model_type": "videollama2_qwen2",
  "num_attention_heads": 28,
  "num_frames": 8,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "tokenizer_model_max_length": 2048,
  "tokenizer_padding_side": "right",
  "torch_dtype": "bfloat16",
  "transformers_version": "4.42.3",
  "tune_mm_mlp_adapter": false,
  "tune_mm_mlp_adapter_a": true,
  "use_cache": true,
  "use_mm_proj": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
```
Do you need any more detailed information to diagnose this problem?
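
One detail that may be worth checking: the config above lists `"num_frames": 8` and its `_name_or_path` refers to a 16F checkpoint, while the local run described here samples 32 frames. Whether this explains the difference from the demo is not established. The sketch below only makes such a mismatch explicit before running inference. The helper `frame_mismatch` is hypothetical and not part of VideoLLaMA2, and the inline JSON is a two-key excerpt of the config shown above.

```python
import json


def frame_mismatch(config: dict, sampled_frames: int):
    """Return a warning string if sampled_frames differs from config['num_frames'], else None.

    Hypothetical helper for illustration; it does not call any VideoLLaMA2 API.
    """
    configured = config.get("num_frames")
    if configured is not None and configured != sampled_frames:
        return (
            f"config num_frames={configured}, "
            f"but {sampled_frames} frames were sampled"
        )
    return None


# Excerpt of the config.json from this issue; in practice, load the full file
# with json.load(open(path_to_config)) instead.
config = json.loads(
    '{"_name_or_path": "DAMO-NLP-SG/VideoLLaMA2.1-7B-16F", "num_frames": 8}'
)

warning = frame_mismatch(config, sampled_frames=32)
if warning:
    print("Possible mismatch:", warning)
```

If the check reports a mismatch, it would help to know which frame count the demo uses so the two runs can be compared on equal settings.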
