HF Model has visibly lower performance than chat.qwen.ai #509

@johnxin-v

I have been using "Qwen2.5-VL-32B-Instruct" for person appearance matching, and I noticed that the HF model "Qwen/Qwen2.5-VL-32B-Instruct", both in my local environment and on "https://huggingface.co/spaces/Qwen/Qwen2.5-VL-32B-Instruct", performs visibly worse than when I select "Qwen2.5-VL-32B-Instruct" on chat.qwen.ai. Interestingly, my local inference and "https://huggingface.co/spaces/Qwen/Qwen2.5-VL-32B-Instruct" consistently reach the same accuracy. The text prompt is given below; due to compliance requirements I cannot share the images. My current suspicion is that chat.qwen.ai's backend ignores my model selection and quietly uses a different model, perhaps the bigger 72B one. Can someone from the Qwen team confirm this?

Prompt: Based on the appearance of the person in each image, are they likely the same person? You should ignore the background and only focus on the person's appearance, clothing, etc. If their clothing is visibly different, they are not the same person. Your output should be a score from 0 to 10, where 0 means definitely not the same person and 10 means definitely the same person. Please only output the score.
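
For anyone trying to reproduce the comparison, here is a minimal sketch of how the replies from each backend could be turned into an accuracy number. It is not the exact evaluation code behind this report: the helper names `parse_score` and `accuracy`, the decision threshold of 5, and the rule that unparseable replies count as wrong are all assumptions made for illustration.

```python
import re
from typing import Optional, Sequence

# Matches a standalone integer from 0 to 10 (not part of a longer number).
_SCORE_RE = re.compile(r"(?<!\d)(10|[0-9])(?!\d)")


def parse_score(reply: str) -> Optional[int]:
    """Return the first standalone integer 0-10 found in a model reply, or None."""
    match = _SCORE_RE.search(reply)
    return int(match.group(1)) if match else None


def accuracy(replies: Sequence[str], labels: Sequence[bool], threshold: int = 5) -> float:
    """Fraction of image pairs where (score > threshold) agrees with the label.

    `labels[i]` is True when pair i shows the same person.
    The threshold is a hypothetical choice; replies with no parseable
    score are counted as wrong.
    """
    if len(replies) != len(labels):
        raise ValueError("replies and labels must have the same length")
    if not replies:
        raise ValueError("need at least one reply")
    correct = 0
    for reply, same_person in zip(replies, labels):
        score = parse_score(reply)
        if score is not None and (score > threshold) == same_person:
            correct += 1
    return correct / len(replies)
```

Running `accuracy` once per backend on the same image pairs and the same labels gives directly comparable numbers, provided the decoding settings (for example, sampling versus greedy decoding) are also kept the same.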
