Now, I want to recognize objects in a specified box region of an image. What is the format of the box coordinates in the prompt? #1745

cch2016 · 2025-11-17T07:10:11Z

How is the performance on region recognition tasks? Were region recognition tasks included during training?

jklj077 · 2025-11-19T02:47:18Z

At the moment, the Qwen model series accepts only text as input. For multimodal input, please consider the Qwen-VL model series. There is also a cookbook for your use case: https://github.com/QwenLM/Qwen3-VL/blob/main/cookbooks/2d_grounding.ipynb
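
As a rough illustration of what putting a box into a text prompt could look like, here is a minimal sketch. It assumes the coordinates are given as `[x1, y1, x2, y2]` scaled to a 0-1000 range relative to the image size. That convention is an assumption made for this example and is not confirmed anywhere in this thread, so please check the cookbook above for the format the model actually expects. The helper names `normalize_box` and `build_region_prompt` and the prompt wording are hypothetical.

```python
def normalize_box(box, width, height, scale=1000):
    """Map a pixel-space (x1, y1, x2, y2) box to integers in [0, scale].

    The 0-1000 scale is an assumption for illustration, not a confirmed
    convention of any particular model.
    """
    x1, y1, x2, y2 = box
    if not (0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height):
        raise ValueError("box must lie inside the image with x1 < x2 and y1 < y2")
    return (
        round(x1 / width * scale),
        round(y1 / height * scale),
        round(x2 / width * scale),
        round(y2 / height * scale),
    )


def build_region_prompt(box, width, height):
    """Build a text prompt that refers to a box region (hypothetical wording)."""
    coords = list(normalize_box(box, width, height))
    return f"Describe the object inside the region {coords}."
```

For example, `build_region_prompt((320, 240, 640, 480), 1280, 960)` produces a prompt that refers to the region `[250, 250, 500, 500]`. If the model expects absolute pixel coordinates instead, the normalization step can simply be skipped.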
