Think-mode output is truncated when running Qwen3-0.6B inference with vLLM #1680

@Frode-vivi

Description

When vLLM is run with --reasoning-parser to parse think mode, the default (think-mode) output is sometimes cut off before the think section is complete. This is confirmed not to be a max_tokens problem. Using instruct mode instead, i.e. adding "chat_template_kwargs": {"enable_thinking": False}, the output is normal and never truncated.

Inspecting result['choices'][0].get('finish_reason', 'unknown') gives stop, i.e. the model terminated on a recognized end-of-sequence marker rather than on the length limit.

The only workaround found so far is the soft switch: appending "/think" to messages[-1] produces complete output that is no longer cut short by a prematurely detected end marker.

It is unclear whether this is a bug in the vLLM reasoning parser or in the end-of-sequence configuration in tokenizer_config.
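
For reference, a minimal sketch of the three request variants described above. The prompt, model path, endpoint URL, max_tokens value, and the parser name passed to --reasoning-parser are assumptions for illustration, not the exact values from my setup:

# Assumed server launch (the parser name "qwen3" is an assumption):
#   vllm serve Qwen/Qwen3-0.6B --reasoning-parser qwen3
import json
import requests

BASE_URL = "http://localhost:8000/v1"  # assumed default vLLM OpenAI-compatible endpoint

def chat(messages, **extra):
    payload = {"model": "Qwen/Qwen3-0.6B", "messages": messages, "max_tokens": 2048, **extra}
    response = requests.post(
        f"{BASE_URL}/chat/completions",
        headers={"Content-Type": "application/json"},
        data=json.dumps(payload),
    )
    response.raise_for_status()
    return response.json()["choices"][0]

messages = [{"role": "user", "content": "Tell me about your turtle."}]  # hypothetical prompt

# 1) Default think mode: the reasoning output is cut off, yet finish_reason is "stop".
truncated = chat(messages)

# 2) Instruct mode (thinking disabled via chat_template_kwargs): output is complete.
complete_instruct = chat(messages, chat_template_kwargs={"enable_thinking": False})

# 3) Soft switch: appending "/think" to the last user message also gives complete output.
soft_messages = messages[:-1] + [
    {"role": "user", "content": messages[-1]["content"] + "/think"}
]
complete_soft = chat(soft_messages)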

Reproduction

# Imports used by this method (excerpted from a client class; llm_config supplies defaults)
import json
import re

import requests

def generate_response(self, messages, temperature=None, max_tokens=None):
    """Generate a response from the LLM."""
    if temperature is None:
        temperature = llm_config.temperature
    if max_tokens is None:
        max_tokens = llm_config.max_tokens

    print(temperature, max_tokens)

    # Workaround: appending the "/think" soft switch yields complete, untruncated output
    messages = messages[:-1] + [{"role": "user", "content": messages[-1]["content"] + "/think"}]

    print(messages)
    payload = {
        "model": self.model_name,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
        # "chat_template_kwargs": {"enable_thinking": False},  # instruct mode, output is normal
    }

    try:
        response = requests.post(
            f"{self.base_url}/chat/completions",
            headers={"Content-Type": "application/json"},
            data=json.dumps(payload),
        )
        response.raise_for_status()
        result = response.json()

        content = result["choices"][0]["message"]["content"]
        print(f"content: {content}\n")
        # With vLLM's --reasoning-parser configured, check whether the think output is parsed correctly
        print(
            f"reasoning_content: {result['choices'][0]['message'].get('reasoning_content', 'unknown')}"
        )
        # finish_reason comes back as "stop", i.e. a stop token was hit, not the max_tokens limit
        print(f"finish_reason: {result['choices'][0].get('finish_reason', 'unknown')}")
        # Strip the <think>...</think> block from the content
        final_response = re.sub(
            r"<think>.*?</think>", "", content, flags=re.DOTALL
        ).strip()
        return final_response
    except requests.exceptions.RequestException as e:
        print(f"LLM request failed: {e}")
        return None
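
A quick client-side check for the truncation (a hypothetical helper, not part of my actual code): when the bug occurs, the raw content opens a <think> tag but never closes it, even though finish_reason is "stop", as in the log below.

def is_think_truncated(content: str) -> bool:
    """Return True when the think block was cut off: <think> opens but never closes."""
    return "<think>" in content and "</think>" not in content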

Logs

Log of a truncated output:
content: <think>
Okay, the user is responding to me saying I have a turtle named Timothy. They mentioned dancing at the club, running a dog obedience school, and eating sweets. Now they are confirming that they like dancing, working with dogs, and eating sweets. I need to make sure my response stays true to all their traits: like dancing at a club, running a dog school, eating sweets, and being a sweet toothy person. Let me think of a natural and friendly way to respond. Maybe mention the turtle and how it's good for them. Keep it conversational and positive. Let me check if I need to add any other details or keep the tone consistent. Yep, that should work.

finish_reason: stop

Environment Information

vLLM version: 0.11.0

Known Issue

  • The issue hasn't been already addressed in Documentation, Issues, and Discussions.
