Description
When vLLM is started with --reasoning-parser to parse think mode, output in the default (think) mode is sometimes cut off before the think section has been fully emitted. This is confirmed not to be a max_tokens problem: in instruct mode, i.e. with "chat_template_kwargs": {"enable_thinking": True} added to the request, the output is normal and not truncated.
Inspecting result['choices'][0].get('finish_reason', 'unknown') shows "stop", i.e. the server believes a stop token was detected and generation ended normally.
The only workaround found so far is the soft switch: appending "/think" to messages[-1] makes the output normal, and it is no longer cut off early by a falsely recognized stop token.
It is unclear whether this is a bug in the vLLM reasoning parser or a problem with the end-of-sequence token configuration in tokenizer_config.
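The soft-switch workaround described above can be sketched as a small helper that appends "/think" to the final user turn before the request is sent (the function name is mine, for illustration only):

```python
def apply_think_soft_switch(messages):
    """Return a copy of messages with "/think" appended to the last turn.

    This mirrors the soft-switch workaround described above: with the
    suffix present, the think output is no longer truncated early.
    """
    if not messages:
        return messages
    last = dict(messages[-1])  # shallow copy so the caller's list is untouched
    last["content"] = last.get("content", "") + "/think"
    return messages[:-1] + [last]
```

Passing the transformed list to the chat completions endpoint instead of the original one is what makes the difference in the reproduction below.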
Reproduction
# Imports required by this snippet (not shown in the original excerpt):
import json
import re

import requests

def generate_response(self, messages, temperature=None, max_tokens=None):
    """Generate a response."""
    if temperature is None:
        temperature = llm_config.temperature
    if max_tokens is None:
        max_tokens = llm_config.max_tokens
    print(temperature, max_tokens)
    # With the soft switch added, output is normal
    messages = messages[:-1] + [
        {"role": "user", "content": messages[-1]["content"] + "/think"}
    ]
    print(messages)
    payload = {
        "model": self.model_name,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
        # "chat_template_kwargs": {"enable_thinking": False},  # Instruct mode: output is normal
    }
    try:
        response = requests.post(
            f"{self.base_url}/chat/completions",
            headers={"Content-Type": "application/json"},
            data=json.dumps(payload),
        )
        response.raise_for_status()
        result = response.json()
        content = result["choices"][0]["message"]["content"]
        print(f"content: {content}\n")
        # With vLLM's --reasoning-parser configured, check whether the
        # think result is handled correctly
        print(
            f"reasoning_content: {result['choices'][0]['message'].get('reasoning_content', 'unknown')}"
        )
        # finish_reason is "stop", i.e. a stop token was detected and
        # generation ended "normally"
        print(f"finish_reason: {result['choices'][0].get('finish_reason', 'unknown')}")
        # Strip the <think> block from the content
        final_response = re.sub(
            r"<think>.*?</think>", "", content, flags=re.DOTALL
        ).strip()
        return final_response
    except requests.exceptions.RequestException as e:
        print(f"LLM call failed: {e}")
        return None
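As a quick diagnostic, the truncation symptom can be detected by checking whether the returned content opens a <think> block without ever closing it, which is exactly what the logs below show (the helper names are mine; the regex matches the one in the reproduction code):

```python
import re

def think_block_truncated(content: str) -> bool:
    """Heuristic: True if content opens a <think> block but never closes it,
    which is the symptom seen in the truncated logs."""
    return "<think>" in content and "</think>" not in content

def strip_think(content: str) -> str:
    """Remove a complete <think>...</think> block, as the repro code does."""
    return re.sub(r"<think>.*?</think>", "", content, flags=re.DOTALL).strip()
```

On a truncated response `strip_think` is a no-op (no closing tag to match), so the raw reasoning text leaks into the final answer.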
Logs
Log of a truncated output:
content: <think>
Okay, the user is responding to me saying I have a turtle named Timothy. They mentioned dancing at the club, running a dog obedience school, and eating sweets. Now they are confirming that they like dancing, working with dogs, and eating sweets. I need to make sure my response stays true to all their traits: like dancing at a club, running a dog school, eating sweets, and being a sweet toothy person. Let me think of a natural and friendly way to respond. Maybe mention the turtle and how it's good for them. Keep it conversational and positive. Let me check if I need to add any other details or keep the tone consistent. Yep, that should work.
finish_reason: stop
Environment Information
vLLM version: 0.11.0
Known Issue
- The issue hasn't already been addressed in the Documentation, Issues, or Discussions.