[Bugfix] Improve GLM4 MoE Reasoning Parser's is_reasoning_end Condition #25355

frankwang28 · 2025-09-21T19:01:52Z

Purpose

When testing GLM-4.5 in streaming mode with both reasoning and tool calls enabled, it was noticed that after the first assistant response, reasoning was no longer parsed into reasoning_content and was instead streamed as regular content.

This is due to calling is_reasoning_end on all the previous prompt tokens.

GLM-4.5's chat template removes the reasoning content from previous turns, but still appends <think></think> after the assistant token for those turns (interactive example). Thus, when checking for just the </think> token across all prompt tokens, the check returns true because the previous turn's </think> token is present.

Thus, we must handle the case where an <|assistant|> token is present: reasoning has ended only if there is a </think> token after the last <|assistant|> token. This can equivalently be accomplished by looping backwards over the prompt tokens and returning false upon encountering the <|assistant|> token or true upon encountering a </think> token.
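
The backwards scan described above can be sketched roughly as follows. This is a standalone illustration, not the code in vllm/reasoning/glm4_moe_reasoning_parser.py: the token ids are hypothetical placeholders (a real parser would look them up in the tokenizer's vocabulary), and returning false when neither token is found is an assumption of this sketch.

```python
# Hypothetical token ids, chosen only for this sketch.
ASSISTANT_TOKEN_ID = 1    # stands in for <|assistant|>
THINK_START_TOKEN_ID = 2  # stands in for <think>
THINK_END_TOKEN_ID = 3    # stands in for </think>


def is_reasoning_end(input_ids: list[int]) -> bool:
    """Return True only if </think> appears after the last <|assistant|>."""
    for token_id in reversed(input_ids):
        if token_id == THINK_END_TOKEN_ID:
            # A closing think token in the current turn: reasoning is over.
            return True
        if token_id == ASSISTANT_TOKEN_ID:
            # Reached the start of the current turn without seeing </think>.
            return False
    # Assumption: with neither token present, reasoning has not ended.
    return False


# The previous turn carries an empty <think></think>, then a new turn starts.
multi_turn_prompt = [
    ASSISTANT_TOKEN_ID, THINK_START_TOKEN_ID, THINK_END_TOKEN_ID, 10, 11,
    ASSISTANT_TOKEN_ID,
]
print(is_reasoning_end(multi_turn_prompt))  # False
```

A check that only tests for the presence of </think> anywhere in the prompt would return true for this input, which is the behavior described in the bug.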

Test Plan

Added automated tests (tests/reasoning/test_glm4_moe_reasoning_parser.py).
Additionally, performed a manual test with the following curl command:

curl --location 'http://0.0.0.0:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "temperature": 0.7,
    "max_tokens": 5,
    "model": "zai-org/GLM-4.5-Air-FP8",
    "stream": true,
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "This function gets the weather of a certain location.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "weather location"
                        }
                    },
                    "required": [
                        "location"
                    ]
                }
            }
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": "What'\''s the current weather in Vancouver?"
        },
        {
            "role": "assistant",
            "content": "\n\n",
            "tool_calls": [
                {
                    "id": "12345",
                    "type": "function",
                    "function": {
                        "name": "get_weather",
                        "arguments": "{\"location\": \"Vancouver\"}"
                    }
                }
            ],
            "reasoning_content": "The user is asking for the current weather in Vancouver. I need to search for this information. I'\''ll use the get_weather function to find current weather information for Vancouver.",
            "id": "chatcmpl-12345",
            "cited_sources": null
        },
        {
            "role": "tool",
            "content": "The weather in Vancouver is clear and 17 degrees Celsius."
        }
    ]
}'

Test Result

Automated tests pass.

Curl command results:
Main

data: {"id":"chatcmpl-e09005f6e45949af9e072efe5bacded5","object":"chat.completion.chunk","created":1758448039,"model":"zai-org/GLM-4.5-Air-FP8","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}],"prompt_token_ids":null}

data: {"id":"chatcmpl-e09005f6e45949af9e072efe5bacded5","object":"chat.completion.chunk","created":1758448039,"model":"zai-org/GLM-4.5-Air-FP8","choices":[{"index":0,"delta":{"content":null},"logprobs":null,"finish_reason":null,"token_ids":null}]}

data: {"id":"chatcmpl-e09005f6e45949af9e072efe5bacded5","object":"chat.completion.chunk","created":1758448039,"model":"zai-org/GLM-4.5-Air-FP8","choices":[{"index":0,"delta":{"content":"\n<think>"},"logprobs":null,"finish_reason":null,"token_ids":null}]}

data: {"id":"chatcmpl-e09005f6e45949af9e072efe5bacded5","object":"chat.completion.chunk","created":1758448039,"model":"zai-org/GLM-4.5-Air-FP8","choices":[{"index":0,"delta":{"content":"The"},"logprobs":null,"finish_reason":null,"token_ids":null}]}

data: {"id":"chatcmpl-e09005f6e45949af9e072efe5bacded5","object":"chat.completion.chunk","created":1758448039,"model":"zai-org/GLM-4.5-Air-FP8","choices":[{"index":0,"delta":{"content":" weather"},"logprobs":null,"finish_reason":null,"token_ids":null}]}

data: {"id":"chatcmpl-e09005f6e45949af9e072efe5bacded5","object":"chat.completion.chunk","created":1758448039,"model":"zai-org/GLM-4.5-Air-FP8","choices":[{"index":0,"delta":{"content":" function"},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null}]}

data: [DONE]

This PR

data: {"id":"chatcmpl-c7b139841cd146b19cebdc073814c4fa","object":"chat.completion.chunk","created":1758478817,"model":"zai-org/GLM-4.5-Air-FP8","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}],"prompt_token_ids":null}

data: {"id":"chatcmpl-c7b139841cd146b19cebdc073814c4fa","object":"chat.completion.chunk","created":1758478817,"model":"zai-org/GLM-4.5-Air-FP8","choices":[{"index":0,"delta":{"content":"\n"},"logprobs":null,"finish_reason":null,"token_ids":null}]}

data: {"id":"chatcmpl-c7b139841cd146b19cebdc073814c4fa","object":"chat.completion.chunk","created":1758478817,"model":"zai-org/GLM-4.5-Air-FP8","choices":[{"index":0,"delta":{"reasoning_content":"The"},"logprobs":null,"finish_reason":null,"token_ids":null}]}

data: {"id":"chatcmpl-c7b139841cd146b19cebdc073814c4fa","object":"chat.completion.chunk","created":1758478817,"model":"zai-org/GLM-4.5-Air-FP8","choices":[{"index":0,"delta":{"reasoning_content":" function"},"logprobs":null,"finish_reason":null,"token_ids":null}]}

data: {"id":"chatcmpl-c7b139841cd146b19cebdc073814c4fa","object":"chat.completion.chunk","created":1758478817,"model":"zai-org/GLM-4.5-Air-FP8","choices":[{"index":0,"delta":{"reasoning_content":" returned"},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null}]}

data: [DONE]

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: frankwang28 <[email protected]>

gemini-code-assist

Code Review

This pull request effectively resolves a bug in the GLM-4 MoE reasoning parser for multi-turn streaming scenarios. The logic change in is_reasoning_end is correct and well-supported by the comprehensive new test suite. I have one suggestion to enhance the robustness of the parser by ensuring all necessary special tokens are validated during initialization.
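
One way the suggested initialization check could look is sketched below. The review comment does not list which tokens it means, so the three token strings here are an assumption, and resolve_special_token_ids and the plain dict vocabulary are hypothetical names used only for illustration, not vLLM APIs.

```python
def resolve_special_token_ids(
    vocab: dict[str, int], required_tokens: list[str]
) -> dict[str, int]:
    """Look up every required special token, failing fast if any is missing."""
    missing = [token for token in required_tokens if vocab.get(token) is None]
    if missing:
        raise RuntimeError(
            f"Reasoning parser could not find special tokens: {missing}"
        )
    return {token: vocab[token] for token in required_tokens}


# Assumed token strings; the ids are made up for the example.
example_vocab = {"<think>": 2, "</think>": 3, "<|assistant|>": 1}
token_ids = resolve_special_token_ids(
    example_vocab, ["<think>", "</think>", "<|assistant|>"]
)
print(token_ids["<|assistant|>"])  # 1
```

Failing at construction time means a tokenizer without the expected tokens is reported once, rather than silently comparing against a missing id on every request.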

vllm/reasoning/glm4_moe_reasoning_parser.py

chaunceyjiang · 2025-09-22T06:19:11Z

/cc @zRzRzRzRzRzRzR PTAL.

Update GLM4 MoE is_reasoning_end

9c4e1f4

Signed-off-by: frankwang28 <[email protected]>

frankwang28 requested review from aarnphm and chaunceyjiang as code owners September 21, 2025 19:01

gemini-code-assist bot reviewed Sep 21, 2025

Update vllm/reasoning/glm4_moe_reasoning_parser.py

5fcc0dc

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Frank Wang <[email protected]>

chaunceyjiang self-assigned this Sep 22, 2025
