


frankwang28 commented Sep 21, 2025

Purpose

While testing GLM-4.5 in streaming mode with both reasoning and tool calls enabled, I noticed that after the first assistant response, the reasoning was no longer parsed into reasoning_content and was streamed as regular content instead.

The cause is that is_reasoning_end is called on all of the previous prompt tokens.

GLM-4.5's chat template removes the reasoning content from previous turns, but it still emits an empty <think></think> pair after the assistant token of each earlier interaction (interactive example). As a result, checking for the </think> token anywhere in the prompt returns true, because the previous turn's </think> token is present, even though the current turn has not produced any reasoning yet.
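
To make the failure concrete, here is a minimal Python sketch of the pre-fix check applied to a second-turn prompt. The integer token IDs and the name naive_is_reasoning_end are hypothetical placeholders, not GLM-4.5's real vocabulary or vLLM's code, and the prompt layout is a simplified assumption about what the chat template renders.

```python
# Hypothetical token IDs for illustration only; the real GLM-4.5
# vocabulary assigns different values.
ASSISTANT_ID = 1    # <|assistant|>
THINK_START_ID = 2  # <think>
THINK_END_ID = 3    # </think>
USER_ID = 4         # <|user|>
TEXT_ID = 5         # stands in for any ordinary text token


def naive_is_reasoning_end(input_ids: list[int]) -> bool:
    """Pre-fix behaviour: a </think> anywhere in the sequence counts."""
    return THINK_END_ID in input_ids


# Simplified second-turn prompt: the template keeps an empty <think></think>
# after the previous assistant token, then opens a new assistant turn.
prompt = [
    USER_ID, TEXT_ID,
    ASSISTANT_ID, THINK_START_ID, THINK_END_ID, TEXT_ID,
    USER_ID, TEXT_ID,
    ASSISTANT_ID,
]

# Prints True even though the new turn has not reasoned at all yet.
print(naive_is_reasoning_end(prompt))
```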

This PR therefore handles the case where an <|assistant|> token is present: reasoning is only considered finished if a </think> token appears after the last <|assistant|> token. Equivalently, the parser loops backwards over the prompt tokens, returning false as soon as it encounters an <|assistant|> token and true as soon as it encounters a </think> token.
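
A minimal sketch of that backward scan, written as a standalone Python function rather than vLLM's parser method. The token IDs are hypothetical placeholders that a real parser would resolve from the tokenizer vocabulary.

```python
from collections.abc import Sequence

# Hypothetical token IDs for illustration only.
ASSISTANT_ID = 1    # <|assistant|>
THINK_START_ID = 2  # <think>
THINK_END_ID = 3    # </think>
USER_ID = 4         # <|user|>
TEXT_ID = 5         # any ordinary text token


def is_reasoning_end(input_ids: Sequence[int]) -> bool:
    """True only if </think> appears after the last <|assistant|> token."""
    for token_id in reversed(input_ids):
        if token_id == THINK_END_ID:
            return True   # reasoning was closed within the current turn
        if token_id == ASSISTANT_ID:
            return False  # hit the start of the current turn first
    return False          # neither token is present


prompt = [
    USER_ID, TEXT_ID,
    ASSISTANT_ID, THINK_START_ID, THINK_END_ID, TEXT_ID,  # previous turn
    USER_ID, TEXT_ID,
    ASSISTANT_ID,                                         # new turn starts
]

print(is_reasoning_end(prompt))  # False: the old </think> is ignored
print(is_reasoning_end(prompt + [THINK_START_ID, TEXT_ID, THINK_END_ID]))  # True
```

When no <|assistant|> token is present, the scan behaves like the original check and returns true only if a </think> token exists somewhere in the sequence.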

Test Plan

Added automated tests (tests/reasoning/test_glm4_moe_reasoning_parser.py).
Additionally, performed a manual test with the following curl command:

curl --location 'http://0.0.0.0:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "temperature": 0.7,
    "max_tokens": 5,
    "model": "zai-org/GLM-4.5-Air-FP8",
    "stream": true,
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "This function gets the weather of a certain location.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "weather location"
                        }
                    },
                    "required": [
                        "location"
                    ]
                }
            }
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": "What'\''s the current weather in Vancouver?"
        },
        {
            "role": "assistant",
            "content": "\n\n",
            "tool_calls": [
                {
                    "id": "12345",
                    "type": "function",
                    "function": {
                        "name": "get_weather",
                        "arguments": "{\"location\": \"Vancouver\"}"
                    }
                }
            ],
            "reasoning_content": "The user is asking for the current weather in Vancouver. I need to search for this information. I'\''ll use the get_weather function to find current weather information for Vancouver.",
            "id": "chatcmpl-12345",
            "cited_sources": null
        },
        {
            "role": "tool",
            "content": "The weather in Vancouver is clear and 17 degrees Celsius."
        }
    ]
}'

Test Result

Automated tests pass.

Curl command results:
Main

data: {"id":"chatcmpl-e09005f6e45949af9e072efe5bacded5","object":"chat.completion.chunk","created":1758448039,"model":"zai-org/GLM-4.5-Air-FP8","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}],"prompt_token_ids":null}

data: {"id":"chatcmpl-e09005f6e45949af9e072efe5bacded5","object":"chat.completion.chunk","created":1758448039,"model":"zai-org/GLM-4.5-Air-FP8","choices":[{"index":0,"delta":{"content":null},"logprobs":null,"finish_reason":null,"token_ids":null}]}

data: {"id":"chatcmpl-e09005f6e45949af9e072efe5bacded5","object":"chat.completion.chunk","created":1758448039,"model":"zai-org/GLM-4.5-Air-FP8","choices":[{"index":0,"delta":{"content":"\n<think>"},"logprobs":null,"finish_reason":null,"token_ids":null}]}

data: {"id":"chatcmpl-e09005f6e45949af9e072efe5bacded5","object":"chat.completion.chunk","created":1758448039,"model":"zai-org/GLM-4.5-Air-FP8","choices":[{"index":0,"delta":{"content":"The"},"logprobs":null,"finish_reason":null,"token_ids":null}]}

data: {"id":"chatcmpl-e09005f6e45949af9e072efe5bacded5","object":"chat.completion.chunk","created":1758448039,"model":"zai-org/GLM-4.5-Air-FP8","choices":[{"index":0,"delta":{"content":" weather"},"logprobs":null,"finish_reason":null,"token_ids":null}]}

data: {"id":"chatcmpl-e09005f6e45949af9e072efe5bacded5","object":"chat.completion.chunk","created":1758448039,"model":"zai-org/GLM-4.5-Air-FP8","choices":[{"index":0,"delta":{"content":" function"},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null}]}

data: [DONE]

This PR

data: {"id":"chatcmpl-c7b139841cd146b19cebdc073814c4fa","object":"chat.completion.chunk","created":1758478817,"model":"zai-org/GLM-4.5-Air-FP8","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}],"prompt_token_ids":null}

data: {"id":"chatcmpl-c7b139841cd146b19cebdc073814c4fa","object":"chat.completion.chunk","created":1758478817,"model":"zai-org/GLM-4.5-Air-FP8","choices":[{"index":0,"delta":{"content":"\n"},"logprobs":null,"finish_reason":null,"token_ids":null}]}

data: {"id":"chatcmpl-c7b139841cd146b19cebdc073814c4fa","object":"chat.completion.chunk","created":1758478817,"model":"zai-org/GLM-4.5-Air-FP8","choices":[{"index":0,"delta":{"reasoning_content":"The"},"logprobs":null,"finish_reason":null,"token_ids":null}]}

data: {"id":"chatcmpl-c7b139841cd146b19cebdc073814c4fa","object":"chat.completion.chunk","created":1758478817,"model":"zai-org/GLM-4.5-Air-FP8","choices":[{"index":0,"delta":{"reasoning_content":" function"},"logprobs":null,"finish_reason":null,"token_ids":null}]}

data: {"id":"chatcmpl-c7b139841cd146b19cebdc073814c4fa","object":"chat.completion.chunk","created":1758478817,"model":"zai-org/GLM-4.5-Air-FP8","choices":[{"index":0,"delta":{"reasoning_content":" returned"},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null}]}

data: [DONE]
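
To compare the two transcripts mechanically, a small stdlib-only helper can concatenate the delta fields per channel. The function name split_stream is hypothetical; the delta keys (content, reasoning_content) and the [DONE] sentinel are taken from the chunks above.

```python
import json


def split_stream(sse_text: str) -> tuple[str, str]:
    """Return (reasoning, content) accumulated from an SSE chat-completion stream."""
    reasoning_parts: list[str] = []
    content_parts: list[str] = []
    for raw_line in sse_text.splitlines():
        line = raw_line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            # Either key may be missing, empty, or explicitly null.
            if delta.get("reasoning_content"):
                reasoning_parts.append(delta["reasoning_content"])
            if delta.get("content"):
                content_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(content_parts)
```

Applied to the Main transcript, every token (including the literal "\n<think>") lands in content and reasoning stays empty; applied to this PR's transcript, everything after the leading newline lands in reasoning_content.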


gemini-code-assist bot left a comment

Code Review

This pull request effectively resolves a bug in the GLM-4 MoE reasoning parser for multi-turn streaming scenarios. The logic change in is_reasoning_end is correct and well-supported by the comprehensive new test suite. I have one suggestion to enhance the robustness of the parser by ensuring all necessary special tokens are validated during initialization.
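
The review suggests validating every special token the parser depends on when the parser is constructed. One way that could look, as a hypothetical sketch: the class name ReasoningTokenIds, the plain mapping used as a vocabulary, and the choice of RuntimeError are assumptions, not the code in this PR.

```python
from collections.abc import Mapping


class ReasoningTokenIds:
    """Hypothetical helper resolving the special tokens the parser relies on."""

    REQUIRED = ("<think>", "</think>", "<|assistant|>")

    def __init__(self, vocab: Mapping[str, int]) -> None:
        missing = [tok for tok in self.REQUIRED if tok not in vocab]
        if missing:
            # Fail at construction time instead of silently mis-parsing later.
            raise RuntimeError(
                f"Reasoning parser could not find tokens in the tokenizer: {missing}"
            )
        self.think_start_id = vocab["<think>"]
        self.think_end_id = vocab["</think>"]
        self.assistant_id = vocab["<|assistant|>"]
```

Checking <|assistant|> up front matters here because the backward scan silently degrades to the old behaviour if that token cannot be resolved.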

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Frank Wang <[email protected]>
chaunceyjiang self-assigned this Sep 22, 2025
chaunceyjiang (Collaborator):

/cc @zRzRzRzRzRzRzR PTAL.
