Skip to content

Inconsistent failure to use thinking with Claude 4 Sonnet #958

@francisjervis

Description

@francisjervis

I am getting API responses which do not contain a thinking block (and where, based on output token count, extended thinking has definitely not been applied) on some calls. The behavior is consistent across calls with the same system prompt/messages. There is no API error and the messages parameter is well formed (a prior assistant message with thinking and tool use block, user message with tool result). See console logs (starting from relevant portion of the params passed to the API):

'thinking': {'type': 'enabled', 'budget_tokens': 1024}}
01:21:51.371 Message with 'claude-sonnet-4-0' [LLM]
2025-05-26 22:21:53 - HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
Processing event in _process_evaluation_cycle: RawMessageStartEvent(message=Message(id='msg_0129AsRpyXyD6dw3EX2zxZgv', content=[], model='claude-sonnet-4-20250514', role='assistant', stop_reason=None, stop_sequence=None, type='message', usage=Usage(cache_creation_input_tokens=0, cache_read_input_tokens=0, input_tokens=3327, output_tokens=41, server_tool_use=None, service_tier='standard')), type='message_start')
Processing event in _process_evaluation_cycle: RawContentBlockStartEvent(content_block=ToolUseBlock(id='toolu_017roy3ok9vf9RsbkbXZbm6Q', input={}, name='probing_question', type='tool_use'), index=0, type='content_block_start')
Handling block start event: RawContentBlockStartEvent(content_block=ToolUseBlock(id='toolu_017roy3ok9vf9RsbkbXZbm6Q', input={}, name='probing_question', type='tool_use'), index=0, type='content_block_start')

The API call comes from the same function as one which successfully returns a thinking block. This is not an issue with how I am handling streaming events - the same stream handling code logs this earlier in the conversation:

'thinking': {'type': 'enabled', 'budget_tokens': 1024}}
01:21:34.297 Message with 'claude-sonnet-4-0' [LLM]
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1748308894.318467 18618069 fork_posix.cc:75] Other threads are currently calling into gRPC, skipping fork() handlers
I0000 00:00:1748308894.351447 18618069 fork_posix.cc:75] Other threads are currently calling into gRPC, skipping fork() handlers
2025-05-26 22:21:36 - HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
Processing event in _process_evaluation_cycle: RawMessageStartEvent(message=Message(id='msg_0131jdGMgkn2suhC33vn5jyu', content=[], model='claude-sonnet-4-20250514', role='assistant', stop_reason=None, stop_sequence=None, type='message', usage=Usage(cache_creation_input_tokens=0, cache_read_input_tokens=0, input_tokens=2736, output_tokens=2, server_tool_use=None, service_tier='standard')), type='message_start')
Processing event in _process_evaluation_cycle: RawContentBlockStartEvent(content_block=ThinkingBlock(signature='', thinking='', type='thinking'), index=0, type='content_block_start')

This is extremely undesirable due to the requirement that thinking blocks are included in the conversation history when extended thinking is enabled.

This appears to be a model-side issue based on the fact that 3.7 Sonnet does not return a tool use block sans thinking - it returns three (3) tokens, effectively an empty response:

01:54:59.528 Message with 'claude-3-7-sonnet-latest' [LLM]
2025-05-26 22:55:04 - HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
Processing event in _process_evaluation_cycle: RawMessageStartEvent(message=Message(id='msg_0144j8sbKgpsG7y4CAi1Mocj', content=[], model='claude-3-7-sonnet-20250219', role='assistant', stop_reason=None, stop_sequence=None, type='message', usage=Usage(cache_creation_input_tokens=0, cache_read_input_tokens=0, input_tokens=3155, output_tokens=3, server_tool_use=None, service_tier='standard')), type='message_start')
Processing event in _process_evaluation_cycle: RawMessageDeltaEvent(delta=Delta(stop_reason='end_turn', stop_sequence=None), type='message_delta', usage=MessageDeltaUsage(cache_creation_input_tokens=None, cache_read_input_tokens=None, input_tokens=None, output_tokens=3, server_tool_use=None))
Processing event in _process_evaluation_cycle: MessageStopEvent(type='message_stop', message=Message(id='msg_0144j8sbKgpsG7y4CAi1Mocj', content=[], model='claude-3-7-sonnet-20250219', role='assistant', stop_reason='end_turn', stop_sequence=None, type='message', usage=Usage(cache_creation_input_tokens=0, cache_read_input_tokens=0, input_tokens=3155, output_tokens=3, server_tool_use=None, service_tier='standard')))
01:55:04.595 streaming response from 'claude-3-7-sonnet-latest' took 0.03s [LLM]

Reporting it here as it's happening in Python. API results for calls with thinking enabled which do not contain a thinking block are malformed, as far as I'm concerned, and it is not mentioned in the docs that they may or may not be generated - indeed as noted above extended thinking basically does not work if they are not preserved and sent with subsequent calls. Adding code to fall back to no thinking if the model simply does not return a thinking block is not an acceptable solution.

Edit to add that I am also getting failed empty (but apparently billed) requests with other prompts, with Claude 4 Sonnet, eg:

...thinking': {'type': 'enabled', 'budget_tokens': 1024}}
04:06:18.383 Message with 'claude-sonnet-4-0' [LLM]
2025-05-27 01:06:20 - HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
04:06:20.653 streaming response from 'claude-sonnet-4-0' took 0.00s [LLM]

This is obviously not a desirable failure mode - the code has retry logic but every request fails (in very rapid succession, which is problematic given I am presumably being billed for input tokens!), if this is due to malformed inputs the API should raise an error.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions