ensure token_count is added to the final message #963
Conversation
Walkthrough
Added token-usage extraction and token counting: tool_calling_llm now calls count_tokens_for_message and get_llm_usage when assembling the final message, so the result metadata carries usage information.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Caller
    participant ToolCallingLLM as ToolCallingLLM.call()
    participant LLM as LLM
    participant HolmesLLMModule as holmes.core.llm
    Caller->>ToolCallingLLM: call(messages)
    ToolCallingLLM->>LLM: generate(messages)
    alt tools_to_call is empty
        ToolCallingLLM->>ToolCallingLLM: post-process -> full_response
        ToolCallingLLM->>LLM: count_tokens_for_message(messages)
        ToolCallingLLM->>HolmesLLMModule: get_llm_usage(full_response)
        ToolCallingLLM-->>Caller: LLMResult (metadata["usage"])
    else tools_to_call not empty
        ToolCallingLLM-->>Caller: trigger tool calls
    end
```

```mermaid
sequenceDiagram
    autonumber
    actor Subscriber
    participant ToolCallingLLM as ToolCallingLLM.call_stream()
    participant LLM as LLM
    participant HolmesLLMModule as holmes.core.llm
    Subscriber->>ToolCallingLLM: subscribe(messages)
    ToolCallingLLM->>LLM: stream(messages)
    alt tools_to_call is empty
        ToolCallingLLM-->>Subscriber: streamed chunks
        Note right of ToolCallingLLM: on finalization -> assemble full_response
        ToolCallingLLM->>LLM: count_tokens_for_message(messages)
        ToolCallingLLM->>HolmesLLMModule: get_llm_usage(full_response)
        ToolCallingLLM-->>Subscriber: ANSWER_END (metadata["usage"])
    else tools_to_call not empty
        ToolCallingLLM-->>Subscriber: tool-call events
    end
```
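To make the non-streaming flow concrete, here is a minimal sketch of how the final answer could carry the usage metadata. The helper names `count_tokens_for_message` and `get_llm_usage` come from the diagrams above; `finalize_response` and the `LLMResult` shape here are simplified stand-ins for illustration, not the actual holmes classes.

```python
# Minimal sketch of the non-streaming path above. The LLMResult shape and
# finalize_response helper are simplified stand-ins, not the real classes.
from dataclasses import dataclass, field
from typing import Any

from holmes.core.llm import get_llm_usage  # helper added in this PR


@dataclass
class LLMResult:  # simplified stand-in for the real result type
    result: str
    metadata: dict[str, Any] = field(default_factory=dict)


def finalize_response(llm: Any, messages: list[dict], full_response: Any) -> LLMResult:
    """Assemble the final answer once there are no tools left to call."""
    text = full_response.choices[0].message.content or ""

    # Token count for the conversation sent to the model.
    token_count = llm.count_tokens_for_message(messages)

    # Usage (prompt/completion/total tokens) reported by the provider.
    usage = get_llm_usage(full_response)

    return LLMResult(
        result=text,
        metadata={"usage": usage, "token_count": token_count},
    )
```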
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks (1 passed, 1 warning, 1 inconclusive)
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- holmes/core/llm.py (2 hunks)
- holmes/core/tool_calling_llm.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- holmes/core/tool_calling_llm.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff for formatting and linting
Type hints are required (checked by mypy)
Always place Python imports at the top of the file, not inside functions or methods
Files:
holmes/core/llm.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Pre-commit checks
- GitHub Check: llm_evals
🔇 Additional comments (2)
holmes/core/llm.py (2)
533-534: No-op change. Blank lines only.
535-553: Tighten types, include TextCompletionResponse in signature, and verify stream consumption
- Apply the signature/typing changes and use `typing.cast`/`getattr` as shown in the diff below; add `from typing import cast`.
- Critically verify that `litellm.stream_chunk_builder(chunks=llm_response)` does NOT consume the `CustomStreamWrapper`. If it does, move usage extraction to stream finalization or duplicate the stream upstream.
- Run mypy/ruff and fix any warnings.

Apply within this hunk:
```diff
-def get_llm_usage(llm_response: Union[ModelResponse, CustomStreamWrapper]) -> dict:
-    usage: dict = {}
-    if (
-        (
-            isinstance(llm_response, ModelResponse)
-            or isinstance(llm_response, TextCompletionResponse)
-        )
-        and hasattr(llm_response, "usage")
-        and llm_response.usage
-    ):  # type: ignore
-        usage["prompt_tokens"] = llm_response.usage.prompt_tokens  # type: ignore
-        usage["completion_tokens"] = llm_response.usage.completion_tokens  # type: ignore
-        usage["total_tokens"] = llm_response.usage.total_tokens  # type: ignore
-    elif isinstance(llm_response, CustomStreamWrapper):
-        complete_response = litellm.stream_chunk_builder(chunks=llm_response)  # type: ignore
-        if complete_response:
-            return get_llm_usage(complete_response)  # type: ignore
-    return usage
+def get_llm_usage(
+    llm_response: Union[ModelResponse, TextCompletionResponse, CustomStreamWrapper]
+) -> dict[str, int | None]:
+    usage: dict[str, int | None] = {}
+    if isinstance(llm_response, (ModelResponse, TextCompletionResponse)) and getattr(
+        llm_response, "usage", None
+    ):
+        u = llm_response.usage  # type: ignore[assignment]
+        usage["prompt_tokens"] = getattr(u, "prompt_tokens", None)  # type: ignore[attr-defined]
+        usage["completion_tokens"] = getattr(u, "completion_tokens", None)  # type: ignore[attr-defined]
+        usage["total_tokens"] = getattr(u, "total_tokens", None)  # type: ignore[attr-defined]
+    elif isinstance(llm_response, CustomStreamWrapper):
+        # Note: ensure this does not consume the stream before the caller iterates it.
+        complete_response = cast(
+            Optional[ModelResponse | TextCompletionResponse],
+            litellm.stream_chunk_builder(chunks=llm_response),
+        )
+        if complete_response is not None:
+            return get_llm_usage(complete_response)
+    return usage
```

Add import (outside this hunk):

```python
from typing import cast
```
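On the stream-consumption concern: one hedged way a streaming caller could sidestep it is to iterate the `CustomStreamWrapper` exactly once, keep the raw chunks, and compute usage from the collected list rather than the live wrapper. The sketch below assumes litellm response shapes; `collect_and_get_usage` is a hypothetical helper, not part of holmes.

```python
# Hypothetical helper illustrating one way to avoid consuming the stream twice:
# iterate the wrapper once, save the chunks, and build usage from the saved list.
import litellm
from litellm import CustomStreamWrapper

from holmes.core.llm import get_llm_usage  # helper added in this PR


def collect_and_get_usage(stream: CustomStreamWrapper) -> tuple[str, dict]:
    chunks = []
    text_parts = []
    for chunk in stream:  # the only place the wrapper is consumed
        chunks.append(chunk)
        delta = chunk.choices[0].delta.content
        if delta:
            text_parts.append(delta)

    # Rebuild a complete response from the saved chunks; passing the
    # already-consumed wrapper here would yield nothing.
    complete = litellm.stream_chunk_builder(chunks=chunks)
    usage = get_llm_usage(complete) if complete is not None else {}
    return "".join(text_parts), usage
```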