ensure token_count is added to the final message #963
Conversation
Walkthrough
Added token-usage extraction and token counting: tool_calling_llm now calls count_tokens_for_message and get_llm_usage when assembling the final message, so the result metadata carries usage information.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Caller
    participant ToolCallingLLM as ToolCallingLLM.call()
    participant LLM as LLM
    participant HolmesLLMModule as holmes.core.llm
    Caller->>ToolCallingLLM: call(messages)
    ToolCallingLLM->>LLM: generate(messages)
    alt tools_to_call is empty
        ToolCallingLLM->>ToolCallingLLM: post-process -> full_response
        ToolCallingLLM->>LLM: count_tokens_for_message(messages)
        ToolCallingLLM->>HolmesLLMModule: get_llm_usage(full_response)
        ToolCallingLLM-->>Caller: LLMResult (metadata["usage"])
    else tools_to_call not empty
        ToolCallingLLM-->>Caller: trigger tool calls
    end
```

```mermaid
sequenceDiagram
    autonumber
    actor Subscriber
    participant ToolCallingLLM as ToolCallingLLM.call_stream()
    participant LLM as LLM
    participant HolmesLLMModule as holmes.core.llm
    Subscriber->>ToolCallingLLM: subscribe(messages)
    ToolCallingLLM->>LLM: stream(messages)
    alt tools_to_call is empty
        ToolCallingLLM-->>Subscriber: streamed chunks
        Note right of ToolCallingLLM: on finalization -> assemble full_response
        ToolCallingLLM->>LLM: count_tokens_for_message(messages)
        ToolCallingLLM->>HolmesLLMModule: get_llm_usage(full_response)
        ToolCallingLLM-->>Subscriber: ANSWER_END (metadata["usage"])
    else tools_to_call not empty
        ToolCallingLLM-->>Subscriber: tool-call events
    end
```
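To make the non-streaming flow concrete, here is a minimal sketch of how the final answer could carry the usage metadata. The helper names `count_tokens_for_message` and `get_llm_usage` come from the diagrams above; `finalize_response` and the `LLMResult` shape here are simplified stand-ins for illustration, not the actual holmes classes.

```python
# Minimal sketch of the non-streaming path above. The LLMResult shape and
# finalize_response helper are simplified stand-ins, not the real classes.
from dataclasses import dataclass, field
from typing import Any

from holmes.core.llm import get_llm_usage  # helper added in this PR


@dataclass
class LLMResult:  # simplified stand-in for the real result type
    result: str
    metadata: dict[str, Any] = field(default_factory=dict)


def finalize_response(llm: Any, messages: list[dict], full_response: Any) -> LLMResult:
    """Assemble the final answer once there are no tools left to call."""
    text = full_response.choices[0].message.content or ""

    # Token count for the conversation sent to the model.
    token_count = llm.count_tokens_for_message(messages)

    # Usage (prompt/completion/total tokens) reported by the provider.
    usage = get_llm_usage(full_response)

    return LLMResult(
        result=text,
        metadata={"usage": usage, "token_count": token_count},
    )
```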
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks (1 passed, 1 warning, 1 inconclusive)
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- holmes/core/llm.py (2 hunks)
- holmes/core/tool_calling_llm.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- holmes/core/tool_calling_llm.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff for formatting and linting
Type hints are required (checked by mypy)
Always place Python imports at the top of the file, not inside functions or methods
Files:
holmes/core/llm.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Pre-commit checks
- GitHub Check: llm_evals
🔇 Additional comments (2)
holmes/core/llm.py (2)
533-534: No-op change. Blank lines only.
535-553: Tighten types, include TextCompletionResponse in signature, and verify stream consumption
- Apply the signature/typing changes and use `typing.cast`/`getattr` as shown in the diff below; add `from typing import cast`.
- Critically verify that `litellm.stream_chunk_builder(chunks=llm_response)` does NOT consume the `CustomStreamWrapper`. If it does, move usage extraction to stream finalization or duplicate the stream upstream.
- Run mypy/ruff and fix any warnings.

Apply within this hunk:
```diff
-def get_llm_usage(llm_response: Union[ModelResponse, CustomStreamWrapper]) -> dict:
-    usage: dict = {}
-    if (
-        (
-            isinstance(llm_response, ModelResponse)
-            or isinstance(llm_response, TextCompletionResponse)
-        )
-        and hasattr(llm_response, "usage")
-        and llm_response.usage
-    ):  # type: ignore
-        usage["prompt_tokens"] = llm_response.usage.prompt_tokens  # type: ignore
-        usage["completion_tokens"] = llm_response.usage.completion_tokens  # type: ignore
-        usage["total_tokens"] = llm_response.usage.total_tokens  # type: ignore
-    elif isinstance(llm_response, CustomStreamWrapper):
-        complete_response = litellm.stream_chunk_builder(chunks=llm_response)  # type: ignore
-        if complete_response:
-            return get_llm_usage(complete_response)  # type: ignore
-    return usage
+def get_llm_usage(
+    llm_response: Union[ModelResponse, TextCompletionResponse, CustomStreamWrapper]
+) -> dict[str, int | None]:
+    usage: dict[str, int | None] = {}
+    if isinstance(llm_response, (ModelResponse, TextCompletionResponse)) and getattr(
+        llm_response, "usage", None
+    ):
+        u = llm_response.usage  # type: ignore[assignment]
+        usage["prompt_tokens"] = getattr(u, "prompt_tokens", None)  # type: ignore[attr-defined]
+        usage["completion_tokens"] = getattr(u, "completion_tokens", None)  # type: ignore[attr-defined]
+        usage["total_tokens"] = getattr(u, "total_tokens", None)  # type: ignore[attr-defined]
+    elif isinstance(llm_response, CustomStreamWrapper):
+        # Note: ensure this does not consume the stream before the caller iterates it.
+        complete_response = cast(
+            Optional[ModelResponse | TextCompletionResponse],
+            litellm.stream_chunk_builder(chunks=llm_response),
+        )
+        if complete_response is not None:
+            return get_llm_usage(complete_response)
+    return usage
```

Add import (outside this hunk):

```python
from typing import cast
```
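On the stream-consumption concern: one hedged way a streaming caller could sidestep it is to iterate the `CustomStreamWrapper` exactly once, keep the raw chunks, and compute usage from the collected list rather than the live wrapper. The sketch below assumes litellm response shapes; `collect_and_get_usage` is a hypothetical helper, not part of holmes.

```python
# Hypothetical helper illustrating one way to avoid consuming the stream twice:
# iterate the wrapper once, save the chunks, and build usage from the saved list.
import litellm
from litellm import CustomStreamWrapper

from holmes.core.llm import get_llm_usage  # helper added in this PR


def collect_and_get_usage(stream: CustomStreamWrapper) -> tuple[str, dict]:
    chunks = []
    text_parts = []
    for chunk in stream:  # the only place the wrapper is consumed
        chunks.append(chunk)
        delta = chunk.choices[0].delta.content
        if delta:
            text_parts.append(delta)

    # Rebuild a complete response from the saved chunks; passing the
    # already-consumed wrapper here would yield nothing.
    complete = litellm.stream_chunk_builder(chunks=chunks)
    usage = get_llm_usage(complete) if complete is not None else {}
    return "".join(text_parts), usage
```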