
Conversation

@nherment
Collaborator

No description provided.

@coderabbitai
Contributor

coderabbitai bot commented Sep 26, 2025

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including **/dist/** will override the default block on the dist directory by removing the pattern from both lists.

You can disable this status message by setting reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Introduces a computed max_output_tokens in holmes/core/llm.py using floor(min(64000, context_window_size/5)) and refines it by checking litellm.model_cost for each model name variant, lowering the value when litellm reports a smaller max_output_tokens. Replaces the previous hard-coded 4096 default in logic and warnings. Adds a math.floor import.

Changes

Cohort: Dynamic max_output_tokens computation
File(s): holmes/core/llm.py
Summary: Implement the floor(min(64000, context_window_size/5)) baseline; iterate over litellm.model_cost entries for the model's name variants and lower max_output_tokens when litellm reports a smaller limit; replace the hard-coded 4096 defaults and warnings with the computed value; import floor from math.
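
For orientation, here is a minimal sketch of the computation summarized above, under stated assumptions: the function name compute_max_output_tokens is illustrative, only a single model-name lookup is shown (the real code iterates over several name variants), integer division stands in for the floor(.../5) used in the PR, and litellm.model_cost contents vary by litellm version.

```python
import logging

import litellm


def compute_max_output_tokens(model: str, context_window_size: int) -> int:
    # Baseline: one fifth of the context window, capped at 64k output tokens.
    max_output_tokens = min(64_000, context_window_size // 5)

    # If litellm knows the model and reports a lower output limit, honor it.
    entry = litellm.model_cost.get(model, {})
    litellm_limit = entry.get("max_output_tokens")
    if isinstance(litellm_limit, int) and litellm_limit < max_output_tokens:
        max_output_tokens = litellm_limit
    elif litellm_limit is None:
        logging.warning("No max_output_tokens for %s in litellm.model_cost", model)
    return max_output_tokens
```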

Sequence Diagram(s)

sequenceDiagram
    actor Caller
    participant LLM as LLM.get_max_output_tokens
    participant Cost as litellm.model_cost

    Caller->>LLM: request max_output_tokens(model, context_window_size)
    activate LLM
    Note over LLM: Compute baseline = floor(min(64000, context_window_size/5))
    LLM->>Cost: Iterate model variants to read max_output_tokens
    activate Cost
    Cost-->>LLM: per-variant max_output_tokens (if any)
    deactivate Cost
    Note over LLM: Take the lower of the baseline and the model's reported max_output_tokens
    LLM-->>Caller: return computed max_output_tokens
    deactivate LLM

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • aantn

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Description Check (⚠️ Warning): The pull request description is missing and fails to describe any part of the changeset, leaving reviewers without context for the implemented dynamic max_output_tokens feature. Resolution: Please add a brief pull request description summarizing the changes, such as the introduction of a dynamic max_output_tokens calculation based on context window size and model cost, to provide necessary context for reviewers.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 33.33%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
  • Title Check (✅ Passed): The title succinctly summarizes the primary change by indicating that the pull request limits the maximum output tokens, which aligns with the dynamic max_output_tokens calculation added in the code; the ticket reference is acceptable and does not detract from clarity.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
holmes/core/tool_calling_llm.py (4)

4-4: Drop unnecessary math.floor import

Switch to integer arithmetic in limit_max_output_tokens and remove this import.

-from math import floor

137-142: Clamp by context (>=), use integer math, and add a brief docstring

Covers misconfigurations where maximum_output_token >= context, avoids float + floor, and clarifies behavior.

-def limit_max_output_tokens(maximum_output_token: int, max_context_size: int) -> int:
-    if maximum_output_token == max_context_size:
-        return floor(min(64000, maximum_output_token / 5))
-    else:
-        return maximum_output_token
+def limit_max_output_tokens(maximum_output_token: int, max_context_size: int) -> int:
+    """
+    Clamp max output tokens when set to be >= the model's context window.
+    Returns 1/5 of context (integer), capped at 64k. Otherwise returns the input.
+    """
+    if maximum_output_token >= max_context_size:
+        return min(64_000, max_context_size // 5)
+    return maximum_output_token
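
For reference, a few illustrative assertions of how the suggested clamp behaves; the numbers are made up and not taken from the codebase:

```python
# Uses the suggested limit_max_output_tokens from the diff above; values are hypothetical.
assert limit_max_output_tokens(200_000, 200_000) == 40_000  # equals context: clamped to context // 5
assert limit_max_output_tokens(500_000, 400_000) == 64_000  # above context and context // 5 exceeds the 64k cap
assert limit_max_output_tokens(4_096, 128_000) == 4_096     # already below context: returned unchanged
```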

438-446: Use reserved output tokens for the pre-truncation budget check

The truncation logic reserves min(max_output, MAX_OUTPUT_TOKEN_RESERVATION), but the pre-check uses maximum_output_token, which can trigger unnecessary truncation. Align the check with the actual reserved value.

             maximum_output_token = limit_max_output_tokens(
                 maximum_output_token=maximum_output_token,
                 max_context_size=max_context_size,
             )
-            if (total_tokens + maximum_output_token) > max_context_size:
+            reserved_output_tokens = min(
+                maximum_output_token, MAX_OUTPUT_TOKEN_RESERVATION
+            )
+            if (total_tokens + reserved_output_tokens) > max_context_size:
                 logging.warning("Token limit exceeded. Truncating tool responses.")
                 truncated_res = self.truncate_messages_to_fit_context(
                     messages, max_context_size, maximum_output_token
                 )

914-924: Same alignment for streaming path: check against reserved output tokens

Keeps call_stream consistent with call and avoids premature truncation.

             maximum_output_token = limit_max_output_tokens(
                 maximum_output_token=maximum_output_token,
                 max_context_size=max_context_size,
             )
-
-            if (total_tokens + maximum_output_token) > max_context_size:
+            reserved_output_tokens = min(
+                maximum_output_token, MAX_OUTPUT_TOKEN_RESERVATION
+            )
+            if (total_tokens + reserved_output_tokens) > max_context_size:
                 logging.warning("Token limit exceeded. Truncating tool responses.")
                 truncated_res = self.truncate_messages_to_fit_context(
                     messages, max_context_size, maximum_output_token
                 )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 429dd75 and 4e6c2eb.

📒 Files selected for processing (1)
  • holmes/core/tool_calling_llm.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always place Python imports at the top of the file, not inside functions or methods
All Python code must include type hints (mypy enforced)

Files:

  • holmes/core/tool_calling_llm.py
holmes/{core,plugins}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

All tools must return detailed error messages from underlying APIs, including executed command/query, time ranges/parameters, and full API error response; 'no data' responses must specify what was searched and where

Files:

  • holmes/core/tool_calling_llm.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Pre-commit checks
  • GitHub Check: llm_evals

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
holmes/core/tool_calling_llm.py (1)

4-4: Remove unnecessary floor import

You can avoid float math and floor by using integer division. If you apply the refactor below, drop this import.

Apply:

-from math import floor
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4e6c2eb and 6f56a77.

📒 Files selected for processing (1)
  • holmes/core/tool_calling_llm.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always place Python imports at the top of the file, not inside functions or methods
All Python code must include type hints (mypy enforced)

Files:

  • holmes/core/tool_calling_llm.py
holmes/{core,plugins}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

All tools must return detailed error messages from underlying APIs, including executed command/query, time ranges/parameters, and full API error response; 'no data' responses must specify what was searched and where

Files:

  • holmes/core/tool_calling_llm.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: llm_evals
🔇 Additional comments (1)
holmes/core/tool_calling_llm.py (1)

918-922: Mirror provider-side enforcement in streaming path

Apply the same provider-side cap in call_stream by passing the clamped max tokens to self.llm.completion.

Example:

full_response = self.llm.completion(
    messages=parse_messages_tags(messages),
    tools=tools,
    tool_choice=tool_choice,
    response_format=response_format,
    temperature=TEMPERATURE,
    stream=False,
    max_tokens=maximum_output_token,  # enforce clamp here too
    drop_params=True,
)

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
holmes/core/llm.py (2)

4-4: Drop math.floor and use integer division for clarity and zero‑alloc math

Avoid float math and an extra import; // is clearer and sufficient since tokens are integers.

Apply:

-from math import floor
-        max_output_tokens = floor(min(64000, self.get_context_window_size() / 5))
+        cw = self.get_context_window_size()
+        max_output_tokens = min(64000, cw // 5)

Also applies to: 296-297
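
As a quick sanity check that dropping floor() in favor of integer division preserves behavior for the context sizes involved here (the value below is arbitrary):

```python
from math import floor

cw = 200_000  # arbitrary context window size
assert floor(min(64000, cw / 5)) == min(64000, cw // 5)
```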


305-314: Narrow exception handling and avoid try/except/continue pattern in lookup

Catching Exception and returning inside try hurts readability and can mask errors. Use .get() and simple type checks; log only when no variant matched. This also addresses Ruff TRY300/BLE001 hints.

Apply:

-        # Try each name variant
-        for name in self._get_model_name_variants_for_lookup():
-            try:
-                litellm_max_output_tokens = litellm.model_cost[name][
-                    "max_output_tokens"
-                ]
-                if litellm_max_output_tokens < max_output_tokens:
-                    max_output_tokens = litellm_max_output_tokens
-                return max_output_tokens
-            except Exception:
-                continue
+        # Try each name variant
+        for name in self._get_model_name_variants_for_lookup():
+            entry = litellm.model_cost.get(name)
+            if not entry:
+                continue
+            litellm_max_output_tokens = entry.get("max_output_tokens")
+            if isinstance(litellm_max_output_tokens, int):
+                max_output_tokens = min(max_output_tokens, litellm_max_output_tokens)
+                return max_output_tokens
-        # Log which lookups we tried
-        logging.warning(
-            f"Couldn't find model {self.model} in litellm's model list (tried: {', '.join(self._get_model_name_variants_for_lookup())}), "
-            f"using {max_output_tokens} tokens for max_output_tokens. "
-            f"To override, set OVERRIDE_MAX_OUTPUT_TOKEN environment variable to the correct value for your model."
-        )
-        return max_output_tokens
+        # Log which lookups we tried
+        logging.warning(
+            f"Couldn't find model {self.model} in litellm.model_cost "
+            f"(tried: {', '.join(self._get_model_name_variants_for_lookup())}); "
+            f"falling back to computed default {max_output_tokens} (min(64000, context_window/5)). "
+            f"To override, set OVERRIDE_MAX_OUTPUT_TOKEN env var."
+        )
+        return max_output_tokens

Based on static analysis hints

Also applies to: 316-323
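
To ground the .get()-based lookup, a minimal illustration of reading an output limit from litellm.model_cost; the model name and the printed value are examples and vary by litellm version:

```python
import litellm

# litellm.model_cost maps model names to metadata dicts; not every entry has max_output_tokens.
entry = litellm.model_cost.get("gpt-4o", {})
print(entry.get("max_output_tokens"))  # e.g. 16384 in recent litellm releases, or None if absent
```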

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6f56a77 and e2c2cb7.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • holmes/core/llm.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always place Python imports at the top of the file, not inside functions or methods
All Python code must include type hints (mypy enforced)

Files:

  • holmes/core/llm.py
holmes/{core,plugins}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

All tools must return detailed error messages from underlying APIs, including executed command/query, time ranges/parameters, and full API error response; 'no data' responses must specify what was searched and where

Files:

  • holmes/core/llm.py
🪛 Ruff (0.13.1)
holmes/core/llm.py

312-312: Consider moving this statement to an else block

(TRY300)


313-314: try-except-continue detected, consider logging the exception

(S112)


313-313: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: llm_evals
  • GitHub Check: Pre-commit checks

arikalon1 previously approved these changes Sep 26, 2025
Contributor

@arikalon1 arikalon1 left a comment

nice work

@nherment nherment enabled auto-merge (squash) September 26, 2025 11:29
@nherment nherment merged commit c06a556 into master Sep 26, 2025
7 checks passed
@nherment nherment deleted the limit_max_output_context_window branch September 26, 2025 12:03