ROB-2111: limit max output tokens #1005
Conversation
Important: Review skipped due to path filters.

⛔ Files ignored due to path filters (1)

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters.

Walkthrough

Introduces a computed max_output_tokens in holmes/core/llm.py using floor(min(64000, context_window_size/5)) and refines it by checking litellm.model_cost per model variant, lowering the value when the model's own cap is smaller. Replaces the previous hard-coded 4096 default in logic and warnings. Adds a math.floor import.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Caller
    participant LLM as LLM.get_max_output_tokens
    participant Cost as litellm.model_cost
    Caller->>LLM: request max_output_tokens(model, context_window_size)
    activate LLM
    Note over LLM: Compute baseline = floor(min(64000, context_window_size/5))
    LLM->>Cost: Iterate model variants to read max_output_tokens
    activate Cost
    Cost-->>LLM: per-variant max_output_tokens (if any)
    deactivate Cost
    Note over LLM: Cap the baseline at the per-variant value when it is smaller
    LLM-->>Caller: return computed max_output_tokens
    deactivate LLM
```
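For orientation, here is a minimal sketch of the computation described above. It is not the exact holmes/core/llm.py code: the function name, the name_variants parameter, and the return-on-first-match behavior are reconstructions from the review comments further down.

```python
from math import floor

import litellm


def compute_max_output_tokens(context_window_size: int, name_variants: list[str]) -> int:
    # Baseline: a fifth of the context window, capped at 64k output tokens.
    baseline = floor(min(64000, context_window_size / 5))
    for name in name_variants:
        entry = litellm.model_cost.get(name)  # litellm's per-model limits table
        cap = entry.get("max_output_tokens") if entry else None
        if cap is not None:
            return min(baseline, cap)  # never exceed the model's own cap
    return baseline  # no variant matched; the real code logs a warning here
```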
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 0
🧹 Nitpick comments (4)
holmes/core/tool_calling_llm.py (4)
4-4: Drop unnecessary math.floor import

Switch to integer arithmetic in limit_max_output_tokens and remove this import.

```diff
-from math import floor
```
137-142: Clamp by context (>=), use integer math, and add a brief docstring

Covers misconfigurations where maximum_output_token >= context, avoids float + floor, and clarifies behavior.

```diff
-def limit_max_output_tokens(maximum_output_token: int, max_context_size: int) -> int:
-    if maximum_output_token == max_context_size:
-        return floor(min(64000, maximum_output_token / 5))
-    else:
-        return maximum_output_token
+def limit_max_output_tokens(maximum_output_token: int, max_context_size: int) -> int:
+    """
+    Clamp max output tokens when set to be >= the model's context window.
+    Returns 1/5 of context (integer), capped at 64k. Otherwise returns the input.
+    """
+    if maximum_output_token >= max_context_size:
+        return min(64_000, max_context_size // 5)
+    return maximum_output_token
```
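As a quick sanity check of the suggested version, with hypothetical numbers for a 128k-context model (the behavioral change is the third case, which the old == check let pass through unclamped):

```python
# baseline for a 128k context: min(64_000, 128_000 // 5) == 25_600
assert limit_max_output_tokens(25_600, 128_000) == 25_600    # sane value: unchanged
assert limit_max_output_tokens(128_000, 128_000) == 25_600   # equals context: clamped
assert limit_max_output_tokens(200_000, 128_000) == 25_600   # exceeds context: now clamped too
```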
438-446: Use reserved output tokens for the pre-truncation budget check

The truncation logic reserves min(max_output, MAX_OUTPUT_TOKEN_RESERVATION), but the pre-check uses maximum_output_token, which can trigger unnecessary truncation. Align the check with the actual reserved value.

```diff
         maximum_output_token = limit_max_output_tokens(
             maximum_output_token=maximum_output_token,
             max_context_size=max_context_size,
         )
-        if (total_tokens + maximum_output_token) > max_context_size:
+        reserved_output_tokens = min(
+            maximum_output_token, MAX_OUTPUT_TOKEN_RESERVATION
+        )
+        if (total_tokens + reserved_output_tokens) > max_context_size:
             logging.warning("Token limit exceeded. Truncating tool responses.")
             truncated_res = self.truncate_messages_to_fit_context(
                 messages, max_context_size, maximum_output_token
             )
```
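A small numeric sketch of the premature truncation this avoids; 4_096 is a hypothetical stand-in for MAX_OUTPUT_TOKEN_RESERVATION, whose real value lives in the module:

```python
MAX_OUTPUT_TOKEN_RESERVATION = 4_096  # hypothetical stand-in
max_context_size = 128_000
maximum_output_token = 64_000
total_tokens = 70_000  # tokens already used by the messages

# Old pre-check: 70_000 + 64_000 = 134_000 > 128_000, so it truncates...
assert (total_tokens + maximum_output_token) > max_context_size

# ...even though truncation only reserves 4_096 tokens, which fits comfortably.
reserved_output_tokens = min(maximum_output_token, MAX_OUTPUT_TOKEN_RESERVATION)
assert (total_tokens + reserved_output_tokens) <= max_context_size
```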
914-924: Same alignment for streaming path: check against reserved output tokens

Keeps call_stream consistent with call and avoids premature truncation.

```diff
         maximum_output_token = limit_max_output_tokens(
             maximum_output_token=maximum_output_token,
             max_context_size=max_context_size,
         )
-
-        if (total_tokens + maximum_output_token) > max_context_size:
+        reserved_output_tokens = min(
+            maximum_output_token, MAX_OUTPUT_TOKEN_RESERVATION
+        )
+        if (total_tokens + reserved_output_tokens) > max_context_size:
             logging.warning("Token limit exceeded. Truncating tool responses.")
             truncated_res = self.truncate_messages_to_fit_context(
                 messages, max_context_size, maximum_output_token
             )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
holmes/core/tool_calling_llm.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always place Python imports at the top of the file, not inside functions or methods
All Python code must include type hints (mypy enforced)
Files:
holmes/core/tool_calling_llm.py
holmes/{core,plugins}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
All tools must return detailed error messages from underlying APIs, including executed command/query, time ranges/parameters, and full API error response; 'no data' responses must specify what was searched and where
Files:
holmes/core/tool_calling_llm.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Pre-commit checks
- GitHub Check: llm_evals
Actionable comments posted: 2
🧹 Nitpick comments (1)
holmes/core/tool_calling_llm.py (1)
4-4: Remove unnecessary floor import

You can avoid float math and floor by using integer division. If you apply the refactor below, drop this import.

Apply:

```diff
-from math import floor
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
holmes/core/tool_calling_llm.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always place Python imports at the top of the file, not inside functions or methods
All Python code must include type hints (mypy enforced)
Files:
holmes/core/tool_calling_llm.py
holmes/{core,plugins}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
All tools must return detailed error messages from underlying APIs, including executed command/query, time ranges/parameters, and full API error response; 'no data' responses must specify what was searched and where
Files:
holmes/core/tool_calling_llm.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: llm_evals
🔇 Additional comments (1)
holmes/core/tool_calling_llm.py (1)
918-922: Mirror provider-side enforcement in streaming path

Apply the same provider-side cap in call_stream by passing the clamped max tokens to self.llm.completion.

Example:

```python
full_response = self.llm.completion(
    messages=parse_messages_tags(messages),
    tools=tools,
    tool_choice=tool_choice,
    response_format=response_format,
    temperature=TEMPERATURE,
    stream=False,
    max_tokens=maximum_output_token,  # enforce clamp here too
    drop_params=True,
)
```
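Worth noting: the call already passes drop_params=True, which in litellm drops parameters the target provider does not support, so adding max_tokens here should degrade gracefully on providers that reject it rather than erroring.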
Actionable comments posted: 0
🧹 Nitpick comments (2)
holmes/core/llm.py (2)
4-4: Drop math.floor and use integer division for clarity and zero-alloc math

Avoid float math and an extra import; // is clearer and sufficient since tokens are integers.

Apply:

```diff
-from math import floor
```

```diff
-        max_output_tokens = floor(min(64000, self.get_context_window_size() / 5))
+        cw = self.get_context_window_size()
+        max_output_tokens = min(64000, cw // 5)
```

Also applies to: 296-297
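A one-line check of the equivalence for non-negative token counts, if reassurance is wanted:

```python
from math import floor

assert all(floor(n / 5) == n // 5 for n in range(0, 1_000_000, 7))
```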
305-314: Narrow exception handling and avoid try/except/continue pattern in lookup

Catching Exception and returning inside try hurts readability and can mask errors. Use .get() and simple type checks; log only when no variant matched. This also addresses Ruff TRY300/BLE001 hints.

Apply:

```diff
-        # Try each name variant
-        for name in self._get_model_name_variants_for_lookup():
-            try:
-                litellm_max_output_tokens = litellm.model_cost[name][
-                    "max_output_tokens"
-                ]
-                if litellm_max_output_tokens < max_output_tokens:
-                    max_output_tokens = litellm_max_output_tokens
-                return max_output_tokens
-            except Exception:
-                continue
+        # Try each name variant
+        for name in self._get_model_name_variants_for_lookup():
+            entry = litellm.model_cost.get(name)
+            if not entry:
+                continue
+            litellm_max_output_tokens = entry.get("max_output_tokens")
+            if isinstance(litellm_max_output_tokens, int):
+                max_output_tokens = min(max_output_tokens, litellm_max_output_tokens)
+            return max_output_tokens
```

```diff
-        # Log which lookups we tried
-        logging.warning(
-            f"Couldn't find model {self.model} in litellm's model list (tried: {', '.join(self._get_model_name_variants_for_lookup())}), "
-            f"using {max_output_tokens} tokens for max_output_tokens. "
-            f"To override, set OVERRIDE_MAX_OUTPUT_TOKEN environment variable to the correct value for your model."
-        )
-        return max_output_tokens
+        # Log which lookups we tried
+        logging.warning(
+            f"Couldn't find model {self.model} in litellm.model_cost "
+            f"(tried: {', '.join(self._get_model_name_variants_for_lookup())}); "
+            f"falling back to computed default {max_output_tokens} (min(64000, context_window/5)). "
+            f"To override, set OVERRIDE_MAX_OUTPUT_TOKEN env var."
+        )
+        return max_output_tokens
```

Based on static analysis hints

Also applies to: 316-323
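As a standalone illustration (not project code) of why the blind except was risky: .get() simply returns None for an unknown model, while bracket access raises KeyError, which the old except Exception caught along with any unrelated bug in the block:

```python
import litellm

# .get() fails soft only on the missing-key case: no exception to swallow.
assert litellm.model_cost.get("no-such-model") is None

# Bracket access raises KeyError; `except Exception` would also have hidden
# typos, type errors, or anything else raised inside the try block.
try:
    litellm.model_cost["no-such-model"]["max_output_tokens"]
except KeyError:
    pass
```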
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
holmes/core/llm.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always place Python imports at the top of the file, not inside functions or methods
All Python code must include type hints (mypy enforced)
Files:
holmes/core/llm.py
holmes/{core,plugins}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
All tools must return detailed error messages from underlying APIs, including executed command/query, time ranges/parameters, and full API error response; 'no data' responses must specify what was searched and where
Files:
holmes/core/llm.py
🪛 Ruff (0.13.1)
holmes/core/llm.py
312-312: Consider moving this statement to an else block
(TRY300)
313-314: try-except-continue detected, consider logging the exception
(S112)
313-313: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: llm_evals
- GitHub Check: Pre-commit checks
nice work
No description provided.