
Conversation

@nherment
Collaborator

No description provided.

@coderabbitai
Contributor

coderabbitai bot commented Sep 10, 2025

Walkthrough

Rename ToolResultStatus → StructuredToolResultStatus across core, plugins, and tests; add TOOL_MAX_ALLOCATED_CONTEXT_WINDOW_PCT and a limiter that mutates oversized tool responses; extract ToolCallResult/Truncation types and formatting into tools_utils; add LLM cost tracking; small Git toolset tweaks and test fixture port update.
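
For orientation, here is a minimal sketch of that limiter, assuming the method names quoted later in this review (as_tool_call_message, count_tokens_for_message, get_context_window_size); it is illustrative, not the merged implementation:

from holmes.common.env_vars import TOOL_MAX_ALLOCATED_CONTEXT_WINDOW_PCT
from holmes.core.tools import StructuredToolResultStatus

def prevent_overly_big_tool_response(tool_call_result, llm) -> None:
    # 0 or any value above 100 disables the limiter entirely
    if not (0 < TOOL_MAX_ALLOCATED_CONTEXT_WINDOW_PCT <= 100):
        return
    message = tool_call_result.as_tool_call_message()
    used_tokens = llm.count_tokens_for_message(messages=[message])
    budget = int(
        llm.get_context_window_size() * TOOL_MAX_ALLOCATED_CONTEXT_WINDOW_PCT / 100
    )
    if used_tokens > budget:
        # Mutate the result in place: drop the payload, surface an actionable error
        tool_call_result.result.status = StructuredToolResultStatus.ERROR
        tool_call_result.result.data = None
        tool_call_result.result.error = (
            f"The tool call result is too large to return: {used_tokens} tokens "
            f"(budget: {budget}). Narrow the query so the result fits."
        )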

Changes

Cohort / File(s) Summary
Core: status enum & LLM flow
holmes/core/tools.py, holmes/core/tool_calling_llm.py, holmes/core/safeguards.py
Rename ToolResultStatus → StructuredToolResultStatus; update imports/usages; integrate LLM cost extraction/tracking; invoke context-window limiter after tool responses; update approval/error status handling.
Core: tools utils & limiter
holmes/core/tools_utils/data_types.py, holmes/core/tools_utils/tool_context_window_limiter.py, holmes/core/tools_utils/tool_executor.py
Add TruncationMetadata, TruncationResult, ToolCallResult, and format_tool_result_data; add prevent_overly_big_tool_response which uses TOOL_MAX_ALLOCATED_CONTEXT_WINDOW_PCT to cap tool responses; update executor to use new enum.
Config
holmes/common/env_vars.py
Add TOOL_MAX_ALLOCATED_CONTEXT_WINDOW_PCT (float env var, default 10) and explanatory comments.
Plugins: status enum migration
holmes/plugins/toolsets/... (many files, e.g., datadog/*, git.py, grafana/*, azure_sql/*, kubernetes_logs.py, opensearch/*, prometheus/*, rabbitmq/*, etc.)
Replace ToolResultStatus with StructuredToolResultStatus in imports and all StructuredToolResult returns; preserve control flow and messages (some files have minor related tweaks, e.g., Datadog 429 handling).
Git toolset enhancements
holmes/plugins/toolsets/git.py
Add tracking for created branches/PRs (_created_branches, _created_prs, add/is methods); improve PR update insertion indentation logic; adjust one-liner formatting and sanitize errors; adopt new status enum.
Tests & fixtures
tests/**, tests/llm/fixtures/...
Update tests to import/use StructuredToolResultStatus; move ToolCallResult/formatting imports to tools_utils/data_types; add tests for limiter (tests/core/tools_utils/test_tool_context_window_limiter.py); update tempo fixture port 3200→3201.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Orch as Orchestrator
  participant LLM as ToolCallingLLM
  participant Tool as Tool
  participant Lim as ContextWindowLimiter
  participant App as ApprovalService

  Orch->>LLM: call(tool_request)
  LLM->>Tool: _invoke_tool_call(...)
  Tool-->>LLM: StructuredToolResult
  LLM->>Lim: prevent_overly_big_tool_response(tool_call_result, llm)
  Note right of Lim #d3f4ff: Mutates ToolCallResult → ERROR if over budget using TOOL_MAX_ALLOCATED_CONTEXT_WINDOW_PCT
  alt status == APPROVAL_REQUIRED
    LLM->>App: request_approval(result)
    App-->>LLM: approved / denied
    alt approved
      LLM->>Tool: continue/reinvoke
      Tool-->>LLM: StructuredToolResult
      LLM->>Lim: prevent_overly_big_tool_response(...)
    else denied
      LLM-->>Orch: return ERROR result
    end
  end
  LLM-->>Orch: ToolCallResult (may include costs)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–90 minutes

Possibly related PRs

Suggested labels

enhancement

Pre-merge checks (2 passed, 1 inconclusive)

❓ Inconclusive checks (1)
Description Check: ❓ Inconclusive
Explanation: I cannot find the current PR description text in the provided input, so I cannot determine whether the PR description is related to the changeset or sufficiently descriptive.
Resolution: Please provide the PR description text (or paste the GitHub PR description) so I can verify it is on-topic; a good description should briefly state the purpose, the main changes (token-budget limiter, moved data types, enum rename), and any migration or compatibility notes.
✅ Passed checks (2)
Title Check: ✅ Passed
Explanation: The title concisely identifies the primary intent (preventing tool call responses that are too large) and includes the ticket ID, which matches the changes adding a token-budget limiter and related tests, so it accurately reflects the main change. It is short and readable, though a minor grammatical tweak ("tool calls responses" → "tool call responses") would improve clarity.
Docstring Coverage: ✅ Passed
Explanation: Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 51d33ba and b51963b.

📒 Files selected for processing (1)
  • tests/core/tools_utils/test_tool_context_window_limiter.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/core/tools_utils/test_tool_context_window_limiter.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: llm_evals
  • GitHub Check: Pre-commit checks
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ROB-2116_prevent_tool_call_results_that_are_too_big

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@nherment changed the title from "feat: prevent tool calls responses that are too big" to "ROB-2116: prevent tool calls responses that are too big" on Sep 10, 2025
@nherment enabled auto-merge (squash) September 10, 2025 14:06
@nherment requested a review from moshemorad September 10, 2025 14:06
Contributor

@coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (11)
holmes/plugins/toolsets/newrelic.py (1)

82-85: Add HTTP timeouts to external calls

requests.post without a timeout can hang indefinitely.

Apply this diff:

-            response = requests.post(url, headers=headers, json=query)  # type: ignore[arg-type]
+            response = requests.post(url, headers=headers, json=query, timeout=30)  # type: ignore[arg-type]
-            response = requests.post(url, headers=headers, json=query)  # type: ignore[arg-type]
+            response = requests.post(url, headers=headers, json=query, timeout=30)  # type: ignore[arg-type]

Also applies to: 167-170

tests/plugins/toolsets/test_toolset_utils.py (1)

255-258: Bug: using list.sort() return value leads to None comparisons

.sort() sorts in place and returns None; the test will always compare None == None.

Apply this diff:

-    expected_toolsets_names = [t.name for t in expected_toolsets].sort()
-    filtered_toolsets_names = [t.name for t in filtered_toolsets].sort()
+    expected_toolsets_names = sorted(t.name for t in expected_toolsets)
+    filtered_toolsets_names = sorted(t.name for t in filtered_toolsets)
holmes/plugins/toolsets/kafka.py (1)

578-580: Avoid shared mutable class attributes (clients, kafka_config)

Defined at class level, shared across instances; can leak state between runs.

Apply this diff:

 class KafkaToolset(Toolset):
     model_config = ConfigDict(arbitrary_types_allowed=True)
-    clients: Dict[str, AdminClient] = {}
-    kafka_config: Optional[KafkaConfig] = None
-
-    def __init__(self):
+    def __init__(self):
+        self.clients: Dict[str, AdminClient] = {}
+        self.kafka_config: Optional[KafkaConfig] = None
         super().__init__(
             name="kafka/admin",
             description="Fetches metadata from multiple Kafka clusters",

Also applies to: 582-597

holmes/plugins/toolsets/mcp/toolset_mcp.py (1)

34-38: Error message uses e.args tuple — return the actual message.

Using str(e.args) renders a tuple; prefer str(e) (or {e!s}) for readable errors.

-            return StructuredToolResult(
-                status=StructuredToolResultStatus.ERROR,
-                error=str(e.args),
-                params=params,
-                invocation=f"MCPtool {self.name} with params {params}",
-            )
+            return StructuredToolResult(
+                status=StructuredToolResultStatus.ERROR,
+                error=str(e),
+                params=params,
+                invocation=f"MCPtool {self.name} with params {params}",
+            )
holmes/plugins/toolsets/grafana/toolset_grafana.py (2)

55-56: Add a request timeout to external call.

Avoid hanging on slow Grafana; prefer a configurable timeout from toolset config with a sane default.

-            response = requests.get(url, headers=headers)
+            response = requests.get(
+                url,
+                headers=headers,
+                timeout=getattr(self._toolset._grafana_config, "request_timeout", 30),
+            )

103-109: Use logging.exception and avoid f-string logging.

Prefer exception logging and a concise public error.

-        except requests.RequestException as e:
-            logging.error(f"Error fetching dashboards: {str(e)}")
+        except requests.RequestException:
+            logging.exception("Error fetching dashboards")
             return StructuredToolResult(
                 status=StructuredToolResultStatus.ERROR,
-                error=f"Error fetching dashboards: {str(e)}",
+                error="Error fetching dashboards. See logs for details.",
                 url=url,
                 params=params,
             )
holmes/plugins/toolsets/coralogix/toolset_coralogix_logs.py (1)

50-55: Add missing type hint for get_example_config.

Mypy requires type hints.

-    def get_example_config(self):
+    def get_example_config(self) -> dict[str, Any]:
         example_config = CoralogixConfig(
             api_key="<cxuw_...>", team_hostname="my-team", domain="eu2.coralogix.com"
         )
         return example_config.model_dump()
holmes/core/tools_utils/tool_context_window_limiter.py (1)

7-28: Add typing, fix math/wording, and avoid overriding non-success results.

  • Add return type annotation and a brief docstring (mypy requirement).
  • Use int budget (floor) to avoid float comparisons and strings like “1024.0”.
  • Don’t mutate when the result is already non-success or has no data.
  • Clarify message and include a percent sign; report both allowed and actual as ints.

Apply:

-def prevent_overly_big_tool_response(tool_call_result: ToolCallResult, llm: LLM):
+def prevent_overly_big_tool_response(tool_call_result: ToolCallResult, llm: LLM) -> None:
+    """
+    Mutates tool_call_result to ERROR if its serialized size exceeds the configured
+    percentage of the model's context window.
+    """
     if (
         0 < TOOL_MAX_ALLOCATED_CONTEXT_WINDOW_PCT
         and TOOL_MAX_ALLOCATED_CONTEXT_WINDOW_PCT <= 100
     ):
+        # If already not SUCCESS or there's no data, nothing to limit.
+        if tool_call_result.result.status != StructuredToolResultStatus.SUCCESS:
+            return
+        if tool_call_result.result.data in (None, ""):
+            return
         message = tool_call_result.as_tool_call_message()
 
         messages_token = llm.count_tokens_for_message(messages=[message])
         context_window_size = llm.get_context_window_size()
-        max_tokens_allowed = (
-            context_window_size * TOOL_MAX_ALLOCATED_CONTEXT_WINDOW_PCT / 100
-        )
+        max_tokens_allowed = int(
+            context_window_size * TOOL_MAX_ALLOCATED_CONTEXT_WINDOW_PCT / 100
+        )
 
-        if messages_token > max_tokens_allowed:
-            relative_pct = (
-                (messages_token - max_tokens_allowed) / messages_token
-            ) * 100
-            error_message = f"The tool call result is too large to return: {messages_token} tokens.\nThe maximum allowed tokens is {max_tokens_allowed} which is {format(relative_pct, '.1f')} smaller.\nInstructions for the LLM: try to repeat the query but proactively narrow down the result so that the tool answer fits within the allowed number of tokens."
+        if messages_token > max_tokens_allowed:
+            relative_pct = ((messages_token - max_tokens_allowed) / messages_token) * 100
+            error_message = (
+                f"The tool call result is too large to return: {messages_token} tokens.\n"
+                f"The maximum allowed is {max_tokens_allowed} tokens "
+                f"({relative_pct:.1f}% smaller).\n"
+                "Instructions for the LLM: repeat the query but proactively narrow down the result "
+                "so the tool answer fits within the allowed number of tokens."
+            )
             tool_call_result.result.status = StructuredToolResultStatus.ERROR
             tool_call_result.result.data = None
             tool_call_result.result.error = error_message
holmes/plugins/toolsets/opensearch/opensearch.py (3)

59-73: Client selection bug: always returns the current client when any host matches.

The generator expression shadows the outer variable and searches across all clients. If any client matches, you return the current client, possibly the wrong one.

Fix:

-    for client in clients:
-        found = any(host in client.hosts for client in clients)
-        if found:
-            return client
+    for client in clients:
+        if host in client.hosts:
+            return client

59-66: Type hints and return type for get_client.

mypy guideline requires explicit return type.

-def get_client(clients: List[OpenSearchClient], host: Optional[str]):
+def get_client(clients: List[OpenSearchClient], host: Optional[str]) -> OpenSearchClient:

140-143: Missing type hint for params.

Add type hints for consistency with mypy settings.

-    def get_parameterized_one_liner(self, params) -> str:
+    def get_parameterized_one_liner(self, params: Dict[str, Any]) -> str:
🧹 Nitpick comments (53)
tests/integration/test_tool_execution_pipeline.py (1)

101-132: Optional: add a test for “summary not smaller” revert.

We now revert llm_summarize if output isn’t smaller. Consider a small test ensuring status stays SUCCESS and data unchanged when summary ≥ input.
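
A sketch of that test; the tool fixture, the transformer hook, and the summarizer patch point are hypothetical and should be adapted to the pipeline's real entry points:

def test_llm_summarize_reverts_when_summary_not_smaller(monkeypatch, tool):
    original = "line\n" * 50
    result = StructuredToolResult(
        status=StructuredToolResultStatus.SUCCESS, data=original, params={}
    )
    # Force the "summary" to be larger than the input (hypothetical patch point)
    monkeypatch.setattr(
        "holmes.core.transformers.llm_summarize.summarize",  # assumed module path
        lambda text, **kwargs: original * 2,
    )
    transformed = tool._apply_transformers(result)  # assumed internal hook
    assert transformed.status == StructuredToolResultStatus.SUCCESS
    assert transformed.data == original  # the revert kept the original payload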

holmes/core/tools.py (2)

264-268: Tiny observability nit: log why transformers are skipped.

Helps debugging when outputs aren’t transformed due to status/registration.

-        if (
-            not self._transformer_instances
-            or result.status != StructuredToolResultStatus.SUCCESS
-        ):
-            return result
+        if not self._transformer_instances:
+            logger.debug(f"No transformers configured for tool '{self.name}', skipping transformation")
+            return result
+        if result.status != StructuredToolResultStatus.SUCCESS:
+            logger.debug(f"Tool '{self.name}' status is {result.status}, skipping transformers")
+            return result

393-401: Treat whitespace-only output as NO_DATA (optional).

Current check only catches exact empty string; whitespace-only outputs likely indicate “no data”.

-    def _get_status(
-        self, return_code: int, raw_output: str
-    ) -> StructuredToolResultStatus:
+    def _get_status(
+        self, return_code: int, raw_output: str
+    ) -> StructuredToolResultStatus:
         if return_code != 0:
             return StructuredToolResultStatus.ERROR
-        if raw_output == "":
+        if raw_output.strip() == "":
             return StructuredToolResultStatus.NO_DATA
         return StructuredToolResultStatus.SUCCESS
tests/plugins/toolsets/datadog/rds/test_datadog_rds_live.py (1)

8-9: Docstring env var mismatch (DD_SITE_API_URL vs DD_SITE_URL)

Docstring mentions DD_SITE_API_URL but code reads DD_SITE_URL. Align to avoid confusion.

Apply this diff:

- - DD_SITE_API_URL: Datadog site API URL (e.g., https://api.datadoghq.com)
+ - DD_SITE_URL: Datadog site API URL (e.g., https://api.datadoghq.com)
holmes/plugins/toolsets/kafka.py (2)

197-205: Improve exception logging and string conversion

Use logging.exception for stack traces and explicit f-string conversion flags.

Apply this diff (pattern repeated for each block):

-            error_msg = f"Failed to list consumer groups: {str(e)}"
-            logging.error(error_msg)
+            error_msg = f"Failed to list consumer groups: {e!s}"
+            logging.exception(error_msg)
-            error_msg = f"Failed to describe consumer group {group_id}: {str(e)}"
-            logging.error(error_msg)
+            error_msg = f"Failed to describe consumer group {group_id}: {e!s}"
+            logging.exception(error_msg)
-            error_msg = f"Failed to list topics: {str(e)}"
-            logging.error(error_msg)
+            error_msg = f"Failed to list topics: {e!s}"
+            logging.exception(error_msg)
-            error_msg = f"Failed to describe topic {topic_name}: {str(e)}"
-            logging.error(error_msg, exc_info=True)
+            error_msg = f"Failed to describe topic {topic_name}: {e!s}"
+            logging.exception(error_msg)

Also applies to: 260-268, 308-316, 381-387


411-447: Ensure Consumer is always closed

Close the Consumer in a finally block to avoid leaks on exceptions.

Apply this diff:

-        consumer = Consumer(consumer_config)
+        consumer = Consumer(consumer_config)
         # Check topic metadata to know which partitions exist
         if topic_name not in topic_metadata.topics:
-            consumer.close()
+            consumer.close()
             return False
         # Create TopicPartition objects for all partitions of the topic
         topic_partitions = []
         for partition_id in topic_metadata.topics[topic_name].partitions:
             topic_partitions.append(TopicPartition(topic_name, partition_id))
         # Check committed offsets for this consumer group on these topic partitions
-        committed_offsets = consumer.committed(topic_partitions, timeout=10.0)
-        consumer.close()
+        try:
+            committed_offsets = consumer.committed(topic_partitions, timeout=10.0)
+        finally:
+            consumer.close()
holmes/plugins/toolsets/prometheus/prometheus.py (4)

599-601: ListAvailableMetrics: enum migration — OK; minor logging/style nits.

Statuses look correct. Prefer logging.warning over logging.warn and explicit exception string conversion per Ruff.

-            logging.warn("Timeout while fetching prometheus metrics", exc_info=True)
+            logging.warning("Timeout while fetching prometheus metrics", exc_info=True)
-            logging.warn("Failed to fetch prometheus metrics", exc_info=True)
+            logging.warning("Failed to fetch prometheus metrics", exc_info=True)
-            logging.warn("Failed to process prometheus metrics", exc_info=True)
+            logging.warning("Failed to process prometheus metrics", exc_info=True)

-                error=f"Network error while fetching metrics: {str(e)}",
+                error=f"Network error while fetching metrics: {e!s}",
-                error=f"Unexpected error: {str(e)}",
+                error=f"Unexpected error: {e!s}",

Also applies to: 616-618, 649-652, 657-660, 665-667, 671-674


707-709: ExecuteInstantQuery: enum migration — OK; style cleanups optional.

Status handling is consistent. Consider the same logging.warning and {e!s} changes as above for consistency and Ruff compliance.

Also applies to: 751-754, 767-770, 775-777, 782-785, 789-792


847-849: ExecuteRangeQuery: enum migration — OK; apply minor nits similarly.

Statuses are correct; recommend switching logging.warn→logging.warning and use {e!s} in error strings.

Also applies to: 914-917, 929-932, 935-938, 943-946, 950-953


560-566: Avoid bare except Exception where feasible.

Where practical, catch narrower exceptions (e.g., ValueError, JSONDecodeError) to reduce false positives and improve debuggability.

Also applies to: 668-674, 786-792, 947-953
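
A minimal illustration of the narrower pattern for the JSON-parsing case (response and params stand in for the surrounding handler's variables):

try:
    payload = response.json()
except ValueError as e:  # requests raises a ValueError subclass for non-JSON bodies
    return StructuredToolResult(
        status=StructuredToolResultStatus.ERROR,
        error=f"Prometheus returned a non-JSON payload: {e!s}",
        params=params,
    )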

holmes/plugins/toolsets/robusta/robusta.py (2)

136-136: Use f-string formatting for consistency.

The string concatenation can be replaced with an f-string for better readability and consistency with the rest of the codebase.

-            msg = f"There was an internal error while fetching recommendations for {params}. {str(e)}"
+            msg = f"There was an internal error while fetching recommendations for {params}. {e!s}"

196-196: Use f-string formatting for consistency.

Similar to the previous comment, use explicit conversion flag for better readability.

-            msg = f"There was an internal error while fetching changes for {params}. {str(e)}"
+            msg = f"There was an internal error while fetching changes for {params}. {e!s}"
holmes/plugins/toolsets/rabbitmq/toolset_rabbitmq.py (1)

119-119: Consider using more specific exception types.

While catching generic Exception can be acceptable for top-level error handlers, consider catching more specific exceptions where possible to avoid masking unexpected errors.

-        except Exception as e:
+        except (ValueError, TypeError, KeyError) as e:

If you need to catch all exceptions, consider logging the full traceback for better debugging:

         except Exception as e:
-            logging.info("Failed to process RabbitMQ cluster status", exc_info=True)
+            logging.error("Failed to process RabbitMQ cluster status", exc_info=True)

Also note that the logging level should probably be error instead of info for failure cases.

holmes/plugins/toolsets/git.py (3)

275-280: Consider catching more specific exceptions.

While catching generic Exception can be acceptable here as a fallback, consider catching specific exceptions where possible.

-        except Exception as e:
+        except (ValueError, TypeError, KeyError, AttributeError) as e:

Or if you need to catch all exceptions, add proper logging:

        except Exception as e:
            logging.error(f"Unexpected error reading file {filepath}", exc_info=True)
            return StructuredToolResult(
                status=StructuredToolResultStatus.ERROR,
                data=self.toolset._sanitize_error(str(e)),
                params=params,
            )

738-744: Consider more specific exception handling.

While catching generic Exception is acceptable here as a fallback, consider being more specific about the types of exceptions you expect.

-            except Exception as e:
+            except (ValueError, KeyError, AttributeError) as e:

Also, for the error message formatting:

-                    error=self.toolset._sanitize_error(
-                        f"Tool call failed to run: Error updating PR: {str(e)}"
-                    ),
+                    error=self.toolset._sanitize_error(
+                        f"Tool call failed to run: Error updating PR: {e!s}"
+                    ),

750-750: Use f-string conversion flags for consistency.

Replace str(e) with {e!s} for consistency with f-string best practices.

-                    f"Tool call failed to run: Network error: {str(e)}"
+                    f"Tool call failed to run: Network error: {e!s}"
-                    f"Tool call failed to run: Unexpected error: {str(e)}"
+                    f"Tool call failed to run: Unexpected error: {e!s}"

Also applies to: 757-757

holmes/plugins/toolsets/datadog/toolset_datadog_rds.py (2)

586-587: Don’t silently swallow exceptions in data collection loop.

Swallowing all exceptions makes diagnostics hard. Log at debug level and narrow the exception to known types; otherwise, add a BLE001 noqa with justification.

-            except Exception:
-                continue
+            except DataDogRequestError:
+                logging.debug("Datadog request failed for %s on %s; continuing", metric_name, instance_id)
+                continue
+            except Exception:
+                logging.exception("Unexpected error collecting metric %s for %s; continuing", metric_name, instance_id)
+                continue

442-444: Potentially large payload appended to report.

Dumping worst_performers JSON inline can balloon outputs. Consider truncating to top_n and eliding per-instance metrics, relying on the new context-window limiter.

-            report += f"\n\nInstances:\n{json.dumps(worst_performers, indent=2)}"
+            trimmed = worst_performers[: params.get("top_n", self.toolset.dd_config.default_top_instances)]
+            report += f"\n\nInstances (truncated):\n{json.dumps(trimmed, indent=2)}"
tests/plugins/toolsets/kubernetes/test_kubernetes_logs_no_data.py (1)

66-68: Stabilize “future” time-range in test.

Docstring says “Future time range” but dates are now past. Use far-future dates to keep intent stable.

-            start_time="2025-01-01T00:00:00Z",  # Future time range
-            end_time="2025-01-02T00:00:00Z",
+            start_time="2099-01-01T00:00:00Z",  # Future time range
+            end_time="2099-01-02T00:00:00Z",
tests/plugins/toolsets/datadog/traces/test_datadog_traces.py (1)

95-98: Setup assigns dd_config to a tuple.

prerequisites_callable returns (bool, str), not a config. This assignment is unused later but can confuse readers. Remove it or set a valid config/magic mock here.

-        self.toolset.dd_config = self.toolset.prerequisites_callable(self.config)
+        # dd_config is set per-test where needed
holmes/plugins/toolsets/coralogix/toolset_coralogix_logs.py (1)

95-103: Avoid duplicating error in both data and error.

When logs_data.error is set, returning the same string in data can confuse consumers. Prefer data=None on error.

Apply:

-        if logs_data.error:
-            data = logs_data.error
-        else:
+        if logs_data.error:
+            data = None
+        else:
             logs = stringify_flattened_logs(logs_data.logs)
             # Remove link and query from results once the UI and slackbot properly handle the URL from the StructuredToolResult
             data = f"link: {url}\nquery: {query_string}\n{logs}"
holmes/plugins/toolsets/grafana/common.py (1)

25-36: Add return type to build_headers.

Aligns with “type hints are required”.

-def build_headers(api_key: Optional[str], additional_headers: Optional[Dict[str, str]]):
+def build_headers(
+    api_key: Optional[str], additional_headers: Optional[Dict[str, str]]
+) -> Dict[str, str]:
holmes/plugins/toolsets/azure_sql/tools/get_top_log_io_queries.py (1)

139-151: Use logging.exception and explicit conversion; narrow exception when possible.

-        except Exception as e:
-            error_msg = f"Failed to get top log I/O queries: {str(e)}"
-            logging.error(error_msg)
-            return StructuredToolResult(
-                status=StructuredToolResultStatus.ERROR,
-                error=error_msg,
-                params=params,
-            )
+        except Exception as e:
+            logging.exception("Failed to get top log I/O queries", exc_info=True)
+            return StructuredToolResult(
+                status=StructuredToolResultStatus.ERROR,
+                error=f"Failed to get top log I/O queries: {e!s}",
+                params=params,
+            )
holmes/plugins/toolsets/opensearch/opensearch_traces.py (3)

21-22: Consolidate imports from holmes.core.tools.

Minor cleanup; keeps imports DRY.

-from holmes.core.tools import (
-    CallablePrerequisite,
-    Tool,
-    ToolParameter,
-    ToolsetTag,
-)
+from holmes.core.tools import (
+    CallablePrerequisite,
+    Tool,
+    ToolParameter,
+    ToolsetTag,
+    StructuredToolResult,
+    StructuredToolResultStatus,
+)
-from holmes.core.tools import StructuredToolResult, StructuredToolResultStatus

97-103: Ruff nits: prefer explicit conversion flags in f-strings.

Replace str(e) with {e!s}.

-                error=f"Network error while opensearch traces fields: {str(e)}",
+                error=f"Network error while opensearch traces fields: {e!s}",
@@
-                error=f"Unexpected error: {str(e)}",
+                error=f"Unexpected error: {e!s}",
@@
-                error=f"Unexpected error {err_msg}: {str(e)}",
+                error=f"Unexpected error {err_msg}: {e!s}",

Also applies to: 107-110, 185-188


104-111: Consider narrowing broad Exception catches.

If feasible, catch concrete OpenSearch exceptions and keep the generic catch as a final fallback.

Also applies to: 180-188
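
For instance, assuming the toolset uses the opensearch-py client, a two-tier handler keeps the generic catch as a last resort (client, index, query, and params are placeholders):

from opensearchpy.exceptions import ConnectionError as OpenSearchConnectionError
from opensearchpy.exceptions import TransportError

try:
    response = client.search(index=index, body=query)
except (OpenSearchConnectionError, TransportError) as e:
    return StructuredToolResult(
        status=StructuredToolResultStatus.ERROR,
        error=f"OpenSearch request failed: {e!s}",
        params=params,
    )
except Exception as e:  # final fallback for truly unexpected failures
    return StructuredToolResult(
        status=StructuredToolResultStatus.ERROR,
        error=f"Unexpected error: {e!s}",
        params=params,
    )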

holmes/common/env_vars.py (1)

78-83: Validate and parse percent safely.

Guard against negatives/invalid values; keep style consistent with string defaults.

-# Limit each tool response to N% of the total context window.
-# Number between 0 and 100
-# Setting to either 0 or any number above 100 disables the logic that limits tool response size
-TOOL_MAX_ALLOCATED_CONTEXT_WINDOW_PCT = float(
-    os.environ.get("TOOL_MAX_ALLOCATED_CONTEXT_WINDOW_PCT", 0)
-)
+# Limit each tool response to N% of the total context window.
+# 0 or >100 disables the limiter.
+def _load_percent(name: str, default: float) -> float:
+    raw = os.environ.get(name, str(default))
+    try:
+        val = float(raw)
+    except ValueError:
+        return default
+    return max(0.0, val)
+
+TOOL_MAX_ALLOCATED_CONTEXT_WINDOW_PCT = _load_percent(
+    "TOOL_MAX_ALLOCATED_CONTEXT_WINDOW_PCT", 0.0
+)
holmes/plugins/toolsets/azure_sql/tools/get_top_cpu_queries.py (3)

124-134: Return NO_DATA when no queries are returned.

This aligns status semantics with other toolsets (e.g., Loki) and improves downstream handling.

Apply this diff:

             # Format the report
-            report_text = self._format_cpu_queries_report(
-                queries, db_config, top_count, hours_back
-            )
-
-            return StructuredToolResult(
-                status=StructuredToolResultStatus.SUCCESS,
-                data=report_text,
-                params=params,
-            )
+            if not queries:
+                return StructuredToolResult(
+                    status=StructuredToolResultStatus.NO_DATA,
+                    data="No queries found for the specified time period.",
+                    params=params,
+                )
+
+            report_text = self._format_cpu_queries_report(
+                queries, db_config, top_count, hours_back
+            )
+
+            return StructuredToolResult(
+                status=StructuredToolResultStatus.SUCCESS,
+                data=report_text,
+                params=params,
+            )

134-141: Improve exception logging and satisfy Ruff (TRY400/RUF010).

Use logging.exception for stack traces and the explicit conversion flag.

Apply this diff:

-        except Exception as e:
-            error_msg = f"Failed to get top CPU queries: {str(e)}"
-            logging.error(error_msg)
+        except Exception as e:
+            error_msg = f"Failed to get top CPU queries: {e!s}"
+            logging.exception(error_msg)
             return StructuredToolResult(
                 status=StructuredToolResultStatus.ERROR,
                 error=error_msg,
                 params=params,
             )

108-110: Input validation for params (bounds and typing).

If params originate from JSON, enforce ints and sane bounds to avoid surprises.

Outside this hunk, consider:

top_count = int(params.get("top_count", 15))
hours_back = int(params.get("hours_back", 2))
top_count = max(1, min(top_count, 100))
hours_back = max(1, hours_back)
holmes/plugins/toolsets/azure_sql/tools/get_slow_queries.py (3)

126-136: Return NO_DATA when no slow queries are found.

Improves downstream handling and keeps parity with other toolsets.

Apply this diff:

-            # Format the report
-            report_text = self._format_slow_queries_report(
-                queries, db_config, top_count, hours_back
-            )
-
-            return StructuredToolResult(
-                status=StructuredToolResultStatus.SUCCESS,
-                data=report_text,
-                params=params,
-            )
+            # If nothing to report, return NO_DATA
+            if not queries:
+                return StructuredToolResult(
+                    status=StructuredToolResultStatus.NO_DATA,
+                    data="No queries found for the specified time period.",
+                    params=params,
+                )
+
+            # Format the report
+            report_text = self._format_slow_queries_report(
+                queries, db_config, top_count, hours_back
+            )
+
+            return StructuredToolResult(
+                status=StructuredToolResultStatus.SUCCESS,
+                data=report_text,
+                params=params,
+            )

136-143: Use logging.exception and explicit conversion flag (TRY400/RUF010).

Match Ruff’s guidance and keep stack traces.

Apply this diff:

-        except Exception as e:
-            error_msg = f"Failed to get slow queries: {str(e)}"
-            logging.error(error_msg)
+        except Exception as e:
+            error_msg = f"Failed to get slow queries: {e!s}"
+            logging.exception(error_msg)
             return StructuredToolResult(
                 status=StructuredToolResultStatus.ERROR,
                 error=error_msg,
                 params=params,
             )

110-112: Optional: validate and clamp inputs.

Outside this hunk, consider:

top_count = int(params.get("top_count", 15))
hours_back = int(params.get("hours_back", 2))
top_count = max(1, min(top_count, 100))
hours_back = max(1, hours_back)
holmes/plugins/toolsets/azure_sql/tools/analyze_database_performance.py (2)

213-221: Improve exception logging (TRY400/RUF010).

Capture stack traces and use explicit conversion.

Apply this diff:

-        except Exception as e:
-            error_msg = f"Failed to generate performance report: {str(e)}"
-            logging.error(error_msg)
+        except Exception as e:
+            error_msg = f"Failed to generate performance report: {e!s}"
+            logging.exception(error_msg)
             return StructuredToolResult(
                 status=StructuredToolResultStatus.ERROR,
                 error=error_msg,
                 params=params,
             )

37-45: Consider narrowing broad exceptions in data gathering/validation.

If AzureSQLAPIClient exposes specific exceptions, prefer those to avoid masking real issues.

Also applies to: 48-57, 232-249
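
If the client exposed (or grew) a dedicated error type, the gathering code could use a handler like the following; AzureSQLAPIError and get_performance_metrics are assumed names for illustration only:

try:
    metrics = api_client.get_performance_metrics(db_config)  # assumed method
except AzureSQLAPIError as e:  # assumed client-specific exception
    logging.exception("Azure SQL API call failed")
    return StructuredToolResult(
        status=StructuredToolResultStatus.ERROR,
        error=f"Azure SQL API call failed: {e!s}",
        params=params,
    )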

holmes/plugins/toolsets/bash/common/bash.py (1)

5-33: Prevent oversized responses at the source (optional).

Even with the new global limiter, pre-clipping can save memory/latency for very chatty commands.

For example, cap to N bytes/lines before building result_data:

         stdout = process.stdout.strip() if process.stdout else ""
-        result_data = f"{cmd}\n" f"{stdout}"
+        max_lines = 5000  # tune or read from env
+        lines = stdout.splitlines()
+        clipped = "\n".join(lines[-max_lines:])
+        result_data = f"{cmd}\n{clipped}"
holmes/plugins/toolsets/kubernetes_logs.py (1)

397-399: Track TODOs.

There are two TODOs; if you want, I can open follow-up issues or propose a focused PR to address them.

Also applies to: 451-462

holmes/plugins/toolsets/azure_sql/tools/analyze_database_health_status.py (1)

152-159: Consider improving exception handling practices

The static analysis hints correctly identify opportunities to improve error handling:

  • Use specific exception types instead of catching Exception broadly
  • Use logging.exception() which automatically includes traceback information
  • The str(e) conversion is explicit but could be more robust

Apply this diff to improve error handling:

        except Exception as e:
            error_msg = f"Failed to generate health report: {str(e)}"
-           logging.error(error_msg)
+           logging.exception("Failed to generate health report", exc_info=True)
            return StructuredToolResult(
                status=StructuredToolResultStatus.ERROR,
-               error=error_msg,
+               error=f"Failed to generate health report: {str(e)}",
                params=params,
            )
holmes/plugins/toolsets/azure_sql/tools/get_active_alerts.py (1)

186-193: Consider improving exception handling practices

Similar to the previous file, the static analysis hints correctly identify opportunities to improve error handling practices.

Apply this diff to improve error handling:

        except Exception as e:
            error_msg = f"Failed to retrieve active alerts: {str(e)}"
-           logging.error(error_msg)
+           logging.exception("Failed to retrieve active alerts", exc_info=True)
            return StructuredToolResult(
                status=StructuredToolResultStatus.ERROR,
-               error=error_msg,
+               error=f"Failed to retrieve active alerts: {str(e)}",
                params=params,
            )
holmes/plugins/toolsets/runbook/runbook_fetcher.py (1)

103-110: Consider improving exception handling practices

The static analysis hints correctly identify the same opportunities for improvement as in the other files:

  • Catching broad Exception instead of specific types
  • Using logging.error instead of logging.exception

Apply this diff to improve error handling:

        except Exception as e:
            err_msg = f"Failed to read runbook {runbook_path}: {str(e)}"
-           logging.error(err_msg)
+           logging.exception(f"Failed to read runbook {runbook_path}", exc_info=True)
            return StructuredToolResult(
                status=StructuredToolResultStatus.ERROR,
-               error=err_msg,
+               error=f"Failed to read runbook {runbook_path}: {str(e)}",
                params=params,
            )
holmes/plugins/toolsets/azure_sql/tools/analyze_database_connections.py (1)

211-218: Consider improving exception handling practices

The static analysis hints identify the same improvement opportunities as in other Azure SQL tool files.

Apply this diff to improve error handling:

        except Exception as e:
            error_msg = f"Failed to generate connection report: {str(e)}"
-           logging.error(error_msg)
+           logging.exception("Failed to generate connection report", exc_info=True)
            return StructuredToolResult(
                status=StructuredToolResultStatus.ERROR,
-               error=error_msg,
+               error=f"Failed to generate connection report: {str(e)}",
                params=params,
            )
holmes/plugins/toolsets/azure_sql/tools/get_top_data_io_queries.py (1)

152-159: Consider improving exception handling practices

The static analysis hints identify the same improvement opportunities as in the other Azure SQL tools.

Apply this diff to improve error handling:

        except Exception as e:
            error_msg = f"Failed to get top data I/O queries: {str(e)}"
-           logging.error(error_msg)
+           logging.exception("Failed to get top data I/O queries", exc_info=True)
            return StructuredToolResult(
                status=StructuredToolResultStatus.ERROR,
-               error=error_msg,
+               error=f"Failed to get top data I/O queries: {str(e)}",
                params=params,
            )
holmes/plugins/toolsets/opensearch/opensearch.py (1)

195-197: Prefer consistent type hints across tools.

Align with others using Dict[str, Any].

-    def get_parameterized_one_liner(self, params: Dict) -> str:
+    def get_parameterized_one_liner(self, params: Dict[str, Any]) -> str:
holmes/core/safeguards.py (1)

41-46: Enum comparison against serialized results.

Comparing dict["status"] to StructuredToolResultStatus.NO_DATA relies on StrEnum semantics. Consider normalizing to string for robustness.

-                and result.get("status") == StructuredToolResultStatus.NO_DATA
+                and str(result.get("status")) == StructuredToolResultStatus.NO_DATA
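
For context, the current comparison only works because the status enum mixes in str; a small self-contained demonstration (the member value here is an assumption):

from enum import Enum

class Status(str, Enum):  # stand-in for StructuredToolResultStatus
    NO_DATA = "no_data"

serialized = {"status": "no_data"}  # e.g., a result round-tripped through JSON
# The str mix-in makes members compare equal to their plain-string values:
assert serialized["status"] == Status.NO_DATA
# A plain Enum (without the str mix-in) would fail this comparison, which is
# why normalizing with str(...) is the more robust check.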
tests/test_structured_toolcall_result.py (1)

164-164: Consider using f-string formatting.

The Ruff hint suggests using an f-string instead of str() for better readability.

Apply this diff to use f-string formatting:

-    expected = f"Tool execution failed:\n\n{str(obj)}"
+    expected = f"Tool execution failed:\n\n{obj!s}"
holmes/plugins/toolsets/opensearch/opensearch_logs.py (2)

152-152: Consider using f-string formatting.

Ruff suggests using an f-string instead of str() for better readability.

Apply this diff to use f-string formatting:

-                error=f"Network error while fetching OpenSearch logs: {str(e)}",
+                error=f"Network error while fetching OpenSearch logs: {e!s}",

155-155: Broad exception handling detected.

While catching Exception is sometimes necessary for final fallback error handling, consider being more specific about the types of exceptions you expect here. At a minimum, ensure proper logging is in place (which it is).

holmes/plugins/toolsets/azure_sql/tools/analyze_database_storage.py (1)

318-320: Improve exception handling with logging.exception.

The code catches a broad Exception and uses logging.error with manual string formatting. Consider using logging.exception which automatically includes the exception traceback.

Apply this diff to improve exception handling:

         except Exception as e:
-            error_msg = f"Failed to generate storage report: {str(e)}"
-            logging.error(error_msg)
+            error_msg = f"Failed to generate storage report: {e!s}"
+            logging.exception(error_msg)
             return StructuredToolResult(
holmes/core/tools_utils/data_types.py (2)

19-33: Consider more specific exception handling.

The format_tool_result_data function catches a broad Exception when JSON serialization fails. While the fallback to str() is reasonable, consider catching more specific exceptions like TypeError or ValueError that are typically raised during JSON serialization.

Apply this diff for more specific exception handling:

         try:
             if isinstance(tool_result.data, BaseModel):
                 tool_response = tool_result.data.model_dump_json(indent=2)
             else:
                 tool_response = json.dumps(tool_result.data, indent=2)
-        except Exception:
+        except (TypeError, ValueError, AttributeError) as e:
+            # Fall back to string representation for non-serializable objects
             tool_response = str(tool_result.data)

20-22: Remove redundant assignment.

Line 20 assigns tool_response = tool_result.data, but this is immediately overwritten in the subsequent conditions. Consider removing this redundant assignment or restructuring the logic.

Apply this diff to remove redundancy:

 def format_tool_result_data(tool_result: StructuredToolResult) -> str:
-    tool_response = tool_result.data
     if isinstance(tool_result.data, str):
         tool_response = tool_result.data
     else:
holmes/core/tool_calling_llm.py (3)

68-87: Consider adding type hints for better clarity.

The function would benefit from more explicit type hints for the full_response parameter.

-def _extract_cost_from_response(full_response) -> float:
+def _extract_cost_from_response(full_response: Any) -> float:

89-128: Improve error handling for usage extraction.

While the function safely handles missing data, consider being more specific about which attributes might be missing to aid debugging.

 try:
     cost = _extract_cost_from_response(full_response)
-    usage = getattr(full_response, "usage", {})
+    usage = getattr(full_response, "usage", None) or {}

354-356: Consider centralizing cost accumulation logic.

The cost processing logic appears in multiple places. Consider extracting this into a method to reduce duplication.

You could create a helper method within the class to handle cost accumulation:

def _accumulate_cost(self, full_response, costs: LLMCosts, log_prefix: str = "LLM call"):
    """Helper to accumulate costs from response."""
    _process_cost_info(full_response, costs, log_prefix)

Then use it consistently:

-_process_cost_info(full_response, costs, "LLM call")
+self._accumulate_cost(full_response, costs, "LLM call")

@nherment requested a review from moshemorad September 11, 2025 11:35
Contributor

@coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (6)
holmes/core/tool_calling_llm.py (6)

68-87: Avoid relying on private _hidden_params for cost extraction.

This may break across SDKs/wrappers and is brittle. Prefer a public attribute if present, then fall back to _hidden_params.

Apply:

-def _extract_cost_from_response(full_response) -> float:
+def _extract_cost_from_response(full_response: Any) -> float:
@@
-    try:
-        cost_value = (
-            full_response._hidden_params.get("response_cost", 0)
-            if hasattr(full_response, "_hidden_params")
-            else 0
-        )
-        # Ensure cost is a float
-        return float(cost_value) if cost_value is not None else 0.0
+    try:
+        # Prefer public attrs first, then private fallback
+        if hasattr(full_response, "response_cost"):
+            cost_value = getattr(full_response, "response_cost", 0)
+        elif hasattr(full_response, "_hidden_params") and isinstance(
+            getattr(full_response, "_hidden_params"), dict
+        ):
+            cost_value = full_response._hidden_params.get("response_cost", 0)
+        else:
+            cost_value = 0
+        return float(cost_value) if cost_value is not None else 0.0

103-113: Handle non-dict usage objects safely.

OpenAI/litellm often return a model/object, not a dict. Using .get() can raise and lose usage logging.

Apply:

-        usage = getattr(full_response, "usage", {})
-
-        if usage:
-            if LOG_LLM_USAGE_RESPONSE:  # shows stats on token cache usage
-                logging.info(f"LLM usage response:\n{usage}\n")
-            prompt_toks = usage.get("prompt_tokens", 0)
-            completion_toks = usage.get("completion_tokens", 0)
-            total_toks = usage.get("total_tokens", 0)
+        usage_obj = getattr(full_response, "usage", None)
+        usage: dict[str, Any] = {}
+        if usage_obj is not None:
+            if hasattr(usage_obj, "model_dump"):
+                usage = usage_obj.model_dump()
+            elif isinstance(usage_obj, dict):
+                usage = usage_obj
+            else:
+                try:
+                    usage = json.loads(json.dumps(usage_obj))
+                except Exception:
+                    usage = {}
+        if usage:
+            if LOG_LLM_USAGE_RESPONSE:
+                logging.info(f"LLM usage response:\n{usage}\n")
+            prompt_toks = int(usage.get("prompt_tokens", 0) or 0)
+            completion_toks = int(usage.get("completion_tokens", 0) or 0)
+            total_toks = int(usage.get("total_tokens", 0) or 0)

514-517: Return a semantic code for “tool not found”.

Optional: set a 404-like return_code to help clients distinguish failures.

             return StructuredToolResult(
-                status=StructuredToolResultStatus.ERROR,
+                status=StructuredToolResultStatus.ERROR,
                 error=f"Failed to find tool {tool_name}",
                 params=tool_params,
+                return_code=404,
             )

529-531: Include a failure code on exceptions.

A 500-like code helps downstream handling.

             tool_response = StructuredToolResult(
-                status=StructuredToolResultStatus.ERROR,
+                status=StructuredToolResultStatus.ERROR,
                 error=f"Tool call failed: {e}",
                 params=tool_params,
+                return_code=500,
             )

570-573: Treat non-Structured tool returns as ERROR, not SUCCESS.

Marking SUCCESS may mask tool contract violations. Keep the data, but signal the issue.

-            tool_response = StructuredToolResult(
-                status=StructuredToolResultStatus.SUCCESS,
-                data=tool_response,
-                params=tool_params,
-            )
+            tool_response = StructuredToolResult(
+                status=StructuredToolResultStatus.ERROR,
+                error="Tool returned a non-StructuredToolResult payload; wrapping for safety.",
+                data=tool_response,
+                params=tool_params,
+                return_code=500,
+            )

662-663: Add an explicit error string when approvals aren’t supported.

Right now only the status flips to ERROR; include a reason for clients.

-            tool_call_result.result.status = StructuredToolResultStatus.ERROR
+            tool_call_result.result.status = StructuredToolResultStatus.ERROR
+            tool_call_result.result.error = (
+                "Approval required but no approval callback is configured on the server."
+            )
             return tool_call_result
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bb54aee and 51d33ba.

📒 Files selected for processing (3)
  • holmes/common/env_vars.py (1 hunks)
  • holmes/core/tool_calling_llm.py (9 hunks)
  • holmes/core/tools_utils/tool_context_window_limiter.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • holmes/core/tools_utils/tool_context_window_limiter.py
  • holmes/common/env_vars.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Ruff for formatting and linting
Type hints are required (checked by mypy)
Always place Python imports at the top of the file, not inside functions or methods

Files:

  • holmes/core/tool_calling_llm.py
🧬 Code graph analysis (1)
holmes/core/tool_calling_llm.py (3)
holmes/core/tools.py (2)
  • StructuredToolResult (78-102)
  • StructuredToolResultStatus (51-75)
holmes/core/tools_utils/tool_context_window_limiter.py (1)
  • prevent_overly_big_tool_response (7-28)
holmes/core/tools_utils/data_types.py (3)
  • TruncationResult (14-16)
  • ToolCallResult (36-79)
  • TruncationMetadata (8-11)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Pre-commit checks
  • GitHub Check: llm_evals
🔇 Additional comments (6)
holmes/core/tool_calling_llm.py (6)

35-38: Limiter wiring looks good.

Import and use of StructuredToolResult/Status and the context-window limiter are correctly integrated.


46-50: Data types import migration LGTM.

Imports align with the new tools_utils module split.


476-477: Status check refactor LGTM.

Switch to StructuredToolResultStatus.APPROVAL_REQUIRED is correct.


621-623: Unsupported custom tool path: good error mapping.

Returning ERROR with a clear message is appropriate.


637-640: Limiter invocation placement LGTM.

Running prevent_overly_big_tool_response immediately after building ToolCallResult ensures oversized outputs don’t get appended to messages.


683-685: Denial path LGTM.

Clear error message on user denial.

moshemorad previously approved these changes Sep 11, 2025
@github-actions
Contributor

Results of HolmesGPT evals

  • ask_holmes: 28/37 test cases were successful, 3 regressions, 2 skipped, 3 setup failures
Test suite Test case Status
ask 01_how_many_pods
ask 02_what_is_wrong_with_pod
ask 04_related_k8s_events ↪️
ask 05_image_version
ask 09_crashpod
ask 10_image_pull_backoff
ask 110_k8s_events_image_pull
ask 11_init_containers
ask 13a_pending_node_selector_basic
ask 14_pending_resources
ask 15_failed_readiness_probe
ask 17_oom_kill
ask 18_crash_looping_v2 🚧
ask 19_detect_missing_app_details
ask 20_long_log_file_search
ask 24_misconfigured_pvc
ask 24a_misconfigured_pvc_basic
ask 28_permissions_error 🚧
ask 29_events_from_alert_manager ↪️
ask 39_failed_toolset
ask 41_setup_argo
ask 42_dns_issues_steps_new_tools ⚠️
ask 43_current_datetime_from_prompt
ask 45_fetch_deployment_logs_simple
ask 51_logs_summarize_errors
ask 53_logs_find_term
ask 54_not_truncated_when_getting_pods
ask 59_label_based_counting
ask 60_count_less_than 🚧
ask 61_exact_match_counting
ask 63_fetch_error_logs_no_errors
ask 79_configmap_mount_issue
ask 83_secret_not_found
ask 86_configmap_like_but_secret
ask 93_calling_datadog[0]
ask 93_calling_datadog[1]
ask 93_calling_datadog[2]

Legend

  • ✅ the test was successful
  • ↪️ the test was skipped
  • ⚠️ the test failed but is known to be flaky or known to fail
  • 🚧 the test had a setup failure (not a code regression)
  • 🔧 the test failed due to mock data issues (not a code regression)
  • ❌ the test failed and should be fixed before merging the PR

@nherment merged commit 7d3ec48 into master Sep 11, 2025
7 checks passed
@nherment deleted the ROB-2116_prevent_tool_call_results_that_are_too_big branch September 11, 2025 13:55