Conversation

@nherment
Collaborator

No description provided.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (4)
holmes/interactive.py (1)

588-633: Docstring still references removed session param; also treat Ctrl+C as “deny”.
Clean up the stale docstring and handle KeyboardInterrupt to avoid crashing the approval flow.

 def handle_tool_approval(
     command: Optional[str],
     error_message: Optional[str],
     style: Style,
     console: Console,
 ) -> tuple[bool, Optional[str]]:
     """
     Handle user approval for potentially sensitive commands.

     Args:
         command: The command that needs approval
         error_message: The error message explaining why approval is needed
-        session: PromptSession for user input
         style: Style for prompts
         console: Rich console for output

     Returns:
         Tuple of (approved: bool, feedback: Optional[str])
         - approved: True if user approves, False if denied
         - feedback: User's optional feedback message when denying
     """
@@
-    approval_prompt = temp_session.prompt(
-        [("class:prompt", "Do you want to approve and execute this command? (y/N): ")],
-        style=style,
-    )
-
-    if approval_prompt.lower().startswith("y"):
-        return True, None
-    else:
-        # Ask for optional feedback when denying
-        feedback_prompt = temp_session.prompt(
-            [("class:prompt", "Optional feedback for the AI (press Enter to skip): ")],
-            style=style,
-        )
-
-        feedback = feedback_prompt.strip() if feedback_prompt.strip() else None
-        return False, feedback
+    try:
+        approval_prompt = temp_session.prompt(
+            [("class:prompt", "Do you want to approve and execute this command? (y/N): ")],
+            style=style,
+        )
+        if approval_prompt.lower().startswith("y"):
+            return True, None
+        # Ask for optional feedback when denying
+        feedback_prompt = temp_session.prompt(
+            [("class:prompt", "Optional feedback for the AI (press Enter to skip): ")],
+            style=style,
+        )
+        feedback = feedback_prompt.strip() if feedback_prompt.strip() else None
+        return False, feedback
+    except KeyboardInterrupt:
+        console.print(f"[bold {STATUS_COLOR}]Approval cancelled by user.[/bold {STATUS_COLOR}]")
+        return False, None
holmes/core/tool_calling_llm.py (3)

587-595: Fail fast on invalid JSON tool args instead of proceeding with {}.
Continuing with empty params risks wrong tool behavior.

-        tool_params = {}
-        try:
-            tool_params = json.loads(tool_arguments)
-        except Exception:
-            logging.warning(
-                f"Failed to parse arguments for tool: {tool_name}. args: {tool_arguments}"
-            )
+        try:
+            tool_params = json.loads(tool_arguments)
+        except json.JSONDecodeError as e:
+            logging.warning(
+                f"Failed to parse arguments for tool: {tool_name}. args: {tool_arguments}"
+            )
+            return ToolCallResult(
+                tool_call_id=tool_to_call.id,
+                tool_name=tool_name,
+                description="NA",
+                result=StructuredToolResult(
+                    status=ToolResultStatus.ERROR,
+                    error=f"Invalid tool arguments (expected JSON): {e}. Raw: {tool_arguments}",
+                    params=None,
+                ),
+            )
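A self-contained sketch of this fail-fast parse (the `ParseOutcome` dataclass is an illustrative stand-in for `StructuredToolResult`, not a class from the repo):

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParseOutcome:
    # Illustrative stand-in for StructuredToolResult
    params: Optional[dict]
    error: Optional[str] = None


def parse_tool_args(raw: str) -> ParseOutcome:
    """Parse tool arguments strictly: return an explicit error instead of falling back to {}."""
    try:
        return ParseOutcome(params=json.loads(raw))
    except json.JSONDecodeError as e:
        return ParseOutcome(params=None, error=f"Invalid tool arguments (expected JSON): {e}")


# Valid JSON parses into params; invalid JSON yields an explicit error rather than {}.
ok = parse_tool_args('{"pod": "web-1"}')
bad = parse_tool_args("not json")
```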

631-676: Harden approval flow: set error when no handler, catch callback exceptions, avoid re-approval loops, and drop DummySpan() arg.
Prevents silent failures and infinite approval cycles.

         if not self.approval_callback:
-            tool_call_result.result.status = ToolResultStatus.ERROR
+            tool_call_result.result.status = ToolResultStatus.ERROR
+            tool_call_result.result.error = (
+                "Approval required but no approval handler is configured."
+            )
             return tool_call_result

         # Get approval from user
-        approved, feedback = self.approval_callback(tool_call_result.result)
+        try:
+            approved, feedback = self.approval_callback(tool_call_result.result)
+        except Exception as e:
+            tool_call_result.result.status = ToolResultStatus.ERROR
+            tool_call_result.result.error = f"Approval flow failed: {e}"
+            return tool_call_result

         if approved:
             logging.debug(
                 f"User approved command: {tool_call_result.result.invocation}"
             )

             new_response = self._directly_invoke_tool(
                 tool_name=tool_call_result.tool_name,
                 tool_params=tool_call_result.result.params or {},
                 user_approved=True,
-                trace_span=DummySpan(),
+                trace_span=None,
                 tool_number=None,  # Could be extracted if needed
             )
             tool_call_result.result = new_response
+            if tool_call_result.result.status == ToolResultStatus.APPROVAL_REQUIRED:
+                tool_call_result.result.status = ToolResultStatus.ERROR
+                tool_call_result.result.error = (
+                    "Tool requested approval again after user approval; aborting to avoid loops."
+                )
         else:

502-511: Fix B008: avoid calling DummySpan() in default args; make trace_span optional.
Prevents side effects at import time and satisfies linters.

 def _directly_invoke_tool(
     self,
     tool_name: str,
     tool_params: dict,
     user_approved: bool,
-    trace_span=DummySpan(),
+    trace_span=None,
     tool_number: Optional[int] = None,
 ) -> StructuredToolResult:
-    tool_span = trace_span.start_span(name=tool_name, type="tool")
+    trace_span = trace_span or DummySpan()
+    tool_span = trace_span.start_span(name=tool_name, type="tool")

Apply the same pattern to other functions in this file that use trace_span=DummySpan() defaults.
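For readers unfamiliar with B008: a default-argument expression is evaluated once, at function-definition time, so every call that omits the argument shares the same object. A minimal sketch with a stand-in `DummySpan`:

```python
from typing import Optional


class DummySpan:
    """Minimal stand-in for the tracing DummySpan."""

    def __init__(self) -> None:
        self.children: list[str] = []

    def start_span(self, name: str) -> None:
        self.children.append(name)


def invoke_shared(name: str, trace_span: DummySpan = DummySpan()) -> DummySpan:
    # B008: this default DummySpan() was built once at definition time and is shared
    trace_span.start_span(name)
    return trace_span


def invoke_fresh(name: str, trace_span: Optional[DummySpan] = None) -> DummySpan:
    # The recommended fix: instantiate inside the function body
    trace_span = trace_span or DummySpan()
    trace_span.start_span(name)
    return trace_span


a = invoke_shared("kubectl_get")
b = invoke_shared("kubectl_logs")
# a is b: both calls mutated the single shared default instance
c = invoke_fresh("kubectl_get")
d = invoke_fresh("kubectl_logs")
# c is not d: each call got its own span
```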

🧹 Nitpick comments (5)
holmes/core/tool_calling_llm.py (5)

483-499: Serialize UI prompts: handle approvals only after all futures complete (avoid prompt while other tools still run).
Prevents interleaving background logs with approval prompts.

-                for future in concurrent.futures.as_completed(futures):
-                    tool_call_result: ToolCallResult = future.result()
-
-                    tool_call_result = self.handle_tool_call_approval(tool_call_result)
-
-                    tool_calls.append(tool_call_result.as_tool_result_response())
-                    messages.append(tool_call_result.as_tool_call_message())
-
-                    perf_timing.measure(f"tool completed {tool_call_result.tool_name}")
+                results: list[ToolCallResult] = []
+                for future in concurrent.futures.as_completed(futures):
+                    results.append(future.result())
+
+                # Process approvals synchronously after all tool threads finished
+                for tool_call_result in results:
+                    tool_call_result = self.handle_tool_call_approval(tool_call_result)
+                    tool_calls.append(tool_call_result.as_tool_result_response())
+                    messages.append(tool_call_result.as_tool_call_message())
+                    perf_timing.measure(f"tool completed {tool_call_result.tool_name}")

Note: Keep streaming path as-is to preserve responsiveness, or gate prompts with a lock (see next comment).
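The two-phase pattern suggested above can be sketched standalone (tool names and delays are illustrative):

```python
import concurrent.futures
import time


def run_tool(name: str, delay: float) -> str:
    # Stand-in for a tool invocation running on a worker thread
    time.sleep(delay)
    return name


with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
    futures = [
        pool.submit(run_tool, n, d)
        for n, d in [("a", 0.02), ("b", 0.01), ("c", 0.0)]
    ]

    # Phase 1: drain every future first, so no worker thread is still running
    results = [f.result() for f in concurrent.futures.as_completed(futures)]

# Phase 2: only now run the stdin-bound approval step, strictly sequentially
approved = [f"approved:{r}" for r in results]
```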


873-874: Guard streaming approvals with a lock to avoid overlapping prompts in re-entrant contexts.
Optional hardening; safe even if today calls are serialized.

-                    tool_call_result = self.handle_tool_call_approval(tool_call_result)
+                    tool_call_result = self.handle_tool_call_approval(tool_call_result)

Add outside this hunk (constructor and import):

+import threading
@@
         self.investigation_id = str(uuid.uuid4())
+        self._approval_lock = threading.Lock()

Then wrap both approval sites:

with self._approval_lock:
    tool_call_result = self.handle_tool_call_approval(tool_call_result)
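A minimal sketch of the lock-guarded pattern, using an in-memory transcript in place of real stdin prompts:

```python
import threading

approval_lock = threading.Lock()
transcript: list[str] = []


def handle_approval(tool_name: str) -> None:
    # The lock guarantees this begin/end pair is never interleaved
    # with another thread's prompt, even in re-entrant contexts.
    with approval_lock:
        transcript.append(f"begin:{tool_name}")
        transcript.append(f"end:{tool_name}")


threads = [threading.Thread(target=handle_approval, args=(n,)) for n in ("bash", "kubectl")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every begin is immediately followed by its matching end.
```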

660-667: Align with the new _directly_invoke_tool signature.
If you apply the trace_span=None default, pass None here to avoid new DummySpan() instantiations.

-                trace_span=DummySpan(),
+                trace_span=None,

858-863: Minor: avoid constructing DummySpan() at call site if callee has a safe default.
Not critical, but reduces needless instantiations.

-                            trace_span=DummySpan(),  # Streaming mode doesn't support tracing yet
+                            trace_span=None,  # Streaming mode doesn't support tracing yet

647-651: Populate error when approval handler is missing.
Currently status is set to ERROR without context.

-        if not self.approval_callback:
-            tool_call_result.result.status = ToolResultStatus.ERROR
+        if not self.approval_callback:
+            tool_call_result.result.status = ToolResultStatus.ERROR
+            tool_call_result.result.error = (
+                "Approval required but no approval handler is configured."
+            )
             return tool_call_result
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1b2acde and 04650fd.

📒 Files selected for processing (2)
  • holmes/core/tool_calling_llm.py (8 hunks)
  • holmes/interactive.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always place Python imports at the top of the file, not inside functions or methods
Type hints are required (mypy is configured in pyproject.toml)

Files:

  • holmes/interactive.py
  • holmes/core/tool_calling_llm.py
🧬 Code graph analysis (2)
holmes/interactive.py (2)
holmes/core/tool_calling_llm.py (2)
  • ToolCallingLLM (276-890)
  • ToolCallResult (215-258)
holmes/core/tools.py (1)
  • StructuredToolResult (50-74)
holmes/core/tool_calling_llm.py (4)
holmes/core/tools.py (5)
  • StructuredToolResult (50-74)
  • ToolResultStatus (23-47)
  • invoke (155-179)
  • get_parameterized_one_liner (192-193)
  • get_parameterized_one_liner (219-226)
holmes/core/tracing.py (4)
  • DummySpan (101-117)
  • start_span (104-105)
  • log (107-108)
  • end (110-111)
holmes/core/tools_utils/tool_executor.py (2)
  • get_tool_by_name (60-64)
  • invoke (49-58)
holmes/core/safeguards.py (1)
  • prevent_overly_repeated_tool_call (78-120)
🪛 Ruff (0.12.2)
holmes/core/tool_calling_llm.py

507-507: Do not perform function call DummySpan in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build (3.10)
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.12)
🔇 Additional comments (7)
holmes/interactive.py (2)

30-35: Imports for approval flow are correctly placed at module top.
No issues.


862-875: Approval callback wiring looks good and respects the env toggle.
Matches callback type and isolates prompts via InMemoryHistory.

holmes/core/tool_calling_llm.py (5)

6-6: Importing Callable for the callback type is correct.


287-289: Public approval_callback attribute: type and placement are appropriate.


612-621: Type normalization to StructuredToolResult is good defensive coding.


597-606: Pass through tool_number when re-invoking (if available) for consistent logging.
If you have the index handy, forward it to keep numbering consistent.

Would you like me to wire the original tool_number here (e.g., by carrying it in ToolCallResult), or is it intentionally omitted?


631-676: Verified no missing params on APPROVAL_REQUIRED
The only tool emitting APPROVAL_REQUIRED is the BashTool’s _invoke in bash_toolset.py, and it explicitly includes params=params in its StructuredToolResult. No updates needed.

Contributor

@moshemorad moshemorad left a comment

Great work, few comments.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
holmes/plugins/toolsets/investigator/core_investigation.py (2)

81-83: Validate 'todos' is an array; fail fast with a clear error.

Prevents iterating non-lists and silently producing 0 tasks.

-            todos_data = params.get("todos", [])
+            todos_data = params.get("todos", [])
+            if not isinstance(todos_data, list):
+                return StructuredToolResult(
+                    status=ToolResultStatus.ERROR,
+                    error="Invalid 'todos': expected an array of task objects",
+                    params=params,
+                )

21-35: Align schema with runtime behavior: make task.id optional (auto-generate when missing).

Schema says id required, but code generates one if absent.

-        "todos": ToolParameter(
-            description="COMPLETE list of ALL tasks on the task list. Each task should have: id (string), content (string), status (pending/in_progress/completed)",
+        "todos": ToolParameter(
+            description="COMPLETE list of ALL tasks on the task list. Each task can include: id (string; optional, auto-generated if omitted), content (string), status (pending/in_progress/completed)",
             type="array",
             required=True,
             items=ToolParameter(
                 type="object",
                 properties={
-                    "id": ToolParameter(type="string", required=True),
+                    "id": ToolParameter(type="string", required=False),
                     "content": ToolParameter(type="string", required=True),
-                    "status": ToolParameter(type="string", required=True),
+                    "status": ToolParameter(type="string", required=True),
                 },
             ),
         ),
holmes/core/tool_calling_llm.py (1)

572-579: Apply the same B008 fix to _invoke_llm_tool_call.

Consistent Optional[DummySpan] default here too.

-    def _invoke_llm_tool_call(
+    def _invoke_llm_tool_call(
         self,
         tool_to_call: ChatCompletionMessageToolCall,
         previous_tool_calls: list[dict],
-        trace_span=DummySpan(),
+        trace_span: Optional[DummySpan] = None,
         tool_number=None,
     ) -> ToolCallResult:
♻️ Duplicate comments (3)
holmes/core/tool_calling_llm.py (3)

516-523: Fix B008: avoid calling DummySpan() in default args.

Use Optional and instantiate inside to satisfy linters and avoid import‑time side effects.

-    def _directly_invoke_tool(
+    def _directly_invoke_tool(
         self,
         tool_name: str,
         tool_params: dict,
         user_approved: bool,
-        trace_span=DummySpan(),
+        trace_span: Optional[DummySpan] = None,
         tool_number: Optional[int] = None,
     ) -> StructuredToolResult:
-        tool_span = trace_span.start_span(name=tool_name, type="tool")
+        trace_span = trace_span or DummySpan()
+        tool_span = trace_span.start_span(name=tool_name, type="tool")

601-609: Fail fast on invalid JSON tool args (don’t proceed with {}).

Returning an explicit error avoids running tools with empty/incorrect params and satisfies BLE001 by catching the specific decode error.

-        tool_params = {}
-        try:
-            tool_params = json.loads(tool_arguments)
-        except Exception:
-            logging.warning(
-                f"Failed to parse arguments for tool: {tool_name}. args: {tool_arguments}"
-            )
+        # Parse tool arguments strictly
+        tool_call_id = tool_to_call.id
+        try:
+            tool_params = json.loads(tool_arguments) if tool_arguments else {}
+        except json.JSONDecodeError as e:
+            logging.warning(
+                f"Failed to parse JSON arguments for tool: {tool_name}. args: {tool_arguments}",  # noqa: E501
+                exc_info=True,
+            )
+            return ToolCallResult(
+                tool_call_id=tool_call_id,
+                tool_name=tool_name,
+                description="Invalid tool arguments",
+                result=StructuredToolResult(
+                    status=ToolResultStatus.ERROR,
+                    error=f"Invalid tool arguments (expected JSON): {e}",
+                    params=None,
+                ),
+            )
-
-        tool_call_id = tool_to_call.id

645-691: Harden the approval flow: set error messages, catch callback exceptions, and prevent approval loops.

Currently, missing callback doesn’t set an error, callback exceptions can bubble, and a second APPROVAL_REQUIRED can loop.

     if tool_call_result.result.status != ToolResultStatus.APPROVAL_REQUIRED:
         return tool_call_result

     # If no approval callback, convert to ERROR because it is assumed the client may not be able to handle approvals
     if not self.approval_callback:
         tool_call_result.result.status = ToolResultStatus.ERROR
+        tool_call_result.result.error = (
+            "Approval required but no approval handler is configured."
+        )
         return tool_call_result

     # Get approval from user
-    approved, feedback = self.approval_callback(tool_call_result.result)
+    try:
+        approved, feedback = self.approval_callback(tool_call_result.result)
+    except Exception as e:
+        tool_call_result.result.status = ToolResultStatus.ERROR
+        tool_call_result.result.error = f"Approval flow failed: {e}"
+        return tool_call_result

     if approved:
         logging.debug(
             f"User approved command: {tool_call_result.result.invocation}"
         )

         new_response = self._directly_invoke_tool(
             tool_name=tool_call_result.tool_name,
             tool_params=tool_call_result.result.params or {},
             user_approved=True,
-            trace_span=DummySpan(),
+            trace_span=None,
             tool_number=tool_number,
         )
         tool_call_result.result = new_response
+        if tool_call_result.result.status == ToolResultStatus.APPROVAL_REQUIRED:
+            tool_call_result.result.status = ToolResultStatus.ERROR
+            tool_call_result.result.error = (
+                "Tool requested approval again after user approval; aborting to avoid loops."
+            )
     else:
         # User denied - update to error
         feedback_text = f" User feedback: {feedback}" if feedback else ""
         tool_call_result.result.status = ToolResultStatus.ERROR
         tool_call_result.result.error = (
             f"User denied command execution.{feedback_text}"
         )
🧹 Nitpick comments (5)
holmes/plugins/toolsets/investigator/core_investigation.py (4)

21-35: Constrain status to known values (schema-level).

If ToolParameter supports enum, add allowed values to help validation and UX.

-                    "status": ToolParameter(type="string", required=True),
+                    "status": ToolParameter(type="string", required=True, enum=["pending", "in_progress", "completed"]),

If enum isn't supported by ToolParameter, ignore this change.


3-3: Prefer built-in generics over typing aliases for consistency with the new signatures.

Unify with the project-wide move to dict annotations.

-from typing import Any, Dict
+from typing import Any
-    parameters: Dict[str, ToolParameter] = {
+    parameters: dict[str, ToolParameter] = {
-    def get_parameterized_one_liner(self, params: Dict) -> str:
+    def get_parameterized_one_liner(self, params: dict) -> str:
-    def get_example_config(self) -> Dict[str, Any]:
+    def get_example_config(self) -> dict[str, Any]:

Also applies to: 21-21, 119-119, 138-138


38-41: Add type hints to print_tasks_table for mypy completeness.

-    def print_tasks_table(self, tasks):
+    def print_tasks_table(self, tasks: list[Task]) -> None:

119-121: Make the one-liner concise (avoid dumping full task objects).

-        todos = params.get("todos", [])
-        return f"Write {todos} investigation tasks"
+        todos = params.get("todos", [])
+        return f"Write {len(todos)} investigation tasks"
holmes/core/tool_calling_llm.py (1)

678-680: Minor: avoid constructing DummySpan at call site.

After adopting Optional[DummySpan], pass None here and let callee instantiate.

-                trace_span=DummySpan(),
+                trace_span=None,
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 04650fd and 1a283f1.

📒 Files selected for processing (4)
  • holmes/common/env_vars.py (1 hunks)
  • holmes/core/tool_calling_llm.py (6 hunks)
  • holmes/interactive.py (3 hunks)
  • holmes/plugins/toolsets/investigator/core_investigation.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • holmes/common/env_vars.py
  • holmes/interactive.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always place Python imports at the top of the file, not inside functions or methods
Type hints are required (mypy is configured in pyproject.toml)

Files:

  • holmes/core/tool_calling_llm.py
  • holmes/plugins/toolsets/investigator/core_investigation.py
holmes/plugins/toolsets/**

📄 CodeRabbit inference engine (CLAUDE.md)

Toolsets must live under holmes/plugins/toolsets as either {name}.yaml or a {name}/ directory

Files:

  • holmes/plugins/toolsets/investigator/core_investigation.py
🧬 Code graph analysis (2)
holmes/core/tool_calling_llm.py (4)
holmes/core/tools.py (5)
  • StructuredToolResult (50-74)
  • ToolResultStatus (23-47)
  • invoke (155-179)
  • get_parameterized_one_liner (192-193)
  • get_parameterized_one_liner (219-226)
holmes/core/tracing.py (4)
  • DummySpan (101-117)
  • start_span (104-105)
  • log (107-108)
  • end (110-111)
holmes/core/tools_utils/tool_executor.py (2)
  • get_tool_by_name (60-64)
  • invoke (49-58)
holmes/core/safeguards.py (1)
  • prevent_overly_repeated_tool_call (78-120)
holmes/plugins/toolsets/investigator/core_investigation.py (1)
holmes/core/tools.py (3)
  • _invoke (182-189)
  • _invoke (240-270)
  • StructuredToolResult (50-74)
🪛 Ruff (0.12.2)
holmes/core/tool_calling_llm.py

521-521: Do not perform function call DummySpan in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


604-604: Do not catch blind exception: Exception

(BLE001)

holmes/plugins/toolsets/investigator/core_investigation.py

78-78: Unused method argument: user_approved

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: llm_evals
  • GitHub Check: Pre-commit checks
  • GitHub Check: Pre-commit checks
🔇 Additional comments (8)
holmes/core/tool_calling_llm.py (8)

5-6: Typing import expansion is correct.

Callable import aligns with the new callback type usage.


289-292: Callback surface looks good.

Public attribute and precise typing are appropriate for CLI wiring.


471-489: Good: preserve tool_number across futures.

Mapping futures→tool_number removes ordering ambiguity in as_completed().


493-501: Approval handled on the main thread — avoids prompt concurrency.

Processing approvals after futures complete prevents interleaved stdin.


611-616: LGTM: safeguard for overly repeated tool calls.

Nice integration with prevent_overly_repeated_tool_call.


617-625: Direct tool invocation path is clean.

Passes tool_number and user_approved flag correctly for auditability.


626-636: Robustness: coerce non-StructuredToolResult.

Prevents crashes if a tool misbehaves.


870-877: Streaming path: confirm where approval is handled.

call_stream doesn’t call handle_tool_call_approval; verify the CLI consumes TOOL_RESULT with APPROVAL_REQUIRED and re-invokes via a separate path. If not, consider emitting a dedicated STREAM_APPROVAL_REQUIRED event or reusing handle_tool_call_approval here.

Also applies to: 879-879

@nherment nherment requested a review from moshemorad September 1, 2025 11:09
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py (1)

390-391: Add missing user_approved param to FetchTracesSimpleComparison._invoke
FetchTracesSimpleComparison._invoke in holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py must include user_approved to align with the base Tool API and avoid TypeError when invoked with that keyword.

--- a/holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py
+++ b/holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py
@@ -390,7 +390,10 @@ class FetchTracesSimpleComparison(Tool):
-    def _invoke(self, params: Dict) -> StructuredToolResult:
+    def _invoke(self, params: dict, user_approved: bool = False) -> StructuredToolResult:
+        # Intentionally unused; this tool is read-only and doesn't require approval.
+        _ = user_approved
         try:
             # Build query
             …
holmes/plugins/toolsets/datadog/toolset_datadog_metrics.py (1)

300-301: Fix UTC formatting: naive fromtimestamp() + "Z" is wrong; convert with an explicit UTC timezone

Current code formats local time as if it were UTC. Use a timezone-aware conversion to avoid skewed timestamps in the response (datetime.utcfromtimestamp() is deprecated since Python 3.12, so prefer fromtimestamp(..., tz=timezone.utc); requires timezone imported from datetime).

-            start_rfc = datetime.fromtimestamp(from_time).strftime("%Y-%m-%dT%H:%M:%SZ")
-            end_rfc = datetime.fromtimestamp(to_time).strftime("%Y-%m-%dT%H:%M:%SZ")
+            start_rfc = datetime.fromtimestamp(from_time, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
+            end_rfc = datetime.fromtimestamp(to_time, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
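A standalone sketch of the difference (a tz-aware `fromtimestamp` is the safer form, since `datetime.utcfromtimestamp()` is deprecated as of Python 3.12):

```python
from datetime import datetime, timezone

ts = 0  # epoch seconds, standing in for from_time/to_time above

# Naive fromtimestamp() uses the machine's local zone, so appending "Z" lies
local_str = datetime.fromtimestamp(ts).strftime("%Y-%m-%dT%H:%M:%SZ")

# Timezone-aware conversion formats true UTC
utc_str = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
# utc_str == "1970-01-01T00:00:00Z"
```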
🧹 Nitpick comments (11)
holmes/plugins/toolsets/logging_utils/logging_api.py (1)

177-179: Consume unused user_approved to silence Ruff ARG002 (keep API stable).

Signature aligns with the new approval flow, but the param isn’t used here. Either consume it or annotate to avoid CI noise.

Apply one of the following (pick one):

Option A — minimal, no behavior change:

 def _invoke(
         self, params: dict, user_approved: bool = False
     ) -> StructuredToolResult:
+        # Part of the standardized approval signature; not used by this tool.
+        del user_approved  # avoid Ruff ARG002 until approval gating is needed here

Option B — annotate (if you prefer no-op over del):

-    ) -> StructuredToolResult:
+    ) -> StructuredToolResult:  # noqa: ARG002
holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py (3)

185-187: Silence Ruff ARG002 for unused user_approved (interface param).

Parameter is required by the interface but unused here; mark it intentionally used.

-    def _invoke(
-        self, params: dict, user_approved: bool = False
-    ) -> StructuredToolResult:
+    def _invoke(
+        self, params: dict, user_approved: bool = False
+    ) -> StructuredToolResult:
+        # Intentionally unused; this tool is read-only and doesn't require approval.
+        _ = user_approved

255-257: Silence Ruff ARG002 for unused user_approved (interface param).

Same rationale as above.

-    def _invoke(
-        self, params: dict, user_approved: bool = False
-    ) -> StructuredToolResult:
+    def _invoke(
+        self, params: dict, user_approved: bool = False
+    ) -> StructuredToolResult:
+        # Intentionally unused; this tool is read-only and doesn't require approval.
+        _ = user_approved

304-306: Silence Ruff ARG002 for unused user_approved (interface param).

Same rationale as above.

-    def _invoke(
-        self, params: dict, user_approved: bool = False
-    ) -> StructuredToolResult:
+    def _invoke(
+        self, params: dict, user_approved: bool = False
+    ) -> StructuredToolResult:
+        # Intentionally unused; this tool is read-only and doesn't require approval.
+        _ = user_approved
holmes/plugins/toolsets/datadog/toolset_datadog_metrics.py (7)

78-84: Silence Ruff ARG002 by marking user_approved as intentionally unused

These Datadog metrics tools don’t currently gate on approval; keep the param for interface compatibility but reference it to satisfy Ruff.

If any of these calls should be approval-gated, say the word and I’ll add an early APPROVAL_REQUIRED branch.

 def _invoke(
     self, params: dict, user_approved: bool = False
 ) -> StructuredToolResult:
+        _ = user_approved  # intentionally unused (reserved for approval gating)

123-129: Return NO_DATA (not ERROR) when no metrics match; include invocation for debuggability

Aligns with other tools’ semantics and helps users tweak filters.

-            if not metrics:
-                return StructuredToolResult(
-                    status=ToolResultStatus.ERROR,
-                    data="Your filter returned no metrics. Change your filter and try again",
-                    params=params,
-                )
+            if not metrics:
+                return StructuredToolResult(
+                    status=ToolResultStatus.NO_DATA,
+                    data="Your filter returned no metrics. Change your filter and try again.",
+                    params=params,
+                    invocation=json.dumps({"url": url, "params": query_params}),
+                )

220-224: Silence Ruff ARG002 for user_approved here as well

Same rationale as above.

 def _invoke(
     self, params: dict, user_approved: bool = False
 ) -> StructuredToolResult:
+        _ = user_approved  # intentionally unused (reserved for approval gating)

378-383: Silence Ruff ARG002 for user_approved in metadata tool

Keeps interface while passing lint.

 def _invoke(
     self, params: dict, user_approved: bool = False
 ) -> StructuredToolResult:
+        _ = user_approved  # intentionally unused (reserved for approval gating)

494-498: Silence Ruff ARG002 for user_approved in tags tool

Same lint fix.

 def _invoke(
     self, params: dict, user_approved: bool = False
 ) -> StructuredToolResult:
+        _ = user_approved  # intentionally unused (reserved for approval gating)

504-519: Always capture invocation in errors; plumb query_params through

Right now invocation is often None because query_params is None. Initialize it to {} and reuse it in the request for consistency with other tools.

-        url = None
-        query_params = None
+        url = None
+        query_params = {}
...
-            data = execute_datadog_http_request(
+            data = execute_datadog_http_request(
                 url=url,
                 headers=headers,
                 timeout=self.toolset.dd_config.request_timeout,
                 method="GET",
-                payload_or_params={},
+                payload_or_params=query_params,
             )

542-549: Ensure invocation is set even when params are empty

Include the URL (and empty params) unconditionally when available.

-                invocation=json.dumps({"url": url, "params": query_params})
-                if url and query_params
-                else None,
+                invocation=json.dumps({"url": url, "params": query_params}) if url else None,
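The intended behavior can be sketched in isolation (the endpoint URL below is a made-up example, not the real Datadog path):

```python
import json

# With query_params initialized to {} instead of None, the invocation
# is recorded whenever a URL was built, even if no params were sent.
url = "https://api.example.com/api/v2/metrics"  # hypothetical endpoint
query_params: dict = {}

invocation = json.dumps({"url": url, "params": query_params}) if url else None
assert invocation == '{"url": "https://api.example.com/api/v2/metrics", "params": {}}'

# The old guard dropped the invocation whenever params were empty:
old = json.dumps({"url": url, "params": query_params}) if url and query_params else None
assert old is None
```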
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1a283f1 and 1e09ceb.

📒 Files selected for processing (3)
  • holmes/plugins/toolsets/datadog/toolset_datadog_metrics.py (4 hunks)
  • holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py (3 hunks)
  • holmes/plugins/toolsets/logging_utils/logging_api.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always place Python imports at the top of the file, not inside functions or methods
Type hints are required (mypy is configured in pyproject.toml)

Files:

  • holmes/plugins/toolsets/datadog/toolset_datadog_metrics.py
  • holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py
  • holmes/plugins/toolsets/logging_utils/logging_api.py
holmes/plugins/toolsets/**

📄 CodeRabbit inference engine (CLAUDE.md)

Toolsets must live under holmes/plugins/toolsets as either {name}.yaml or a {name}/ directory

Files:

  • holmes/plugins/toolsets/datadog/toolset_datadog_metrics.py
  • holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py
  • holmes/plugins/toolsets/logging_utils/logging_api.py
🧬 Code graph analysis (3)
holmes/plugins/toolsets/datadog/toolset_datadog_metrics.py (5)
holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py (4)
  • _invoke (185-229)
  • _invoke (255-283)
  • _invoke (304-322)
  • _invoke (390-548)
holmes/plugins/toolsets/logging_utils/logging_api.py (1)
  • _invoke (177-194)
holmes/plugins/toolsets/datadog/toolset_datadog_traces.py (3)
  • _invoke (214-355)
  • _invoke (380-494)
  • _invoke (563-702)
holmes/plugins/toolsets/servicenow/servicenow.py (3)
  • _invoke (118-145)
  • _invoke (163-179)
  • _invoke (197-219)
holmes/core/tools.py (1)
  • StructuredToolResult (50-74)
holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py (1)
holmes/core/tools.py (3)
  • _invoke (182-189)
  • _invoke (240-270)
  • StructuredToolResult (50-74)
holmes/plugins/toolsets/logging_utils/logging_api.py (2)
holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py (4)
  • _invoke (185-229)
  • _invoke (255-283)
  • _invoke (304-322)
  • _invoke (390-548)
holmes/core/tools.py (1)
  • StructuredToolResult (50-74)
🪛 Ruff (0.12.2)
holmes/plugins/toolsets/datadog/toolset_datadog_metrics.py

79-79: Unused method argument: user_approved

(ARG002)


221-221: Unused method argument: user_approved

(ARG002)


379-379: Unused method argument: user_approved

(ARG002)


495-495: Unused method argument: user_approved

(ARG002)

holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py

186-186: Unused method argument: user_approved

(ARG002)


256-256: Unused method argument: user_approved

(ARG002)


305-305: Unused method argument: user_approved

(ARG002)

holmes/plugins/toolsets/logging_utils/logging_api.py

178-178: Unused method argument: user_approved

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Pre-commit checks
  • GitHub Check: llm_evals
  • GitHub Check: Pre-commit checks

moshemorad previously approved these changes Sep 1, 2025

@coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (12)
holmes/plugins/toolsets/internet/notion.py (2)

52-58: Prevent leaking Notion auth headers to arbitrary domains.

scrape() is called with additional_headers even when url isn’t api.notion.com, risking token leakage. Gate headers by host and pass empty headers otherwise.

Apply:

-        # Get headers from the toolset configuration
-        additional_headers = (
-            self.toolset.additional_headers if self.toolset.additional_headers else {}
-        )
-        url = self.convert_notion_url(url)
-        content, _ = scrape(url, additional_headers)
+        # Get headers from the toolset configuration, but never send Notion creds to non-Notion hosts
+        url = self.convert_notion_url(url)
+        parsed_host = urlparse(url).netloc
+        headers = (
+            self.toolset.additional_headers
+            if self.toolset.additional_headers and parsed_host == "api.notion.com"
+            else {}
+        )
+        content, _ = scrape(url, headers)

Also add the import (see Lines 1-4 comment).


1-4: Add missing import for urlparse.

 import re
 import logging
 import json
 from typing import Any, Dict, Tuple
+from urllib.parse import urlparse
holmes/plugins/toolsets/rabbitmq/toolset_rabbitmq.py (1)

72-80: List all configured clusters (don’t filter by SUCCESS).

The description says “List all configured clusters,” but the code hides errored ones. This makes discovery impossible when health checks fail.

         available_clusters = [
             {
                 "cluster_id": c.id,
                 "management_url": c.management_url,
                 "connection_status": c.connection_status,
             }
-            for c in self.toolset.config.clusters
-            if c.connection_status == ClusterConnectionStatus.SUCCESS
+            for c in self.toolset.config.clusters
         ]
holmes/plugins/toolsets/opensearch/opensearch.py (5)

66-73: Fix client resolution bug — returns the wrong client when multiple are configured

The comprehension re-iterates over clients instead of checking the current client, so any match returns the first client, not the matching one. Also improve the error to list available hosts.

-    for client in clients:
-        found = any(host in client.hosts for client in clients)
-        if found:
-            return client
-
-    raise Exception(
-        f"Failed to resolve opensearch client. Could not find a matching host: {host}"
-    )
+    for client in clients:
+        if host in client.hosts:
+            return client
+
+    available = [h for c in clients for h in c.hosts]
+    raise Exception(
+        f"Failed to resolve opensearch client. Could not find a matching host: {host}. "
+        f"Available hosts: {available}"
+    )
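A minimal self-contained check of the corrected lookup (stub client class and hypothetical host names, standing in for the real OpenSearch client wrappers):

```python
class FakeClient:
    def __init__(self, hosts: list[str]) -> None:
        self.hosts = hosts

def get_client(clients: list[FakeClient], host: str) -> FakeClient:
    # Check the current client's hosts, not a re-iteration over all clients.
    for client in clients:
        if host in client.hosts:
            return client
    available = [h for c in clients for h in c.hosts]
    raise Exception(
        f"Failed to resolve opensearch client. Could not find a matching host: {host}. "
        f"Available hosts: {available}"
    )

a = FakeClient(["os-a.internal"])
b = FakeClient(["os-b.internal"])
# The buggy comprehension returned the first client here; the fix returns the match.
assert get_client([a, b], "os-b.internal") is b
```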

96-105: Harden the tool call: handle errors, set a timeout, and silence unused user_approved

Adds resilience similar to other toolsets (e.g., Prometheus), and fixes the Ruff ARG002 finding.

-    def _invoke(
-        self, params: dict, user_approved: bool = False
-    ) -> StructuredToolResult:
-        client = get_client(self.toolset.clients, host=params.get("host", ""))
-        shards = client.client.cat.shards()
-        return StructuredToolResult(
-            status=ToolResultStatus.SUCCESS,
-            data=str(shards),
-            params=params,
-        )
+    def _invoke(
+        self, params: dict, user_approved: bool = False
+    ) -> StructuredToolResult:
+        del user_approved  # keep signature for approval workflow, silence ARG002
+        try:
+            client = get_client(self.toolset.clients, host=params.get("host", ""))
+            shards = client.client.cat.shards(format="json", request_timeout=180)
+            return StructuredToolResult(
+                status=ToolResultStatus.SUCCESS,
+                data=str(shards),
+                params=params,
+            )
+        except Exception as e:
+            logging.warning("Failed to list OpenSearch shards", exc_info=True)
+            return StructuredToolResult(
+                status=ToolResultStatus.ERROR,
+                error=f"Unexpected error while listing shards: {e}",
+                params=params,
+            )

If you prefer not to stringify, we can return the raw object and let StructuredToolResult.get_stringified_data() handle it.


127-138: Add error handling/timeout and silence unused user_approved

Parallels the Prometheus tool’s robustness and resolves Ruff ARG002.

-    def _invoke(
-        self, params: dict, user_approved: bool = False
-    ) -> StructuredToolResult:
-        client = get_client(self.toolset.clients, host=params.get("host"))
-        response = client.client.cluster.get_settings(
-            include_defaults=True, flat_settings=True
-        )
-        return StructuredToolResult(
-            status=ToolResultStatus.SUCCESS,
-            data=str(response),
-            params=params,
-        )
+    def _invoke(
+        self, params: dict, user_approved: bool = False
+    ) -> StructuredToolResult:
+        del user_approved  # silence ARG002 until used
+        try:
+            client = get_client(self.toolset.clients, host=params.get("host"))
+            response = client.client.cluster.get_settings(
+                include_defaults=True, flat_settings=True, request_timeout=180
+            )
+            return StructuredToolResult(
+                status=ToolResultStatus.SUCCESS,
+                data=str(response),
+                params=params,
+            )
+        except Exception as e:
+            logging.warning("Failed to get OpenSearch cluster settings", exc_info=True)
+            return StructuredToolResult(
+                status=ToolResultStatus.ERROR,
+                error=f"Unexpected error while getting cluster settings: {e}",
+                params=params,
+            )

160-169: Add error handling/timeout and silence unused user_approved

Same rationale as above for consistency and resilience.

-    def _invoke(
-        self, params: dict, user_approved: bool = False
-    ) -> StructuredToolResult:
-        client = get_client(self.toolset.clients, host=params.get("host", ""))
-        health = client.client.cluster.health()
-        return StructuredToolResult(
-            status=ToolResultStatus.SUCCESS,
-            data=str(health),
-            params=params,
-        )
+    def _invoke(
+        self, params: dict, user_approved: bool = False
+    ) -> StructuredToolResult:
+        del user_approved  # silence ARG002 until used
+        try:
+            client = get_client(self.toolset.clients, host=params.get("host", ""))
+            health = client.client.cluster.health(request_timeout=180)
+            return StructuredToolResult(
+                status=ToolResultStatus.SUCCESS,
+                data=str(health),
+                params=params,
+            )
+        except Exception as e:
+            logging.warning("Failed to get OpenSearch cluster health", exc_info=True)
+            return StructuredToolResult(
+                status=ToolResultStatus.ERROR,
+                error=f"Unexpected error while getting cluster health: {e}",
+                params=params,
+            )

250-252: Fix typo in example config (“OPENSEACH” → “OPENSEARCH”)

User-facing string; please correct.

-                    hosts=[OpenSearchHost(host="YOUR OPENSEACH HOST")],
+                    hosts=[OpenSearchHost(host="YOUR OPENSEARCH HOST")],
holmes/plugins/toolsets/opensearch/opensearch_traces.py (1)

112-114: Add type hints to comply with repo typing policy.

Annotate params in one-liners.

-    def get_parameterized_one_liner(self, params) -> str:
+    def get_parameterized_one_liner(self, params: dict) -> str:
@@
-    def get_parameterized_one_liner(self, params) -> str:
+    def get_parameterized_one_liner(self, params: dict) -> str:

Also applies to: 190-194

holmes/plugins/toolsets/prometheus/prometheus.py (3)

666-672: TLS verification flag is ignored in POST requests

Both POST calls omit verify=self.toolset.config.prometheus_ssl_enabled. This breaks clusters using self-signed certs or when SSL verification is disabled.

 response = requests.post(
     url=url,
     headers=self.toolset.config.headers,
     auth=self.toolset.config.get_auth(),
     data=payload,
     timeout=60,
+    verify=self.toolset.config.prometheus_ssl_enabled,
 )

 # ...

 response = requests.post(
     url=url,
     headers=self.toolset.config.headers,
     auth=self.toolset.config.get_auth(),
     data=payload,
     timeout=120,
+    verify=self.toolset.config.prometheus_ssl_enabled,
 )

Also applies to: 823-829


674-717: Handle Prometheus 200 responses with "status": "error" as failures

When Prometheus returns HTTP 200 with status=error, the tool currently returns SUCCESS. That hides real errors from callers.

             if response.status_code == 200:
                 data = response.json()
                 status = data.get("status")
+                if status != "success":
+                    err = data.get("error") or data.get("message") or data.get("errorType") or "Unknown error"
+                    return StructuredToolResult(
+                        status=ToolResultStatus.ERROR,
+                        error=f"Query execution failed: {err}",
+                        params=params,
+                    )
                 error_message = None
                 if status == "success" and not result_has_data(data):
                     status = "Failed"
                     error_message = (
                         "The prometheus query returned no result. Is the query correct?"
                     )

Apply the same pattern to ExecuteRangeQuery._invoke.

Also applies to: 831-883


806-813: Parse duration strings for step and enforce requirement

Param description allows duration format (e.g., 5m), but code only accepts floats and even defaults to MAX_GRAPH_POINTS when missing. Parse durations, validate presence, and adjust safely.

-            step = params.get("step", "")
-
-            step = adjust_step_for_max_points(
+            step_raw = get_param_or_raise(params, "step")
+            try:
+                step_seconds = parse_duration_to_seconds(step_raw)
+            except ValueError as e:
+                return StructuredToolResult(
+                    status=ToolResultStatus.ERROR,
+                    error=str(e),
+                    params=params,
+                )
+
+            step_seconds = adjust_step_for_max_points(
                 start_timestamp=start,
                 end_timestamp=end,
-                step=float(step) if step else MAX_GRAPH_POINTS,
+                step=float(step_seconds),
             )
 
             description = params.get("description", "")
             output_type = params.get("output_type", "Plain")
             payload = {
                 "query": query,
                 "start": start,
                 "end": end,
-                "step": step,
+                "step": step_seconds,
             }

Add this helper (place near other helpers in this module; it needs `re` and `typing.Union` imported at the top of the file):

def parse_duration_to_seconds(value: Union[str, float, int]) -> float:
    if isinstance(value, (int, float)):
        if value <= 0:
            raise ValueError("Invalid step: must be > 0 seconds")
        return float(value)
    s = str(value).strip().lower()
    m = re.fullmatch(r"(\d+(?:\.\d+)?)([smh]?)", s)
    if not m:
        raise ValueError("Invalid step. Use seconds (number) or <number><s|m|h>, e.g., 30, 30s, 5m, 1h")
    mult = {"": 1, "s": 1, "m": 60, "h": 3600}[m.group(2)]
    seconds = float(m.group(1)) * mult
    if seconds <= 0:
        raise ValueError("Invalid step: must be > 0 seconds")
    return seconds

Note the unit suffix is optional (`[smh]?`) so a bare "30" parses as seconds, matching the examples in the error message.

Also applies to: 816-821
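A self-contained sanity check of the proposed helper (reproduced here with its `re`/`Union` imports; the unit suffix is treated as optional so a bare "30" parses as seconds, consistent with the error message's examples):

```python
import re
from typing import Union

def parse_duration_to_seconds(value: Union[str, float, int]) -> float:
    if isinstance(value, (int, float)):
        if value <= 0:
            raise ValueError("Invalid step: must be > 0 seconds")
        return float(value)
    s = str(value).strip().lower()
    m = re.fullmatch(r"(\d+(?:\.\d+)?)([smh]?)", s)
    if not m:
        raise ValueError("Invalid step. Use seconds (number) or <number><s|m|h>")
    mult = {"": 1, "s": 1, "m": 60, "h": 3600}[m.group(2)]
    seconds = float(m.group(1)) * mult
    if seconds <= 0:
        raise ValueError("Invalid step: must be > 0 seconds")
    return seconds

assert parse_duration_to_seconds("30") == 30.0   # bare number => seconds
assert parse_duration_to_seconds("5m") == 300.0
assert parse_duration_to_seconds("1h") == 3600.0
assert parse_duration_to_seconds(2) == 2.0
```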

♻️ Duplicate comments (7)
holmes/plugins/toolsets/internet/notion.py (4)

67-71: Harden parsing and surface errors instead of crashing.

parse_notion_content may raise JSONDecodeError; catch and return a structured error. Also populate url in the result for UX parity with other tools.

-        return StructuredToolResult(
-            status=ToolResultStatus.SUCCESS,
-            data=self.parse_notion_content(content),
-            params=params,
-        )
+        try:
+            parsed = self.parse_notion_content(content)
+        except Exception as e:
+            logging.error(f"Failed to parse Notion content from {url}: {e}", exc_info=True)
+            return StructuredToolResult(
+                status=ToolResultStatus.ERROR,
+                error=f"Failed to parse Notion content: {e}",
+                params=params,
+                url=url,
+            )
+        return StructuredToolResult(
+            status=ToolResultStatus.SUCCESS,
+            data=parsed,
+            params=params,
+            url=url,
+        )

38-45: Make URL conversion robust and typed; handle query/fragment.

Current regex only matches trailing -{32 word chars}. Support canonical 32-hex IDs anywhere, strip query/fragment, and add type hints.

-    def convert_notion_url(self, url):
+    def convert_notion_url(self, url: str) -> str:
         if "api.notion.com" in url:
             return url
-        match = re.search(r"-(\w{32})$", url)
+        clean_url = url.split("?", 1)[0].split("#", 1)[0]
+        match = re.search(r"([0-9a-fA-F]{32})", clean_url)
         if match:
             notion_id = match.group(1)
             return f"https://api.notion.com/v1/blocks/{notion_id}/children"
         return url  # Return original URL if no match is found
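A quick check of the proposed extraction against a made-up page URL:

```python
import re

def convert_notion_url(url: str) -> str:
    if "api.notion.com" in url:
        return url
    # Strip query string and fragment before searching for the 32-hex page id.
    clean_url = url.split("?", 1)[0].split("#", 1)[0]
    match = re.search(r"([0-9a-fA-F]{32})", clean_url)
    if match:
        return f"https://api.notion.com/v1/blocks/{match.group(1)}/children"
    return url

# Hypothetical page URL with a query string; the 32-hex id is still picked up.
page = "https://www.notion.so/My-Page-0123456789abcdef0123456789abcdef?pvs=4"
assert convert_notion_url(page) == (
    "https://api.notion.com/v1/blocks/0123456789abcdef0123456789abcdef/children"
)
# URLs without a 32-hex id pass through unchanged.
assert convert_notion_url("https://example.com/doc") == "https://example.com/doc"
```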

95-110: Avoid KeyError on rich_text; support plain_text fallback.

Notion returns mention/equation types without text["text"]["content"]. Use plain_text or safe get().

     def format_rich_text(self, rich_texts: list) -> str:
         """Helper function to apply formatting (bold, code, etc.)"""
         formatted_text = []
         for text in rich_texts:
-            plain_text = text["text"]["content"]
+            plain_text = text.get("plain_text") or text.get("text", {}).get("content", "")
+            if not plain_text:
+                continue
             annotations = text.get("annotations", {})
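The fallback can be exercised standalone with minimal payloads (field names follow Notion's rich-text shape as described above; annotation handling is omitted for brevity):

```python
def extract_plain_text(rich_texts: list) -> str:
    parts = []
    for text in rich_texts:
        # plain_text exists on all rich-text variants; text["text"]["content"]
        # only on "text"-type items, so fall back safely.
        plain = text.get("plain_text") or text.get("text", {}).get("content", "")
        if plain:
            parts.append(plain)
    return "".join(parts)

assert extract_plain_text([{"text": {"content": "hello"}}]) == "hello"
assert extract_plain_text([{"plain_text": "@someone"}]) == "@someone"  # mention-style item
assert extract_plain_text([{"type": "equation"}]) == ""                # nothing recoverable
```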

112-114: Add type hints to params.

Match project typing standards.

-    def get_parameterized_one_liner(self, params) -> str:
+    def get_parameterized_one_liner(self, params: dict) -> str:
holmes/plugins/toolsets/robusta/robusta.py (1)

40-47: Fix success-on-error when DAL is disabled (return ERROR, not SUCCESS/NO_DATA).

When DAL is disabled, _fetch_finding returns a truthy dict ({"error": ...}) which makes _invoke return SUCCESS. Return None from _fetch_finding and short-circuit _invoke with ERROR if DAL is disabled.

@@
-    def _fetch_finding(self, finding_id: str) -> Optional[Dict]:
-        if self._dal and self._dal.enabled:
-            return self._dal.get_issue_data(finding_id)
-        else:
-            error = f"Failed to find a finding with finding_id={finding_id}: Holmes' data access layer is not enabled."
-            logging.error(error)
-            return {"error": error}
+    def _fetch_finding(self, finding_id: str) -> Optional[Dict]:
+        if self._dal and self._dal.enabled:
+            return self._dal.get_issue_data(finding_id)
+        return None
@@
-        try:
-            finding = self._fetch_finding(finding_id)
+        try:
+            if not (self._dal and self._dal.enabled):
+                msg = "Holmes' data access layer is not enabled."
+                logging.error(msg)
+                return StructuredToolResult(
+                    status=ToolResultStatus.ERROR,
+                    error=f"Failed to fetch finding with finding_id={finding_id}: {msg}",
+                    params=params,
+                )
+            finding = self._fetch_finding(finding_id)
             if finding:
                 return StructuredToolResult(
                     status=ToolResultStatus.SUCCESS,
                     data=finding,
                     params=params,
                 )
             else:
                 return StructuredToolResult(
                     status=ToolResultStatus.NO_DATA,
                     data=f"Could not find a finding with finding_id={finding_id}",
                     params=params,
                 )

Also applies to: 51-66

holmes/plugins/toolsets/opensearch/opensearch_traces.py (2)

72-78: Prefer POST with JSON body for OpenSearch queries (GET bodies can be dropped).

Switch to POST and use the json= param. Also aligns with prior feedback.

-            logs_response = requests.get(
+            logs_response = requests.post(
                 url=get_search_url(self._toolset.opensearch_config),
                 timeout=180,
                 verify=True,
-                data=json.dumps(body),
+                json=body,
                 headers=headers,
             )

148-154: Same here: use POST + json for search.

Avoid GET-with-body for reliability and proxy compatibility.

-            logs_response = requests.get(
+            logs_response = requests.post(
                 url=get_search_url(self._toolset.opensearch_config),
                 timeout=180,
                 verify=True,
-                data=json.dumps(full_query),
+                json=full_query,
                 headers=headers,
             )
🧹 Nitpick comments (42)
holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py (3)

185-187: Suppress unused-arg lint for user_approved (override keeps interface, not used here).

Ruff flags ARG002. Since this param is required by the base interface and intentionally unused here, suppress at the signature level.

-        self, params: dict, user_approved: bool = False
+        self, params: dict, user_approved: bool = False  # noqa: ARG002

255-257: Same here: silence ARG002 for user_approved.

Keep the parameter for interface consistency; suppress the linter.

-        self, params: dict, user_approved: bool = False
+        self, params: dict, user_approved: bool = False  # noqa: ARG002

304-306: Repeat: suppress ARG002 on the override.

Consistent with the other methods.

-        self, params: dict, user_approved: bool = False
+        self, params: dict, user_approved: bool = False  # noqa: ARG002
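For reference, the three idioms suggested across these comments are equivalent at runtime; only the linter treatment differs (sketched with a toy class rather than the real tool base class):

```python
class ExampleTool:
    # Option 1: suppress the warning on the flagged argument line.
    def invoke_noqa(self, params: dict, user_approved: bool = False) -> dict:  # noqa: ARG002
        return params

    # Option 2: delete the name so the argument counts as used.
    def invoke_del(self, params: dict, user_approved: bool = False) -> dict:
        del user_approved
        return params

    # Option 3: conventional throwaway assignment.
    def invoke_assign(self, params: dict, user_approved: bool = False) -> dict:
        _ = user_approved
        return params

tool = ExampleTool()
payload = {"query": "up"}
assert tool.invoke_noqa(payload) == tool.invoke_del(payload) == tool.invoke_assign(payload) == payload
```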
holmes/plugins/toolsets/internet/notion.py (1)

73-75: Tighten type hints for parse_notion_content.

content is a JSON string; reflect that in the signature.

-    def parse_notion_content(self, content: Any) -> str:
+    def parse_notion_content(self, content: str) -> str:
holmes/plugins/toolsets/robusta/robusta.py (5)

48-50: Silence Ruff ARG002 for unused user_approved.

Parameter is required by the interface; explicitly consume it to satisfy Ruff.

 def _invoke(
         self, params: dict, user_approved: bool = False
     ) -> StructuredToolResult:
+        _ = user_approved  # reserved for approval gating; silence Ruff ARG002

118-120: Add early ERROR when DAL is disabled + silence Ruff ARG002.

Mirror the fix in FetchRobustaFinding so disabled DAL yields ERROR (not NO_DATA). Also consume user_approved.

 def _invoke(
         self, params: dict, user_approved: bool = False
     ) -> StructuredToolResult:
+        _ = user_approved  # silence Ruff ARG002
+        if not (self._dal and self._dal.enabled):
+            msg = "Holmes' data access layer is not enabled."
+            logging.error(msg)
+            return StructuredToolResult(
+                status=ToolResultStatus.ERROR,
+                error=f"Failed to fetch recommendations for {params}: {msg}",
+                params=params,
+            )
         try:
             recommendations = self._resource_recommendation(params)

178-180: Add early ERROR when DAL is disabled + silence Ruff ARG002.

Same rationale as above.

 def _invoke(
         self, params: dict, user_approved: bool = False
     ) -> StructuredToolResult:
+        _ = user_approved  # silence Ruff ARG002
+        if not (self._dal and self._dal.enabled):
+            msg = "Holmes' data access layer is not enabled."
+            logging.error(msg)
+            return StructuredToolResult(
+                status=ToolResultStatus.ERROR,
+                error=f"Failed to fetch changes for {params}: {msg}",
+                params=params,
+            )
         try:
             changes = self._fetch_change_history(params)

162-162: Correct parameter description (ending vs starting).

Minor text fix for clarity.

-                    description="The starting time boundary for the search period. String in RFC3339 format.",
+                    description="The ending time boundary for the search period. String in RFC3339 format.",

66-76: Prefer logging.exception(...) and populate error field on ERROR.

Align with other toolsets (e.g., Prometheus) and include traceback.

-        except Exception as e:
-            logging.error(e)
-            logging.error(
-                f"There was an internal error while fetching finding {finding_id}. {str(e)}"
-            )
-
-            return StructuredToolResult(
-                status=ToolResultStatus.ERROR,
-                data=f"There was an internal error while fetching finding {finding_id}",
-                params=params,
-            )
+        except Exception as e:
+            logging.exception(
+                f"There was an internal error while fetching finding {finding_id}. {str(e)}"
+            )
+            return StructuredToolResult(
+                status=ToolResultStatus.ERROR,
+                error=f"Internal error while fetching finding {finding_id}",
+                params=params,
+            )
holmes/plugins/toolsets/kafka.py (7)

156-158: Silence unused user_approved (ruff ARG002) while keeping the interface stable.

Until the approval flag is actually used here, suppress the warning at the signature.

     def _invoke(
-        self, params: dict, user_approved: bool = False
+        self, params: dict, user_approved: bool = False  # noqa: ARG002
     ) -> StructuredToolResult:

231-233: Same here: suppress unused user_approved (ruff ARG002).

     def _invoke(
-        self, params: dict, user_approved: bool = False
+        self, params: dict, user_approved: bool = False  # noqa: ARG002
     ) -> StructuredToolResult:

289-291: Same here: suppress unused user_approved (ruff ARG002).

     def _invoke(
-        self, params: dict, user_approved: bool = False
+        self, params: dict, user_approved: bool = False  # noqa: ARG002
     ) -> StructuredToolResult:

347-349: Same here: suppress unused user_approved (ruff ARG002).

     def _invoke(
-        self, params: dict, user_approved: bool = False
+        self, params: dict, user_approved: bool = False  # noqa: ARG002
     ) -> StructuredToolResult:

472-474: Same here: suppress unused user_approved (ruff ARG002).

     def _invoke(
-        self, params: dict, user_approved: bool = False
+        self, params: dict, user_approved: bool = False  # noqa: ARG002
     ) -> StructuredToolResult:

563-564: Same here: suppress unused user_approved (ruff ARG002).

     def _invoke(
-        self, params: dict, user_approved: bool = False
+        self, params: dict, user_approved: bool = False  # noqa: ARG002
     ) -> StructuredToolResult:

523-529: Grammar nit: pluralize “consumer group(s)”.

Tiny UX polish in the user-facing string.

             else:
-                result_text = f"No consumer group were found for topic {topic_name}"
+                result_text = f"No consumer groups were found for topic {topic_name}"
holmes/plugins/toolsets/rabbitmq/toolset_rabbitmq.py (7)

66-68: Silence unused-argument warnings while keeping the new interface.

Ruff flags params/user_approved as unused. Keep names for keyword-compatibility, but mark user_approved explicitly unused.

 def _invoke(
         self, params: dict, user_approved: bool = False
     ) -> StructuredToolResult:
+        _ = user_approved  # explicitly unused (read-only tool)

81-83: Return the input params to improve traceability (and use the arg).

Include params=params in StructuredToolResult.

-        return StructuredToolResult(
-            status=ToolResultStatus.SUCCESS, data=available_clusters
-        )
+        return StructuredToolResult(
+            status=ToolResultStatus.SUCCESS,
+            data=available_clusters,
+            params=params,
+        )

106-108: Same unused-arg fix for GetRabbitMQClusterStatus.

 def _invoke(
         self, params: dict, user_approved: bool = False
     ) -> StructuredToolResult:
-        try:
+        _ = user_approved  # explicitly unused (read-only tool)
+        try:

114-116: Return params in success result for parity and auditability.

-            return StructuredToolResult(status=ToolResultStatus.SUCCESS, data=result)
+            return StructuredToolResult(
+                status=ToolResultStatus.SUCCESS, data=result, params=params
+            )

117-123: Use error-level logging and include params on failure.

logging.info(..., exc_info=True) under-reports. Prefer logging.exception and return params=params.

-        except Exception as e:
-            logging.info("Failed to process RabbitMQ cluster status", exc_info=True)
+        except Exception as e:
+            logging.exception("Failed to process RabbitMQ cluster status")
             return StructuredToolResult(
                 status=ToolResultStatus.ERROR,
                 error=f"Unexpected error fetching RabbitMQ cluster status: {str(e)}",
                 data=None,
+                params=params,
             )

41-46: Fix misleading error message when multiple clusters are configured.

Message says “No cluster is configured” while multiple exist.

         elif not cluster_id and len(cluster_ids) > 0:
             raise ValueError(
-                f"No cluster is configured. Possible cluster_id values are: {', '.join(cluster_ids)}"
+                f"Multiple clusters are configured. 'cluster_id' is required. Possible cluster_id values are: {', '.join(cluster_ids)}"
             )

85-88: Add missing type hints to satisfy mypy policy.

-    def get_parameterized_one_liner(self, params) -> str:
+    def get_parameterized_one_liner(self, params: dict) -> str:
-    def get_parameterized_one_liner(self, params) -> str:
+    def get_parameterized_one_liner(self, params: dict) -> str:

Also applies to: 125-130

holmes/plugins/toolsets/runbook/runbook_fetcher.py (4)

38-41: Silence Ruff ARG002 without breaking the interface

user_approved is intentionally unused here; Ruff flags it. Keep the param name for compatibility and mark it used.

Apply this diff:

 def _invoke(
         self, params: dict, user_approved: bool = False
     ) -> StructuredToolResult:
+        # Part of the standard tool interface; not used for read-only runbook fetches.
+        _ = user_approved  # avoid Ruff ARG002
         link: str = params["link"]

41-46: Validate and normalize link to avoid KeyError and empty input

Defensive check keeps behavior consistent with other toolsets that validate params.

-        link: str = params["link"]
+        if "link" not in params or not isinstance(params["link"], str) or not params["link"].strip():
+            return StructuredToolResult(
+                status=ToolResultStatus.ERROR,
+                error="Missing or invalid 'link' parameter",
+                params=params,
+            )
+        link: str = params["link"].strip()

62-63: Specify file encoding for deterministic reads

Prevents locale-dependent decoding issues when reading runbooks.

-            with open(runbook_path, "r") as file:
+            with open(runbook_path, "r", encoding="utf-8") as file:

98-102: Return invocation for observability

Including the resolved path helps trace what was fetched (parity with other toolsets that set invocation).

                 return StructuredToolResult(
                     status=ToolResultStatus.SUCCESS,
                     data=wrapped_content,
-                    params=params,
+                    params=params,
+                    invocation=str(runbook_path),
                 )
holmes/plugins/toolsets/internet/internet.py (2)

151-153: Remove dead substitutions after markdownify.

After markdownify, raw <div> tags shouldn’t be present; these substitutions are likely no-ops. Trim for clarity.

-    md = re.sub(r"</div>", "      ", md)
-    md = re.sub(r"<div>", "     ", md)

219-221: Add type hints to satisfy project typing policy.

params lacks a type hint; annotate as Dict[str, Any] for consistency with tool params elsewhere.

-    def get_parameterized_one_liner(self, params) -> str:
+    def get_parameterized_one_liner(self, params: Dict[str, Any]) -> str:
holmes/plugins/toolsets/opensearch/opensearch.py (5)

185-193: Silence unused user_approved to satisfy Ruff ARG002

No external calls here; just resolve the linter warning.

-    def _invoke(
-        self, params: dict, user_approved: bool = False
-    ) -> StructuredToolResult:
-        hosts = [host for client in self.toolset.clients for host in client.hosts]
-        return StructuredToolResult(
-            status=ToolResultStatus.SUCCESS,
-            data=str(hosts),
-            params=params,
-        )
+    def _invoke(
+        self, params: dict, user_approved: bool = False
+    ) -> StructuredToolResult:
+        del user_approved  # keep signature consistent across toolsets
+        hosts = [host for client in self.toolset.clients for host in client.hosts]
+        return StructuredToolResult(
+            status=ToolResultStatus.SUCCESS,
+            data=str(hosts),
+            params=params,
+        )

140-143: Add missing type hint to satisfy mypy configuration

Aligns with repo typing guidelines.

-    def get_parameterized_one_liner(self, params) -> str:
+    def get_parameterized_one_liner(self, params: Dict) -> str:

171-174: Add missing type hint to satisfy mypy configuration

Same as above.

-    def get_parameterized_one_liner(self, params) -> str:
+    def get_parameterized_one_liner(self, params: Dict) -> str:

201-205: Avoid mutable class-level default for clients

Using a shared list at the class level can lead to cross-instance leakage. Prefer a default factory or instance field.

Option A (pydantic): use Field

# add at top:
from pydantic import Field
-    clients: List[OpenSearchClient] = []
+    clients: List[OpenSearchClient] = Field(default_factory=list)

Option B (plain __init__): initialize the list in __init__

     def __init__(self):
-        super().__init__(
+        self.clients = []
+        super().__init__(
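
To make the cross-instance leakage concrete, here is a minimal standalone sketch (the class names are illustrative, not the real OpenSearch toolset classes):

```python
class LeakyToolset:
    clients: list = []  # class-level mutable default: shared by ALL instances

class SafeToolset:
    def __init__(self) -> None:
        self.clients: list = []  # fresh list per instance

a, b = LeakyToolset(), LeakyToolset()
a.clients.append("opensearch-a")
# b observes a's mutation because both read the same class attribute
shared = b.clients

c, d = SafeToolset(), SafeToolset()
c.clients.append("opensearch-c")
isolated = d.clients
```

Pydantic's `Field(default_factory=list)` avoids the same trap by constructing a new list for each model instance.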

1-5: Import for suggested pydantic Field (only if you apply that option)

Add this import at the top to support Field(default_factory=list).

-from pydantic import BaseModel, ConfigDict
+from pydantic import BaseModel, ConfigDict, Field
holmes/plugins/toolsets/logging_utils/logging_api.py (3)

178-189: Silence Ruff ARG002 or wire approval gating.

user_approved is added to the signature but unused, triggering Ruff ARG002. Either consume it minimally or gate sensitive log access behind approval.

Option A (minimal linter fix):

 def _invoke(
         self, params: dict, user_approved: bool = False
     ) -> StructuredToolResult:
+        # Part of unified tool signature; not used here (yet)
+        _ = user_approved  # silence Ruff ARG002
         structured_params = FetchPodLogsParams(
             namespace=get_param_or_raise(params, "namespace"),
             pod_name=get_param_or_raise(params, "pod_name"),
             start_time=params.get("start_time"),
             end_time=params.get("end_time"),
             filter=params.get("filter"),
             exclude_filter=params.get("exclude_filter"),
             limit=params.get("limit"),
         )

Option B (optional): if logs are considered sensitive, add approval gating here and return APPROVAL_REQUIRED when not approved (I can draft this if you confirm the policy).
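
A rough sketch of what Option B could look like, using stand-in types (the real StructuredToolResult/ToolResultStatus live in holmes/core/tools.py, and an APPROVAL_REQUIRED status plus the namespace-based policy below are assumptions, not the project's actual contract):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Result:
    # Stand-in for StructuredToolResult; field names are assumptions.
    status: str
    data: Optional[str] = None
    error: Optional[str] = None

def fetch_pod_logs(params: dict, user_approved: bool = False) -> Result:
    # Illustrative sensitivity policy: gate logs from a protected namespace.
    sensitive = params.get("namespace") in {"kube-system"}
    if sensitive and not user_approved:
        return Result(status="approval_required",
                      error="Approval needed to read logs in this namespace")
    return Result(status="success", data=f"logs for {params['pod_name']}")
```

The caller re-invokes with user_approved=True after the CLI approval flow, and the second call proceeds normally.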


181-189: Normalize time parameters before building the request (optional).

You already have process_time_parameters; using it here standardizes inputs across providers and aligns with the documented defaults.

-        structured_params = FetchPodLogsParams(
+        # Normalize time parameters to RFC3339/relative defaults
+        processed_start, processed_end = process_time_parameters(
+            params.get("start_time"),
+            params.get("end_time"),
+        )
+        structured_params = FetchPodLogsParams(
             namespace=get_param_or_raise(params, "namespace"),
             pod_name=get_param_or_raise(params, "pod_name"),
-            start_time=params.get("start_time"),
-            end_time=params.get("end_time"),
+            start_time=processed_start,
+            end_time=processed_end,
             filter=params.get("filter"),
             exclude_filter=params.get("exclude_filter"),
             limit=params.get("limit"),
         )

51-58: Add type hints to the validator to satisfy mypy.

The codebase requires type hints. This validator can declare parameter and return types without changing behavior.

-    def convert_start_time_to_string(cls, v):
-        """Convert integer start_time values to strings."""
-        if v is not None and isinstance(v, int):
-            return str(v)
-        return v
+    def convert_start_time_to_string(cls, v: object) -> Optional[str]:
+        """Convert integer start_time values to strings."""
+        if v is None:
+            return None
+        if isinstance(v, int):
+            return str(v)
+        if isinstance(v, str):
+            return v
+        # Fallback: stringify unexpected types to keep schema contract
+        return str(v)
holmes/plugins/toolsets/opensearch/opensearch_traces.py (2)

37-39: Use user_approved and surface invocation metadata to quell Ruff ARG002 and aid debugging.

Attach minimal redacted invocation details to results.

 def _invoke(
-        self, params: dict, user_approved: bool = False
+        self, params: dict, user_approved: bool = False
     ) -> StructuredToolResult:
@@
                     return StructuredToolResult(
                         status=ToolResultStatus.SUCCESS,
                         data=cached_response,
                         params=params,
+                        invocation=json.dumps(
+                            {"action": "get_traces_fields", "from_cache": True, "user_approved": user_approved}
+                        ),
                     )
@@
             return StructuredToolResult(
                 status=ToolResultStatus.SUCCESS,
                 data=response,
                 params=params,
+                invocation=json.dumps(
+                    {"action": "get_traces_fields", "from_cache": False, "user_approved": user_approved}
+                ),
             )
     def _invoke(
-        self, params: dict, user_approved: bool = False
+        self, params: dict, user_approved: bool = False
     ) -> StructuredToolResult:
@@
             return StructuredToolResult(
                 status=ToolResultStatus.SUCCESS,
                 data=json.dumps(logs_response.json()),
                 params=params,
+                invocation=json.dumps(
+                    {"action": "traces_in_range_search", "user_approved": user_approved, "size": full_query.get("size")}
+                ),
             )

Also applies to: 50-54, 83-87, 159-163


139-141: Don’t clobber caller-provided size; use setdefault.

Preserves explicit size in the incoming query.

-            full_query["size"] = int(
+            full_query.setdefault("size", int(
                 os.environ.get("OPENSEARCH_TRACES_SEARCH_SIZE", "5000")
-            )
+            ))
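
The setdefault behavior can be checked in isolation — it only writes when the key is absent, so a caller-supplied size survives:

```python
# Caller provided an explicit size: setdefault is a no-op.
explicit = {"query": {"match_all": {}}, "size": 100}
explicit.setdefault("size", 5000)

# No size in the incoming query: the default applies.
implicit = {"query": {"match_all": {}}}
implicit.setdefault("size", 5000)
```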
holmes/plugins/toolsets/prometheus/prometheus.py (3)

443-445: Silence unused user_approved warnings without changing the public signature

Param is required by the base interface but currently unused; ruff flags it (ARG002). Mark as intentionally unused in each method body.

 def _invoke(
         self, params: dict, user_approved: bool = False
     ) -> StructuredToolResult:
+        _ = user_approved  # intentionally unused; approval is handled by the CLI flow

 # ... same in ListAvailableMetrics._invoke:

 def _invoke(
         self, params: dict, user_approved: bool = False
     ) -> StructuredToolResult:
+        _ = user_approved  # intentionally unused; approval is handled by the CLI flow

 # ... same in ExecuteInstantQuery._invoke:

 def _invoke(
         self, params: dict, user_approved: bool = False
     ) -> StructuredToolResult:
+        _ = user_approved  # intentionally unused; approval is handled by the CLI flow

 # ... same in ExecuteRangeQuery._invoke:

 def _invoke(
         self, params: dict, user_approved: bool = False
     ) -> StructuredToolResult:
+        _ = user_approved  # intentionally unused; approval is handled by the CLI flow

Also applies to: 541-543, 649-651, 787-789


580-582: Normalize and validate type_filter

Prevent subtle mismatches and user errors by normalizing case and validating allowed values.

-            type_filter = params.get("type_filter")
-            if type_filter:
-                metrics = filter_metrics_by_type(metrics, type_filter)
+            type_filter = params.get("type_filter")
+            if type_filter:
+                tf = str(type_filter).lower().strip()
+                if tf not in {"counter", "gauge", "histogram", "summary"}:
+                    return StructuredToolResult(
+                        status=ToolResultStatus.ERROR,
+                        error="Invalid type_filter. Must be one of: counter, gauge, histogram, summary.",
+                        params=params,
+                    )
+                metrics = filter_metrics_by_type(metrics, tf)

602-618: Use logging.warning (logging.warn is deprecated)

Minor API deprecation cleanup.

-        except requests.Timeout:
-            logging.warn("Timeout while fetching prometheus metrics", exc_info=True)
+        except requests.Timeout:
+            logging.warning("Timeout while fetching prometheus metrics", exc_info=True)
@@
-        except RequestException as e:
-            logging.warn("Failed to fetch prometheus metrics", exc_info=True)
+        except RequestException as e:
+            logging.warning("Failed to fetch prometheus metrics", exc_info=True)
@@
-        except Exception as e:
-            logging.warn("Failed to process prometheus metrics", exc_info=True)
+        except Exception as e:
+            logging.warning("Failed to process prometheus metrics", exc_info=True)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1e09ceb and f3150ae.

📒 Files selected for processing (12)
  • holmes/common/env_vars.py (1 hunks)
  • holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py (3 hunks)
  • holmes/plugins/toolsets/internet/internet.py (1 hunks)
  • holmes/plugins/toolsets/internet/notion.py (1 hunks)
  • holmes/plugins/toolsets/kafka.py (6 hunks)
  • holmes/plugins/toolsets/logging_utils/logging_api.py (1 hunks)
  • holmes/plugins/toolsets/opensearch/opensearch.py (4 hunks)
  • holmes/plugins/toolsets/opensearch/opensearch_traces.py (3 hunks)
  • holmes/plugins/toolsets/prometheus/prometheus.py (5 hunks)
  • holmes/plugins/toolsets/rabbitmq/toolset_rabbitmq.py (2 hunks)
  • holmes/plugins/toolsets/robusta/robusta.py (3 hunks)
  • holmes/plugins/toolsets/runbook/runbook_fetcher.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • holmes/common/env_vars.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always place Python imports at the top of the file, not inside functions or methods
Type hints are required (mypy is configured in pyproject.toml)

Files:

  • holmes/plugins/toolsets/rabbitmq/toolset_rabbitmq.py
  • holmes/plugins/toolsets/prometheus/prometheus.py
  • holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py
  • holmes/plugins/toolsets/internet/internet.py
  • holmes/plugins/toolsets/internet/notion.py
  • holmes/plugins/toolsets/kafka.py
  • holmes/plugins/toolsets/logging_utils/logging_api.py
  • holmes/plugins/toolsets/opensearch/opensearch.py
  • holmes/plugins/toolsets/opensearch/opensearch_traces.py
  • holmes/plugins/toolsets/robusta/robusta.py
  • holmes/plugins/toolsets/runbook/runbook_fetcher.py
holmes/plugins/toolsets/**

📄 CodeRabbit inference engine (CLAUDE.md)

Toolsets must live under holmes/plugins/toolsets as either {name}.yaml or a {name}/ directory

Files:

  • holmes/plugins/toolsets/rabbitmq/toolset_rabbitmq.py
  • holmes/plugins/toolsets/prometheus/prometheus.py
  • holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py
  • holmes/plugins/toolsets/internet/internet.py
  • holmes/plugins/toolsets/internet/notion.py
  • holmes/plugins/toolsets/kafka.py
  • holmes/plugins/toolsets/logging_utils/logging_api.py
  • holmes/plugins/toolsets/opensearch/opensearch.py
  • holmes/plugins/toolsets/opensearch/opensearch_traces.py
  • holmes/plugins/toolsets/robusta/robusta.py
  • holmes/plugins/toolsets/runbook/runbook_fetcher.py
🧬 Code graph analysis (11)
holmes/plugins/toolsets/rabbitmq/toolset_rabbitmq.py (2)
holmes/core/tools.py (3)
  • _invoke (182-189)
  • _invoke (240-270)
  • StructuredToolResult (50-74)
tests/llm/utils/mock_toolset.py (1)
  • _invoke (439-472)
holmes/plugins/toolsets/prometheus/prometheus.py (4)
holmes/plugins/toolsets/logging_utils/logging_api.py (1)
  • _invoke (178-195)
holmes/plugins/toolsets/opensearch/opensearch.py (4)
  • _invoke (96-105)
  • _invoke (127-138)
  • _invoke (160-169)
  • _invoke (185-193)
holmes/plugins/toolsets/opensearch/opensearch_traces.py (2)
  • _invoke (37-110)
  • _invoke (132-188)
holmes/core/tools.py (3)
  • _invoke (182-189)
  • _invoke (240-270)
  • StructuredToolResult (50-74)
holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py (1)
holmes/core/tools.py (3)
  • _invoke (182-189)
  • _invoke (240-270)
  • StructuredToolResult (50-74)
holmes/plugins/toolsets/internet/internet.py (5)
holmes/plugins/toolsets/prometheus/prometheus.py (4)
  • _invoke (443-514)
  • _invoke (541-622)
  • _invoke (649-738)
  • _invoke (787-897)
holmes/plugins/toolsets/internet/notion.py (1)
  • _invoke (47-71)
holmes/plugins/toolsets/kafka.py (6)
  • _invoke (156-204)
  • _invoke (231-267)
  • _invoke (289-315)
  • _invoke (347-387)
  • _invoke (472-546)
  • _invoke (562-570)
holmes/plugins/toolsets/opensearch/opensearch.py (3)
  • _invoke (96-105)
  • _invoke (127-138)
  • _invoke (160-169)
holmes/core/tools.py (1)
  • StructuredToolResult (50-74)
holmes/plugins/toolsets/internet/notion.py (2)
holmes/plugins/toolsets/internet/internet.py (1)
  • _invoke (189-217)
holmes/core/tools.py (1)
  • StructuredToolResult (50-74)
holmes/plugins/toolsets/kafka.py (2)
holmes/plugins/toolsets/prometheus/prometheus.py (4)
  • _invoke (443-514)
  • _invoke (541-622)
  • _invoke (649-738)
  • _invoke (787-897)
holmes/core/tools.py (2)
  • _invoke (182-189)
  • StructuredToolResult (50-74)
holmes/plugins/toolsets/logging_utils/logging_api.py (2)
holmes/plugins/toolsets/prometheus/prometheus.py (4)
  • _invoke (443-514)
  • _invoke (541-622)
  • _invoke (649-738)
  • _invoke (787-897)
holmes/core/tools.py (1)
  • StructuredToolResult (50-74)
holmes/plugins/toolsets/opensearch/opensearch.py (3)
holmes/plugins/toolsets/prometheus/prometheus.py (4)
  • _invoke (443-514)
  • _invoke (541-622)
  • _invoke (649-738)
  • _invoke (787-897)
holmes/plugins/toolsets/opensearch/opensearch_traces.py (2)
  • _invoke (37-110)
  • _invoke (132-188)
holmes/core/tools.py (1)
  • StructuredToolResult (50-74)
holmes/plugins/toolsets/opensearch/opensearch_traces.py (1)
holmes/plugins/toolsets/utils.py (2)
  • get_param_or_raise (137-141)
  • toolset_name_for_one_liner (144-148)
holmes/plugins/toolsets/robusta/robusta.py (2)
holmes/plugins/toolsets/prometheus/prometheus.py (4)
  • _invoke (443-514)
  • _invoke (541-622)
  • _invoke (649-738)
  • _invoke (787-897)
holmes/core/tools.py (3)
  • _invoke (182-189)
  • _invoke (240-270)
  • StructuredToolResult (50-74)
holmes/plugins/toolsets/runbook/runbook_fetcher.py (4)
holmes/plugins/toolsets/prometheus/prometheus.py (4)
  • _invoke (443-514)
  • _invoke (541-622)
  • _invoke (649-738)
  • _invoke (787-897)
holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py (4)
  • _invoke (185-229)
  • _invoke (255-283)
  • _invoke (304-322)
  • _invoke (390-548)
holmes/plugins/toolsets/internet/internet.py (1)
  • _invoke (189-217)
holmes/core/tools.py (1)
  • StructuredToolResult (50-74)
🪛 Ruff (0.12.2)
holmes/plugins/toolsets/rabbitmq/toolset_rabbitmq.py

67-67: Unused method argument: params

(ARG002)


67-67: Unused method argument: user_approved

(ARG002)


107-107: Unused method argument: user_approved

(ARG002)

holmes/plugins/toolsets/prometheus/prometheus.py

444-444: Unused method argument: user_approved

(ARG002)


542-542: Unused method argument: user_approved

(ARG002)


650-650: Unused method argument: user_approved

(ARG002)


788-788: Unused method argument: user_approved

(ARG002)

holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py

186-186: Unused method argument: user_approved

(ARG002)


256-256: Unused method argument: user_approved

(ARG002)


305-305: Unused method argument: user_approved

(ARG002)

holmes/plugins/toolsets/internet/internet.py

190-190: Unused method argument: user_approved

(ARG002)

holmes/plugins/toolsets/internet/notion.py

48-48: Unused method argument: user_approved

(ARG002)

holmes/plugins/toolsets/kafka.py

157-157: Unused method argument: user_approved

(ARG002)


232-232: Unused method argument: user_approved

(ARG002)


290-290: Unused method argument: user_approved

(ARG002)


348-348: Unused method argument: user_approved

(ARG002)


473-473: Unused method argument: user_approved

(ARG002)


563-563: Unused method argument: user_approved

(ARG002)

holmes/plugins/toolsets/logging_utils/logging_api.py

179-179: Unused method argument: user_approved

(ARG002)

holmes/plugins/toolsets/opensearch/opensearch.py

97-97: Unused method argument: user_approved

(ARG002)


128-128: Unused method argument: user_approved

(ARG002)


161-161: Unused method argument: user_approved

(ARG002)


186-186: Unused method argument: user_approved

(ARG002)

holmes/plugins/toolsets/opensearch/opensearch_traces.py

38-38: Unused method argument: user_approved

(ARG002)


133-133: Unused method argument: user_approved

(ARG002)

holmes/plugins/toolsets/robusta/robusta.py

49-49: Unused method argument: user_approved

(ARG002)


119-119: Unused method argument: user_approved

(ARG002)


179-179: Unused method argument: user_approved

(ARG002)

holmes/plugins/toolsets/runbook/runbook_fetcher.py

39-39: Unused method argument: user_approved

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: llm_evals
  • GitHub Check: Pre-commit checks
  • GitHub Check: Pre-commit checks
🔇 Additional comments (2)
holmes/plugins/toolsets/runbook/runbook_fetcher.py (1)

47-58: Confirm realpath containment in get_runbook_by_path
The get_runbook_by_path implementation must resolve the candidate path to its absolute form and verify it remains within one of the allowed search_paths. If it does not already perform a realpath/abspath + containment check before returning the path, add:

 from pathlib import Path

 def get_runbook_by_path(...):
     ...
-    candidate = os.path.join(root, runbook_relative_path)
+    candidate = os.path.join(root, runbook_relative_path)
     # Protect against path traversal
-    if os.path.exists(candidate):
+    abs_candidate = Path(candidate).resolve()
+    if not any(abs_candidate.is_relative_to(Path(p).resolve()) for p in search_paths):
+        return None
+    if abs_candidate.exists():
         return candidate
     ...

Ensure this guard is in place to prevent resolving files outside the intended roots.
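
The containment check above can be sketched as a standalone helper (a sketch of the guard, not the actual get_runbook_by_path; Path.is_relative_to requires Python 3.9+):

```python
from pathlib import Path
from typing import List

def is_contained(candidate: str, search_paths: List[str]) -> bool:
    """Resolve symlinks and '..' segments, then require the result
    to sit under one of the allowed roots."""
    resolved = Path(candidate).resolve()
    return any(
        resolved.is_relative_to(Path(root).resolve()) for root in search_paths
    )
```

A traversal attempt like "/srv/runbooks/../../etc/passwd" resolves to "/etc/passwd" and fails the containment test, while a path genuinely under the root passes.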

holmes/plugins/toolsets/internet/internet.py (1)

95-106: Return (None, None) on error in scrape
Both call sites in notion.py and internet.py check if not content, so switching error returns to (None, None) will correctly trigger the failure path without breaking existing logic.
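
A hypothetical sketch of that error contract (function body and return values are illustrative): since callers only check the first element, returning (None, None) on failure sends them down the error path without further changes.

```python
from typing import Optional, Tuple

def scrape(url: str) -> Tuple[Optional[str], Optional[str]]:
    # Illustrative failure condition; the real scrape performs an HTTP fetch.
    if not url.startswith("http"):
        return None, None  # error: no content, no mime type
    return "<html>ok</html>", "text/html"

content, mime_type = scrape("ftp://example.com")
took_error_path = not content  # the check both call sites already perform
```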

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py (1)

255-283: Return StructuredToolResult on errors (don’t raise) and include invocation URL

Keeps behavior consistent with other tools and surfaces errors to the caller without blowing up the flow; also fixes Ruff ARG002.

-    def _invoke(
-        self, params: dict, user_approved: bool = False
-    ) -> StructuredToolResult:
+    def _invoke(
+        self, params: dict, user_approved: bool = False  # noqa: ARG002
+    ) -> StructuredToolResult:
@@
-        try:
+        try:
             response = requests.get(
                 url,
                 headers=build_headers(api_key=api_key, additional_headers=headers),
                 timeout=60,
             )
             response.raise_for_status()  # Raise an error for non-2xx responses
             data = response.json()
-            return StructuredToolResult(
+            scopes = data.get("scopes", data)
+            return StructuredToolResult(
                 status=ToolResultStatus.SUCCESS,
-                data=yaml.dump(data.get("scopes")),
+                data=yaml.dump(scopes),
                 params=params,
+                invocation=url,
             )
         except requests.exceptions.RequestException as e:
-            raise Exception(f"Failed to retrieve tags: {e} \n for URL: {url}")
+            return StructuredToolResult(
+                status=ToolResultStatus.ERROR,
+                error=f"Failed to retrieve tags: {e}",
+                params=params,
+                invocation=url,
+            )
holmes/plugins/toolsets/datadog/toolset_datadog_general.py (2)

322-334: Prevent crashes from json.dumps on non-JSON-serializable inputs; also surface approval state to silence Ruff ARG002.

json.dumps in the logging path can raise if query_params contains non-serializable types (e.g., datetimes), aborting the tool before issuing the request. Also, user_approved is unused, triggering ARG002. Log the approval flag and use default=str when dumping inputs; also guard the error-path invocation dump similarly.

Apply this diff:

 def _invoke(
         self, params: dict, user_approved: bool = False
     ) -> StructuredToolResult:
@@
         logging.info("DatadogAPIGet Tool Invocation:")
         logging.info(f"  Description: {params.get('description', 'No description')}")
         logging.info(f"  Endpoint: {params.get('endpoint', '')}")
-        logging.info(
-            f"  Query Params: {json.dumps(params.get('query_params', {}), indent=2)}"
-        )
+        logging.info(f"  User approved: {user_approved}")
+        logging.info(
+            f"  Query Params: {json.dumps(params.get('query_params', {}), indent=2, default=str)}"
+        )
         logging.info("=" * 60)
@@
             return StructuredToolResult(
                 status=ToolResultStatus.ERROR,
                 error=error_msg,
                 params=params,
-                invocation=json.dumps({"url": url, "params": query_params})
+                invocation=json.dumps({"url": url, "params": query_params}, default=str)
                 if url
                 else None,
             )

Also applies to: 410-417
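
The failure mode and the default=str fix are easy to demonstrate in isolation:

```python
import json
from datetime import datetime

query_params = {"from": datetime(2025, 9, 2, 12, 0), "query": "status:error"}

# Plain json.dumps raises TypeError on the datetime value...
try:
    json.dumps(query_params)
    crashed = False
except TypeError:
    crashed = True

# ...while default=str stringifies anything the encoder cannot handle natively.
encoded = json.dumps(query_params, default=str)
```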


460-470: Same json.dumps robustness + approval surfacing for POST path.

Body logging and error-path invocation can crash on non-serializable objects; also silence ARG002 by logging approval.

Apply this diff:

 def _invoke(
         self, params: dict, user_approved: bool = False
     ) -> StructuredToolResult:
@@
         logging.info("DatadogAPIPostSearch Tool Invocation:")
         logging.info(f"  Description: {params.get('description', 'No description')}")
         logging.info(f"  Endpoint: {params.get('endpoint', '')}")
-        logging.info(f"  Body: {json.dumps(params.get('body', {}), indent=2)}")
+        logging.info(f"  User approved: {user_approved}")
+        logging.info(f"  Body: {json.dumps(params.get('body', {}), indent=2, default=str)}")
         logging.info("=" * 60)
@@
             return StructuredToolResult(
                 status=ToolResultStatus.ERROR,
                 error=error_msg,
                 params=params,
-                invocation=json.dumps({"url": url, "body": body}) if url else None,
+                invocation=json.dumps({"url": url, "body": body}, default=str) if url else None,
             )

Also applies to: 546-551

🧹 Nitpick comments (6)
holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py (3)

185-229: Silence unused param and coerce limit to int before API call

Prevents Ruff ARG002 and avoids passing a string to the client.

-    def _invoke(
-        self, params: dict, user_approved: bool = False
-    ) -> StructuredToolResult:
+    def _invoke(
+        self, params: dict, user_approved: bool = False  # noqa: ARG002
+    ) -> StructuredToolResult:
@@
-        traces = query_tempo_traces(
+        limit = int(params.get("limit", 50) or 50)
+        traces = query_tempo_traces(
@@
-            limit=params.get("limit", 50),
+            limit=limit,

304-322: Add noqa for unused param and include trace_id in invocation

Improves traceability and silences Ruff ARG002.

-    def _invoke(
-        self, params: dict, user_approved: bool = False
-    ) -> StructuredToolResult:
+    def _invoke(
+        self, params: dict, user_approved: bool = False  # noqa: ARG002
+    ) -> StructuredToolResult:
@@
-        return StructuredToolResult(
+        return StructuredToolResult(
             status=ToolResultStatus.SUCCESS,
             data=trace_data,
             params=params,
+            invocation=str(params.get("trace_id")),
         )

390-544: Coerce sample_count, include invocation, and silence unused param

Prevents odd inputs, improves result provenance, and fixes Ruff ARG002.

-    def _invoke(
-        self, params: dict, user_approved: bool = False
-    ) -> StructuredToolResult:
+    def _invoke(
+        self, params: dict, user_approved: bool = False  # noqa: ARG002
+    ) -> StructuredToolResult:
@@
-            sample_count = params.get("sample_count", 3)
+            sample_count = params.get("sample_count", 3)
+            sample_count = max(1, int(sample_count))
@@
-            return StructuredToolResult(
+            return StructuredToolResult(
                 status=ToolResultStatus.SUCCESS,
                 data=yaml.dump(result, default_flow_style=False, sort_keys=False),
                 params=params,
+                invocation=stats_query,
             )
holmes/plugins/toolsets/datadog/toolset_datadog_general.py (3)

390-394: Include invocation details on success for auditability.

Returning the invoked URL/params helps with approvals/auditing and reproducing calls.

Apply this diff:

         return StructuredToolResult(
             status=ToolResultStatus.SUCCESS,
             data=response_str,
-            params=params,
+            params=params,
+            invocation=json.dumps({"url": url, "params": query_params}, default=str),
         )

526-531: Add invocation details on success for parity with GET and better traceability.

Helps the CLI approval UX and downstream observability.

Apply this diff:

         return StructuredToolResult(
             status=ToolResultStatus.SUCCESS,
             data=response_str,
-            params=params,
+            params=params,
+            invocation=json.dumps({"url": url, "body": body}, default=str),
         )

584-594: Silence unused user_approved and make invocations consistent.

This tool is read-only, but keeping/logging the approval flag avoids ARG002 and keeps output consistent.

Apply this diff:

 def _invoke(
         self, params: dict, user_approved: bool = False
     ) -> StructuredToolResult:
     """List available API resources."""
     category = params.get("category", "all").lower()

     logging.info("=" * 60)
     logging.info("ListDatadogAPIResources Tool Invocation:")
+    logging.info(f"  User approved: {user_approved}")
     logging.info(f"  Category: {category}")
     logging.info("=" * 60)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f3150ae and 8cadae8.

📒 Files selected for processing (3)
  • holmes/common/env_vars.py (1 hunks)
  • holmes/plugins/toolsets/datadog/toolset_datadog_general.py (3 hunks)
  • holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • holmes/common/env_vars.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always place Python imports at the top of the file, not inside functions or methods
Type hints are required (mypy is configured in pyproject.toml)

Files:

  • holmes/plugins/toolsets/datadog/toolset_datadog_general.py
  • holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py
holmes/plugins/toolsets/**

📄 CodeRabbit inference engine (CLAUDE.md)

Toolsets must live under holmes/plugins/toolsets as either {name}.yaml or a {name}/ directory

Files:

  • holmes/plugins/toolsets/datadog/toolset_datadog_general.py
  • holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py
🧬 Code graph analysis (2)
holmes/plugins/toolsets/datadog/toolset_datadog_general.py (2)
holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py (4)
  • _invoke (185-229)
  • _invoke (255-283)
  • _invoke (304-322)
  • _invoke (390-550)
holmes/core/tools.py (1)
  • StructuredToolResult (50-74)
holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py (2)
holmes/plugins/toolsets/datadog/toolset_datadog_general.py (3)
  • _invoke (322-425)
  • _invoke (460-559)
  • _invoke (584-722)
holmes/core/tools.py (3)
  • _invoke (182-189)
  • _invoke (240-270)
  • StructuredToolResult (50-74)
🪛 Ruff (0.12.2)
holmes/plugins/toolsets/datadog/toolset_datadog_general.py

323-323: Unused method argument: user_approved

(ARG002)


461-461: Unused method argument: user_approved

(ARG002)


585-585: Unused method argument: user_approved

(ARG002)

holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py

186-186: Unused method argument: user_approved

(ARG002)


256-256: Unused method argument: user_approved

(ARG002)


305-305: Unused method argument: user_approved

(ARG002)


391-391: Unused method argument: user_approved

(ARG002)
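For reference, Ruff's ARG002 flags method arguments that are never used in the body. Besides actually using the flag (as the diff above suggests), two common ways to satisfy the rule are an inline `# noqa: ARG002` or an underscore-prefixed name, which Ruff's default dummy-variable regex (`^_`) treats as intentionally unused. A hedged sketch with illustrative class names:

```python
# Two ways to satisfy Ruff's ARG002 ("unused method argument")
# without using the argument. Class names are hypothetical.


class ReadOnlyToolNoqa:
    # Option 1: keep the parameter name, suppress the rule inline.
    def _invoke(self, params: dict, user_approved: bool = False) -> dict:  # noqa: ARG002
        return {"status": "success", "params": params}


class ReadOnlyToolUnderscore:
    # Option 2: underscore prefix marks the argument as intentionally
    # unused. Caveat: this renames the parameter, so any caller passing
    # it as a keyword (user_approved=...) would break.
    def _invoke(self, params: dict, _user_approved: bool = False) -> dict:
        return {"status": "success", "params": params}


if __name__ == "__main__":
    print(ReadOnlyToolNoqa()._invoke({"q": 1})["status"])        # → success
    print(ReadOnlyToolUnderscore()._invoke({"q": 1})["status"])  # → success
```

Since the base class likely invokes these methods by keyword, the `noqa` or log-the-flag approaches are the safer choices here.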

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Pre-commit checks
  • GitHub Check: llm_evals
  • GitHub Check: Pre-commit checks

@github-actions
github-actions bot commented Sep 2, 2025

Results of HolmesGPT evals

  • ask_holmes: 33/37 test cases were successful, 0 regressions, 2 skipped, 1 setup failure
| Test suite | Test case | Status |
| --- | --- | --- |
| ask | 01_how_many_pods | ✅ |
| ask | 02_what_is_wrong_with_pod | ✅ |
| ask | 04_related_k8s_events | ↪️ |
| ask | 05_image_version | ✅ |
| ask | 09_crashpod | ✅ |
| ask | 10_image_pull_backoff | ✅ |
| ask | 110_k8s_events_image_pull | ✅ |
| ask | 11_init_containers | ✅ |
| ask | 13a_pending_node_selector_basic | ✅ |
| ask | 14_pending_resources | ✅ |
| ask | 15_failed_readiness_probe | ✅ |
| ask | 17_oom_kill | ✅ |
| ask | 18_crash_looping_v2 | ✅ |
| ask | 19_detect_missing_app_details | ✅ |
| ask | 20_long_log_file_search | ✅ |
| ask | 24_misconfigured_pvc | ✅ |
| ask | 24a_misconfigured_pvc_basic | ✅ |
| ask | 28_permissions_error | 🚧 |
| ask | 29_events_from_alert_manager | ↪️ |
| ask | 39_failed_toolset | ✅ |
| ask | 41_setup_argo | ✅ |
| ask | 42_dns_issues_steps_new_tools | ⚠️ |
| ask | 43_current_datetime_from_prompt | ✅ |
| ask | 45_fetch_deployment_logs_simple | ✅ |
| ask | 51_logs_summarize_errors | ✅ |
| ask | 53_logs_find_term | ✅ |
| ask | 54_not_truncated_when_getting_pods | ✅ |
| ask | 59_label_based_counting | ✅ |
| ask | 60_count_less_than | ✅ |
| ask | 61_exact_match_counting | ✅ |
| ask | 63_fetch_error_logs_no_errors | ✅ |
| ask | 79_configmap_mount_issue | ✅ |
| ask | 83_secret_not_found | ✅ |
| ask | 86_configmap_like_but_secret | ✅ |
| ask | 93_calling_datadog[0] | ✅ |
| ask | 93_calling_datadog[1] | ✅ |
| ask | 93_calling_datadog[2] | ✅ |

Legend

  • ✅ the test was successful
  • ↪️ the test was skipped
  • ⚠️ the test failed but is known to be flaky or known to fail
  • 🚧 the test had a setup failure (not a code regression)
  • 🔧 the test failed due to mock data issues (not a code regression)
  • ❌ the test failed and should be fixed before merging the PR

@moshemorad moshemorad removed the request for review from arikalon1 September 2, 2025 11:12
@nherment nherment merged commit bfbf45c into master Sep 2, 2025
11 checks passed
@nherment nherment deleted the rob-1932_bash_tool_cli_approval branch September 2, 2025 11:13
0xLeo258 pushed a commit to shenglei5859/holmesgpt that referenced this pull request Sep 3, 2025