ROB-2136 link tool limit to global truncation limit #987
Conversation
Walkthrough

Adds a ToolInvokeContext model and threads LLM and per-tool max_token_count through tool invocation paths; introduces token-counting and per-tool max-token helpers; changes Tool.invoke/_invoke signatures to accept a context across many plugins; integrates truncation and updates tests/mocks to construct and pass contexts.
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
participant User
participant Caller as ToolCallingLLM
participant LLM
participant Limiter as WindowLimiter
participant Executor as ToolExecutor
participant ToolImpl as Tool
User->>Caller: request tool call
Caller->>LLM: get_context_window_size()/get_maximum_output_token()
Caller->>Limiter: get_max_token_count_for_single_tool(llm)
Limiter-->>Caller: max_token_count
Note right of Caller #DFF2E1: build ToolInvokeContext {llm, tool_number, user_approved, max_token_count}
Caller->>Executor: invoke(tool_name, params, context)
Executor->>ToolImpl: invoke(params, context)
ToolImpl->>ToolImpl: _invoke(params, context)
ToolImpl-->>Executor: StructuredToolResult
Executor-->>Caller: StructuredToolResult
Caller->>LLM: count_tool_response_tokens(llm, result)
alt result too large
Caller->>Caller: truncate_logs(result, llm, max_token_count)
end
Caller-->>User: tool result (possibly truncated)
```
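For orientation, the context object named in the diagram note could look roughly like this as a Pydantic model (a sketch inferred from the fields shown above and the review excerpts; the real definition lives in holmes/core/tools.py, and the field types here are assumptions):

```python
from typing import Any

from pydantic import BaseModel


class ToolInvokeContext(BaseModel):
    # Fields inferred from the diagram note; types are assumptions.
    llm: Any  # LLM handle, used for token counting and truncation
    tool_number: int  # ordinal of this tool call within the investigation
    user_approved: bool = False
    max_token_count: int  # per-tool budget derived from the context window
```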
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (11)
holmes/plugins/toolsets/datadog/toolset_datadog_traces.py (3)
303-317: Include query/time/index details in NO_DATA response

Per plugin guidelines, specify what was searched and where; also include invocation (url/payload).
```diff
-        if not formatted_output:
-            return StructuredToolResult(
-                status=StructuredToolResultStatus.NO_DATA,
-                params=params,
-                data="No matching traces found.",
-            )
+        if not formatted_output:
+            return StructuredToolResult(
+                status=StructuredToolResultStatus.NO_DATA,
+                params=params,
+                data=(
+                    f"No matching traces found for query='{query}', "
+                    f"time_range_ms=[{from_time_ms},{to_time_ms}], "
+                    f"indexes={self.toolset.dd_config.indexes}"
+                ),
+                invocation=json.dumps({"url": url, "payload": payload}),
+            )
```
441-447: Include time window and invocation in NO_DATA response

Make clear what was searched (trace_id + time range + indexes) and where.
```diff
-        if not formatted_output:
-            return StructuredToolResult(
-                status=StructuredToolResultStatus.NO_DATA,
-                params=params,
-                data=f"No trace found for trace_id: {trace_id}",
-            )
+        if not formatted_output:
+            return StructuredToolResult(
+                status=StructuredToolResultStatus.NO_DATA,
+                params=params,
+                data=(
+                    f"No trace found for trace_id='{trace_id}' within "
+                    f"time_range_ms=[{from_time_ms},{to_time_ms}] "
+                    f"on indexes={self.toolset.dd_config.indexes}"
+                ),
+                invocation=json.dumps({"url": url, "payload": payload}),
+            )
```
648-656: Include query/time/index details in NO_DATA response

Return exactly what was searched and where; add invocation details too.
```diff
-        if not formatted_output:
-            return StructuredToolResult(
-                status=StructuredToolResultStatus.NO_DATA,
-                params=params,
-                data="No matching spans found.",
-            )
+        if not formatted_output:
+            return StructuredToolResult(
+                status=StructuredToolResultStatus.NO_DATA,
+                params=params,
+                data=(
+                    f"No matching spans found for query='{query}', "
+                    f"time_range_ms=[{from_time_ms},{to_time_ms}], "
+                    f"indexes={self.toolset.dd_config.indexes}"
+                ),
+                invocation=json.dumps({"url": url, "payload": payload}),
+            )
```

tests/llm/utils/test_mock_toolset.py (3)
249-255: Fix assertion to expect context forwarding to the underlying tool.

wrapper.invoke({}, context) should forward the same context to tool.invoke; update the assertion accordingly.
Apply this diff:
```diff
-    tool.invoke.assert_called_once_with({})
+    tool.invoke.assert_called_once_with({}, context)
```
305-307: Pass context to wrapper.invoke to match new signature.

This callsite still omits context.
Apply this diff:
```diff
-    with pytest.raises(MockDataNotFoundError) as exc_info:
-        wrapper.invoke({})
+    with pytest.raises(MockDataNotFoundError) as exc_info:
+        wrapper.invoke({}, create_mock_tool_invoke_context())
```
475-477: Update ToolExecutor.invoke callsites to pass a ToolInvokeContext.

ToolExecutor.invoke now requires a context; these callsites will fail without it.
Apply these diffs:
```diff
-    result = tool_executor.invoke("kubectl_describe", params)
+    context = create_mock_tool_invoke_context()
+    result = tool_executor.invoke("kubectl_describe", params, context)
```

```diff
-    result = tool_executor.invoke("kubectl_describe", params)
+    context = create_mock_tool_invoke_context()
+    result = tool_executor.invoke("kubectl_describe", params, context)
```

```diff
-    with pytest.raises(MockDataNotFoundError):
-        tool_executor.invoke("kubectl_describe", params)
+    with pytest.raises(MockDataNotFoundError):
+        tool_executor.invoke("kubectl_describe", params, create_mock_tool_invoke_context())
```

```diff
-    result = tool_executor.invoke(
-        "kubectl_describe", {"foo": "bar"}
-    )
+    context = create_mock_tool_invoke_context()
+    result = tool_executor.invoke(
+        "kubectl_describe", {"foo": "bar"}, context
+    )
```

Also applies to: 531-533, 592-595, 637-639
holmes/plugins/toolsets/grafana/toolset_grafana.py (1)
47-56: Add an HTTP timeout to prevent request-thread hangs.

External calls without timeouts can hang and cause cascading issues.
Apply this diff:
```diff
-        try:
-            response = requests.get(url, headers=headers)
+        try:
+            # Consume context to satisfy linters and for future use
+            _ = context
+            timeout = getattr(self._toolset._grafana_config, "request_timeout", 30)
+            response = requests.get(url, headers=headers, timeout=timeout)
```

holmes/plugins/toolsets/internet/internet.py (1)
190-216: Return ERROR when scrape reports failure; add invocation (uses context).

Currently, scrape returns an error string which is treated as SUCCESS data. Fix by detecting and surfacing it as an error; also add invocation to resolve Ruff ARG002.
```diff
@@
-        content, mime_type = scrape(url, additional_headers)
+        content, mime_type = scrape(url, additional_headers)
+
+        # Detect and surface upstream fetch errors returned by scrape()
+        if content and mime_type is None and not looks_like_html(content) and content.lower().startswith("failed to load"):
+            return StructuredToolResult(
+                status=StructuredToolResultStatus.ERROR,
+                error=content,
+                params=params,
+                invocation=f"{self.get_parameterized_one_liner(params)} (tool_no={context.tool_number})",
+            )
@@
-        if not content:
+        if not content:
             logging.error(f"Failed to retrieve content from {url}")
             return StructuredToolResult(
                 status=StructuredToolResultStatus.ERROR,
                 error=f"Failed to retrieve content from {url}",
-                params=params,
+                params=params,
+                invocation=f"{self.get_parameterized_one_liner(params)} (tool_no={context.tool_number})",
             )
@@
-        return StructuredToolResult(
-            status=StructuredToolResultStatus.SUCCESS,
-            data=content,
-            params=params,
-        )
+        return StructuredToolResult(
+            status=StructuredToolResultStatus.SUCCESS,
+            data=content,
+            params=params,
+            invocation=f"{self.get_parameterized_one_liner(params)} (tool_no={context.tool_number})",
+        )
```

Optional: consider truncating content based on context.max_token_count if available.
holmes/plugins/toolsets/opensearch/opensearch.py (1)
60-74: Wrong client selection logic; may return the first client even when host belongs to a different client.
`found = any(host in client.hosts for client in clients)` checks across all clients, not the current one, causing incorrect matches. Return the client whose hosts contain the requested host.

Apply:
```diff
 def get_client(clients: List[OpenSearchClient], host: Optional[str]):
     if len(clients) == 1:
         return clients[0]

     if not host:
         raise Exception("Missing host to resolve opensearch client")

     for client in clients:
-        found = any(host in client.hosts for client in clients)
+        found = host in client.hosts
         if found:
             return client
```

tests/llm/utils/mock_toolset.py (1)
442-470: Fix missing `context` when calling live/mock during GENERATE mode.

Both calls currently omit `context` and will raise TypeError.

```diff
         if self._mode == MockMode.GENERATE:
             try:
-                return self._call_mock_invoke(params)
+                return self._call_mock_invoke(params, context)
             except MockDataNotFoundError as e:
                 logging.debug(str(e))
-                result = self._call_live_invoke(params)
+                result = self._call_live_invoke(params, context)
```

holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py (1)
199-346: Move `import logging` to top-level to comply with project import policy.

Inline import at Line 241 violates the “imports at top of file” rule. Remove the inline import and add a top-level import.
```diff
-        # Debug log the query (useful for troubleshooting)
-        import logging
-
+        # Debug log the query (useful for troubleshooting)
         logger = logging.getLogger(__name__)
```

Also add at file top:
```diff
+import logging
```
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
tests/llm/utils/test_mock_toolset.py (4)
249-255: Fix assertion to match new Tool.invoke(params, context) signature
`MockableToolWrapper` should call `tool.invoke(params, context)`. Update the assertion.

Apply this diff:
```diff
-    tool.invoke.assert_called_once_with({})
+    tool.invoke.assert_called_once_with({}, context)
```
328-334: Fix assertion to match new Tool.invoke(params, context) signature

In generate mode, the real tool is called with `context`. Update the assertion.

Apply this diff:
```diff
-    tool.invoke.assert_called_once_with({})
+    tool.invoke.assert_called_once_with({}, context)
```
592-599: Update ToolExecutor.invoke call to include context

This call uses the old signature and will raise a TypeError. Create a context and pass it.
Apply this diff:
```diff
-    # In mock mode, calling with non-matching params should raise MockDataNotFoundError
-    with pytest.raises(MockDataNotFoundError):
-        tool_executor.invoke("kubectl_describe", params)
+    # In mock mode, calling with non-matching params should raise MockDataNotFoundError
+    context = create_mock_tool_invoke_context()
+    with pytest.raises(MockDataNotFoundError):
+        tool_executor.invoke("kubectl_describe", params, context)
```
624-644: Remove duplicate local import; rely on top-level import

Consolidate imports at module scope; drop this local import.
Apply this diff:
```diff
-    from holmes.core.tools_utils.tool_executor import ToolExecutor
```

The subsequent context creation and invocation are already correct.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/llm/utils/test_mock_toolset.py (8 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always place Python imports at the top of the file, not inside functions or methods
All Python code must include type hints (mypy enforced)
Files:
tests/llm/utils/test_mock_toolset.py
tests/llm/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Always run LLM tests with RUN_LIVE=true to ensure results match real-world behavior
Files:
tests/llm/utils/test_mock_toolset.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Only use pytest markers that are declared in pyproject.toml; never introduce undeclared markers/tags
Files:
tests/llm/utils/test_mock_toolset.py
tests/**
📄 CodeRabbit inference engine (CLAUDE.md)
Test layout should mirror the source structure under tests/
Files:
tests/llm/utils/test_mock_toolset.py
🧬 Code graph analysis (1)
tests/llm/utils/test_mock_toolset.py (3)
tests/conftest.py (1)
create_mock_tool_invoke_context (107-133)
holmes/core/tools_utils/tool_executor.py (1)
invoke (52-62)
holmes/core/tools.py (1)
invoke (235-261)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build (3.10)
- GitHub Check: build (3.12)
- GitHub Check: build (3.11)
- GitHub Check: llm_evals
🔇 Additional comments (6)
tests/llm/utils/test_mock_toolset.py (6)
14-14: Context import LGTM

Importing the factory aligns tests with the new context-aware API.

283-289: LGTM: context passed through wrapper in mock mode

Invocation correctly supplies context; assertion of no real call is valid.

292-311: LGTM: error path validates context flow

Context is created and passed; exception assertions look correct.

478-480: LGTM: ToolExecutor.invoke now correctly receives context

The invocation is aligned with the new invoke(tool_name, params, context) API.

535-537: LGTM: matches new ToolExecutor signature

Context creation and passing are correct.

641-644: LGTM: generate mode invocation passes context

Aligned with updated executor API.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/llm/utils/test_mock_toolset.py (2)
249-254: Fix assertion: include context in call to tool.invoke

Now that the wrapper invokes tool.invoke(params, context), the assertion must include the context.

Apply this diff:

```diff
-    tool.invoke.assert_called_once_with({})
+    tool.invoke.assert_called_once_with({}, context)
```

328-334: Fix assertion: include context in call to tool.invoke (generate mode)

The real tool is called with (params, context); update the assertion accordingly.

Apply this diff:

```diff
-    tool.invoke.assert_called_once_with({})
+    tool.invoke.assert_called_once_with({}, context)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/llm/utils/test_mock_toolset.py (9 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always place Python imports at the top of the file, not inside functions or methods
All Python code must include type hints (mypy enforced)
Files:
tests/llm/utils/test_mock_toolset.py
tests/llm/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Always run LLM tests with RUN_LIVE=true to ensure results match real-world behavior
Files:
tests/llm/utils/test_mock_toolset.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Only use pytest markers that are declared in pyproject.toml; never introduce undeclared markers/tags
Files:
tests/llm/utils/test_mock_toolset.py
tests/**
📄 CodeRabbit inference engine (CLAUDE.md)
Test layout should mirror the source structure under tests/
Files:
tests/llm/utils/test_mock_toolset.py
🧬 Code graph analysis (1)
tests/llm/utils/test_mock_toolset.py (4)
tests/conftest.py (1)
create_mock_tool_invoke_context (107-133)
holmes/core/tools_utils/tool_executor.py (1)
invoke (52-62)
holmes/core/tools.py (1)
invoke (235-261)
tests/llm/utils/mock_toolset.py (1)
MockDataNotFoundError(45-48)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Pre-commit checks
- GitHub Check: llm_evals
🔇 Additional comments (10)
tests/llm/utils/test_mock_toolset.py (10)
14-14: LGTM: importing the context factory

Importing create_mock_tool_invoke_context at module scope aligns with the new context-aware API.

283-289: LGTM: context passed through in mock mode

Passing context to wrapper.invoke is correct; assert_not_called remains valid.

292-309: LGTM: context passed to wrapper.invoke in error path

Good propagation of context for the error case.

8-16: Move ToolExecutor import to module top (and deduplicate local imports)

Per guidelines, keep imports at module scope. Add ToolExecutor here and remove local imports below.

Apply this diff to add the import at the top:

```diff
 from holmes.core.tools import (
     Tool,
     StructuredToolResult,
     StructuredToolResultStatus,
     ToolsetStatusEnum,
 )
+from holmes.core.tools_utils.tool_executor import ToolExecutor
 from tests.conftest import create_mock_tool_invoke_context
```

475-476: Remove local import; rely on top-level import

Keep imports at module scope; this local import should be deleted.

Apply this diff:

```diff
-    from holmes.core.tools_utils.tool_executor import ToolExecutor
```

478-480: LGTM: context threaded into ToolExecutor.invoke

Invocation now correctly passes (tool_name, params, context).

532-536: Remove duplicate local import; keep context usage

Use the top-level ToolExecutor import and keep the context-aware invocation.

Apply this diff:

```diff
-    from holmes.core.tools_utils.tool_executor import ToolExecutor
@@
-    tool_executor = ToolExecutor(mock_toolsets.toolsets)
-    context = create_mock_tool_invoke_context()
-    result = tool_executor.invoke("kubectl_describe", params, context)
+    tool_executor = ToolExecutor(mock_toolsets.toolsets)
+    context = create_mock_tool_invoke_context()
+    result = tool_executor.invoke("kubectl_describe", params, context)
```

592-600: Remove duplicate local import; rely on module-level import

Delete the local import; invocation with context is correct.

Apply this diff:

```diff
-    from holmes.core.tools_utils.tool_executor import ToolExecutor
@@
-    context = create_mock_tool_invoke_context()
-    with pytest.raises(MockDataNotFoundError):
-        tool_executor.invoke("kubectl_describe", params, context)
+    context = create_mock_tool_invoke_context()
+    with pytest.raises(MockDataNotFoundError):
+        tool_executor.invoke("kubectl_describe", params, context)
```

625-627: Remove duplicate local import; rely on module-level import

Same refactor here.

Apply this diff:

```diff
-    from holmes.core.tools_utils.tool_executor import ToolExecutor
```

642-645: LGTM: context threaded in generate mode path

Correctly passing context to ToolExecutor.invoke.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/llm/utils/mock_toolset.py (1)
416-416: Fix Ruff ARG002: unused parameter `context` in mock path.

Rename the parameter to `_context` and add a return type for consistency.

```diff
-    def _call_mock_invoke(self, params: Dict, context: ToolInvokeContext):
+    def _call_mock_invoke(self, params: Dict, _context: ToolInvokeContext) -> StructuredToolResult:
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
tests/llm/utils/mock_toolset.py (4 hunks)
tests/llm/utils/test_mock_toolset.py (9 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always place Python imports at the top of the file, not inside functions or methods
All Python code must include type hints (mypy enforced)
Files:
tests/llm/utils/mock_toolset.py
tests/llm/utils/test_mock_toolset.py
tests/llm/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Always run LLM tests with RUN_LIVE=true to ensure results match real-world behavior
Files:
tests/llm/utils/mock_toolset.py
tests/llm/utils/test_mock_toolset.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Only use pytest markers that are declared in pyproject.toml; never introduce undeclared markers/tags
Files:
tests/llm/utils/mock_toolset.py
tests/llm/utils/test_mock_toolset.py
tests/**
📄 CodeRabbit inference engine (CLAUDE.md)
Test layout should mirror the source structure under tests/
Files:
tests/llm/utils/mock_toolset.py
tests/llm/utils/test_mock_toolset.py
🧬 Code graph analysis (2)
tests/llm/utils/mock_toolset.py (1)
holmes/core/tools.py (5)
ToolInvokeContext (163-169)
StructuredToolResult (79-103)
invoke (235-261)
_invoke (351-360)
_invoke (413-445)
tests/llm/utils/test_mock_toolset.py (3)
tests/conftest.py (1)
create_mock_tool_invoke_context (107-133)
holmes/core/tools_utils/tool_executor.py (1)
invoke (52-62)
holmes/core/tools.py (1)
invoke (235-261)
🪛 Ruff (0.13.1)
tests/llm/utils/mock_toolset.py
416-416: Unused method argument: context
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: llm_evals
- GitHub Check: Pre-commit checks
🔇 Additional comments (16)
tests/llm/utils/mock_toolset.py (3)
17-17: Good: threading ToolInvokeContext into the wrapper.

Import aligns with the updated Tool API.

409-415: Good: live path now forwards context to the real tool.

This keeps tests consistent with the new signature.

442-471: Good: _invoke consistently forwards context across all modes.

Behavior in GENERATE mode (prefer existing mocks, generate only when missing) matches the documented guidance elsewhere.
tests/llm/utils/test_mock_toolset.py (13)
14-14: Good: centralized helper for creating ToolInvokeContext.

Keeps tests uniform with the new interface.

249-254: Good: live mode test now asserts context propagation.

Assertion matches the new call signature.
283-285: Good: mock mode uses context-aware wrapper.invoke.
292-292: Context prepared up-front. Looks fine.
306-308: Good: error path uses context-aware invoke.
328-333: Good: generate mode passes context end-to-end and asserts real tool call.
474-475: Move local import to module top (duplicate of prior review).

Keep imports at module scope per guidelines; remove this local import and add a single top-level import.

```diff
-    from holmes.core.tools_utils.tool_executor import ToolExecutor
```

Add at the top of the file with other imports:

```diff
+from holmes.core.tools_utils.tool_executor import ToolExecutor
```
478-480: Good: ToolExecutor.invoke now receives and forwards context.
532-533: Remove duplicate local import; rely on the top-level import (duplicate of prior review).

```diff
-    from holmes.core.tools_utils.tool_executor import ToolExecutor
```
535-537: Good: context forwarded to executor invocation (params-any).
596-600: Good: negative path asserts with context-aware executor invocation.
625-625: Remove this local import as well; use the module-level import.

Consistent with earlier refactor.

```diff
-    from holmes.core.tools_utils.tool_executor import ToolExecutor
```
642-645: Good: generate-mode flow uses context with executor.
Actionable comments posted: 1
🧹 Nitpick comments (2)
holmes/plugins/toolsets/logging_utils/logging_api.py (2)
81-87: Minor: clarify the chars-per-token comment (optional).
The note reads as if picking a lower value reduces the risk of over-truncation; it's the opposite—using a lower chars-per-token will keep fewer characters and may over-truncate. Consider rewording to avoid confusion.
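A quick numeric illustration of why a lower ratio truncates more (the token limit here is made up for the example):

```python
token_limit = 1000  # hypothetical per-tool budget

# Characters kept = token_limit * assumed chars-per-token ratio.
chars_kept_at_4 = token_limit * 4  # 4000 characters survive truncation
chars_kept_at_3 = token_limit * 3  # only 3000 characters survive

# A LOWER chars-per-token estimate keeps FEWER characters,
# i.e. it increases the risk of over-truncation rather than reducing it.
assert chars_kept_at_3 < chars_kept_at_4
```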
250-255: Confirm truncation conditions (avoid chopping error payloads and zero/negative limits).
- Consider skipping truncation when result.error is set to preserve full error detail, per tools guidelines.
- Also guard against zero/negative token limits to avoid unnecessary looping.
If desired, wrap the call:
```python
if context.max_token_count > 0:
    truncate_logs(result, context.llm, context.max_token_count)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
holmes/plugins/toolsets/logging_utils/logging_api.py (4 hunks)
tests/plugins/toolsets/test_logging_api.py (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/plugins/toolsets/test_logging_api.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always place Python imports at the top of the file, not inside functions or methods
All Python code must include type hints (mypy enforced)
Files:
holmes/plugins/toolsets/logging_utils/logging_api.py
holmes/plugins/toolsets/**
📄 CodeRabbit inference engine (CLAUDE.md)
Toolsets must be organized under holmes/plugins/toolsets/ as either {name}.yaml files or directories
Files:
holmes/plugins/toolsets/logging_utils/logging_api.py
holmes/{core,plugins}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
All tools must return detailed error messages from underlying APIs, including executed command/query, time ranges/parameters, and full API error response; 'no data' responses must specify what was searched and where
Files:
holmes/plugins/toolsets/logging_utils/logging_api.py
🧬 Code graph analysis (1)
holmes/plugins/toolsets/logging_utils/logging_api.py (3)
holmes/core/llm.py (1)
LLM (38-66)
holmes/core/tools.py (4)
StructuredToolResult (79-103)
Tool (172-364)
ToolInvokeContext (163-169)
ToolParameter (155-160)
holmes/core/tools_utils/token_counting.py (1)
count_tool_response_tokens(6-13)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: llm_evals
- GitHub Check: Pre-commit checks
🔇 Additional comments (2)
holmes/plugins/toolsets/logging_utils/logging_api.py (2)
4-4: LGTM: imports added are necessary for the new truncation flow.

Also applies to: 10-10, 14-14, 18-18
235-235: LGTM: _invoke signature updated to accept ToolInvokeContext.
This aligns with the new tool execution flow.
Actionable comments posted: 0
🧹 Nitpick comments (2)
holmes/plugins/toolsets/logging_utils/logging_api.py (2)
81-89: Use or remove the unused char↔token ratio constant.

APPROXIMATE_CHAR_TO_TOKEN_RATIO is defined but not used. Either leverage it in truncation math (see next comment) or drop it to avoid confusion.

233-253: Guard truncation when the tool context limit is non-positive.

Small safety net: skip truncation if context.max_token_count <= 0.
Apply this diff:
```diff
-        truncate_logs(
-            logging_structured_tool_result=result,
-            llm=context.llm,
-            token_limit=context.max_token_count,
-        )
+        if context.max_token_count > 0:
+            truncate_logs(
+                logging_structured_tool_result=result,
+                llm=context.llm,
+                token_limit=context.max_token_count,
+            )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
holmes/plugins/toolsets/logging_utils/logging_api.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always place Python imports at the top of the file, not inside functions or methods
All Python code must include type hints (mypy enforced)
Files:
holmes/plugins/toolsets/logging_utils/logging_api.py
holmes/plugins/toolsets/**
📄 CodeRabbit inference engine (CLAUDE.md)
Toolsets must be organized under holmes/plugins/toolsets/ as either {name}.yaml files or directories
Files:
holmes/plugins/toolsets/logging_utils/logging_api.py
holmes/{core,plugins}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
All tools must return detailed error messages from underlying APIs, including executed command/query, time ranges/parameters, and full API error response; 'no data' responses must specify what was searched and where
Files:
holmes/plugins/toolsets/logging_utils/logging_api.py
🧬 Code graph analysis (1)
holmes/plugins/toolsets/logging_utils/logging_api.py (3)
holmes/core/llm.py (1)
LLM (38-66)
holmes/core/tools.py (4)
StructuredToolResult (79-103)
Tool (172-364)
ToolInvokeContext (163-169)
get_stringified_data (90-103)
holmes/core/tools_utils/token_counting.py (1)
count_tool_response_tokens(6-13)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: llm_evals
- GitHub Check: Pre-commit checks
🔇 Additional comments (2)
holmes/plugins/toolsets/logging_utils/logging_api.py (2)
4-4: LLM/context threading and imports look correct.

Top-level imports comply with style; LLM/context are correctly threaded for token counting/truncation.
Also applies to: 10-10, 14-14, 18-18
91-130: Avoid duplicating the truncation prefix across iterations; add return type and trim by overflow tokens.
- Prefix is prepended on every loop iteration, producing repeated banners and extra tokens.
- Missing return type annotation.
- Prefer trimming ≈ overflow tokens worth of characters to converge faster and use the defined ratio constant.
Apply this diff:
```diff
-def truncate_logs(
-    logging_structured_tool_result: StructuredToolResult, llm: LLM, token_limit: int
-):
-    token_count = count_tool_response_tokens(
-        llm=llm, structured_tool_result=logging_structured_tool_result
-    )
-    text = None
-    while token_count > token_limit:
-        # Loop because we are counting tokens but trimming characters. This means we try to trim a number of
-        # characters proportional to the number of tokens but we may still have too many tokens
-        if not text:
-            text = logging_structured_tool_result.get_stringified_data()
-        if not text:
-            # Weird scenario where the result exceeds the token allowance but there is not data.
-            # Exit and do nothing because I don't know how to handle such scenario.
-            logging.warning(
-                f"The calculated token count for logs is {token_count} but the limit is {token_limit}. However the data field is empty so there are no logs to truncate."
-            )
-            return
-        ratio = token_count / token_limit
-        character_count = len(text)
-        number_of_characters_to_truncate = character_count - ceil(
-            character_count / ratio
-        )
-        number_of_characters_to_truncate = max(
-            MIN_NUMBER_OF_CHARACTERS_TO_TRUNCATE, number_of_characters_to_truncate
-        )
-
-        if len(text) <= number_of_characters_to_truncate:
-            logging.warning(
-                f"The calculated token count for logs is {token_count} (max allowed tokens={token_limit}) but the logs are only {len(text)} characters which is below the intended truncation of {number_of_characters_to_truncate} characters. Logs will no longer be truncated"
-            )
-            return
-        else:
-            text = TRUNCATION_PROMPT_PREFIX + text[number_of_characters_to_truncate:]
-        logging_structured_tool_result.data = text
-        token_count = count_tool_response_tokens(
-            llm=llm, structured_tool_result=logging_structured_tool_result
-        )
+def truncate_logs(
+    logging_structured_tool_result: StructuredToolResult, llm: LLM, token_limit: int
+) -> None:
+    token_count = count_tool_response_tokens(
+        llm=llm, structured_tool_result=logging_structured_tool_result
+    )
+    original_text = logging_structured_tool_result.get_stringified_data()
+    if not original_text:
+        if token_count > token_limit:
+            logging.warning(
+                f"The calculated token count for logs is {token_count} but the limit is {token_limit}. However the data field is empty so there are no logs to truncate."
+            )
+        return
+
+    start_idx = 0  # cumulative number of original characters dropped (from the front)
+    while token_count > token_limit:
+        overflow = token_count - token_limit
+        # Drop enough characters to reduce approximately 'overflow' tokens.
+        number_of_characters_to_truncate = max(
+            MIN_NUMBER_OF_CHARACTERS_TO_TRUNCATE,
+            ceil(overflow * APPROXIMATE_CHAR_TO_TOKEN_RATIO),
+        )
+
+        remaining_len = len(original_text) - start_idx
+        if remaining_len <= number_of_characters_to_truncate:
+            logging.warning(
+                f"The calculated token count for logs is {token_count} (max allowed tokens={token_limit}) but the logs are only {remaining_len} characters which is below the intended truncation of {number_of_characters_to_truncate} characters. Logs will no longer be truncated"
+            )
+            return
+
+        start_idx += number_of_characters_to_truncate
+        truncated = TRUNCATION_PROMPT_PREFIX + original_text[start_idx:]
+        logging_structured_tool_result.data = truncated
+        token_count = count_tool_response_tokens(
+            llm=llm, structured_tool_result=logging_structured_tool_result
+        )
```
Looking good, left some comments.
In accordance with the ticket, can you also please replace DEFAULT_QUERY_RESPONSE_SIZE_LIMIT (used in Prometheus) with a percentage of tokens, instead of a hardcoded character count.
Given the vast number of metric queries holmes tends to run in a single investigation, I'd suggest using 2% as the actual value. (The current value is 5000 tokens which is 2.5% of a 200K context window. But in practice our 200K window is exposed to the LLM as a smaller window, so I'm reducing to 2%. The rationale is that the current threshold works OK so we can preserve it.)
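A minimal sketch of what that could look like (the constant name and helper below are hypothetical, not from the PR):

```python
QUERY_RESPONSE_TOKEN_LIMIT_PCT = 2  # hypothetical replacement for DEFAULT_QUERY_RESPONSE_SIZE_LIMIT


def query_response_token_limit(context_window_size: int) -> int:
    # 2% of a 200K-token window is 4000 tokens, in the same ballpark as the
    # current 5000-token threshold (2.5% of 200K) that works OK in practice.
    return context_window_size * QUERY_RESPONSE_TOKEN_LIMIT_PCT // 100
```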
Actionable comments posted: 1
♻️ Duplicate comments (1)
holmes/plugins/toolsets/logging_utils/logging_api.py (1)
90-92: Add return type annotation.

The function signature is missing a return type annotation. Per coding guidelines, type hints are required for mypy compatibility.
Apply this diff:
```diff
 def truncate_logs(
     logging_structured_tool_result: StructuredToolResult, llm: LLM, token_limit: int
-):
+) -> None:
```

As per coding guidelines.
🧹 Nitpick comments (6)
holmes/core/tools_utils/tool_context_window_limiter.py (3)
9-19: Consider removing redundant int() wrapper.

The floor division operator // already returns an integer, making the int() wrapper on line 16 redundant.

Apply this diff:

```diff
-    return int(context_window_size * TOOL_MAX_ALLOCATED_CONTEXT_WINDOW_PCT // 100)
+    return context_window_size * TOOL_MAX_ALLOCATED_CONTEXT_WINDOW_PCT // 100
```
21-21: Add return type hint for completeness.

The function lacks an explicit return type hint. Add -> None to satisfy the type-checking requirement per coding guidelines.

Apply this diff:

```diff
-def prevent_overly_big_tool_response(tool_call_result: ToolCallResult, llm: LLM):
+def prevent_overly_big_tool_response(tool_call_result: ToolCallResult, llm: LLM) -> None:
```

As per coding guidelines.
43-43: Consider smarter truncation for error messages.

The fixed 100-character truncation may cut off mid-word (or split a multi-character grapheme cluster). Consider using a word-boundary-aware truncation or adding an ellipsis indicator.

Example with ellipsis:

```diff
-    truncated_error = original_error[:100]
+    truncated_error = original_error[:100] + ("..." if len(original_error) > 100 else "")
```

holmes/plugins/toolsets/logging_utils/logging_api.py (1)
29-36: Consider grouping truncation constants with other module constants.

The truncation constants are well-placed after imports, but a past review suggested placing constants at the top of the file. Consider moving these constants up to group them with DEFAULT_LOG_LIMIT, SECONDS_PER_DAY, etc. (lines 21-28) for better organization.

Based on past review comments.
tests/plugins/toolsets/test_logging_api.py (2)
199-202: Consider using MockLLM for more reliable testing.

The test currently requires a real LLM via the MODEL environment variable, which makes it dependent on external services and potentially slow or flaky. Consider using MockLLM() from conftest (which is used by create_mock_tool_invoke_context) for faster, more reliable unit tests. However, if accurate token counting is critical for this test, the current approach with a skip guard is acceptable.

Example using MockLLM:

```python
def test_truncate_logs_nominal_scenario(self):
    """Test that truncate_logs correctly truncates logs when they exceed the token limit."""
    from tests.conftest import MockLLM

    llm = MockLLM()
    # ... rest of test
```
232-239: Consider using ≤ for token limit check.

Line 235 uses < to check the token count is under the limit. Consider using <= instead, as being exactly at the token limit is acceptable:

```diff
-    assert truncated_token_count < token_limit
+    assert truncated_token_count <= token_limit
```

The current assertion is slightly stricter but functionally correct.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
holmes/core/tools_utils/tool_context_window_limiter.py (1 hunks)
holmes/core/tools_utils/tool_executor.py (2 hunks)
holmes/plugins/toolsets/logging_utils/logging_api.py (5 hunks)
tests/plugins/toolsets/test_logging_api.py (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- holmes/core/tools_utils/tool_executor.py
🧰 Additional context used
📓 Path-based instructions (3)
{holmes,tests}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
{holmes,tests}/**/*.py: Use Ruff for formatting and linting for all Python code
Require type hints (code must be type-checkable with mypy)
ALWAYS place Python imports at the top of the file; do not place imports inside functions or methods
Files:
holmes/core/tools_utils/tool_context_window_limiter.py
tests/plugins/toolsets/test_logging_api.py
holmes/plugins/toolsets/logging_utils/logging_api.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Only use pytest markers that are declared in pyproject.toml; never introduce undeclared markers/tags
Files:
tests/plugins/toolsets/test_logging_api.py
tests/**
📄 CodeRabbit inference engine (CLAUDE.md)
Test files should mirror the source structure under tests/
Files:
tests/plugins/toolsets/test_logging_api.py
🧬 Code graph analysis (3)
holmes/core/tools_utils/tool_context_window_limiter.py (5)
holmes/core/llm.py (1)
LLM (38-66)
holmes/core/tools.py (1)
StructuredToolResultStatus (52-76)
holmes/core/models.py (2)
ToolCallResult (23-60)
as_tool_call_message (30-36)
tests/conftest.py (2)
get_context_window_size (74-75)
count_tokens_for_message (80-83)
holmes/utils/sentry_helper.py (1)
capture_toolcall_contains_too_many_tokens(22-34)
tests/plugins/toolsets/test_logging_api.py (5)
holmes/core/tools_utils/token_counting.py (1)
count_tool_response_tokens (6-13)
holmes/plugins/toolsets/logging_utils/logging_api.py (2)
truncate_logs (90-145)
_invoke (249-270)
holmes/core/tools.py (4)
StructuredToolResult (79-103)
StructuredToolResultStatus (52-76)
_invoke (351-360)
_invoke (413-445)
holmes/core/llm.py (1)
DefaultLLM (69-371)
tests/conftest.py (1)
create_mock_tool_invoke_context(107-133)
holmes/plugins/toolsets/logging_utils/logging_api.py (3)
holmes/core/llm.py (1)
LLM (38-66)
holmes/core/tools.py (5)
StructuredToolResult (79-103)
Tool (172-364)
ToolInvokeContext (163-169)
ToolParameter (155-160)
get_stringified_data (90-103)
holmes/core/tools_utils/token_counting.py (1)
count_tool_response_tokens(6-13)
🔇 Additional comments (8)
holmes/core/tools_utils/tool_context_window_limiter.py (2)
1-6: LGTM: Imports are correctly placed at the top.

All imports are properly positioned at the top of the file and appear necessary for the functions below.
34-48: Review NO_DATA handling logic for oversized responses.

When a NO_DATA response is too large, the code sets error_message = None (line 35) and then converts the status to ERROR with error = None (lines 46-48). This creates an ERROR status with no error message, which may be confusing.
Can you clarify the intended behavior when a NO_DATA response exceeds the token limit? Should it:
- Remain as NO_DATA status with data=None (current state already satisfies this)?
- Become ERROR with a descriptive message?
- Become ERROR with error=None (current behavior)?
holmes/plugins/toolsets/logging_utils/logging_api.py (4)
4-4: LGTM: Imports follow coding guidelines.

All new imports are properly placed at the top of the file as required.
Also applies to: 10-10, 14-14, 18-18
96-146: Truncation logic looks good.

The implementation correctly addresses most concerns from past reviews:

- Uses get_stringified_data() to avoid "None" stringification
- Keeps most recent logs by slicing the tail of the text
- Handles line boundaries to avoid cutting log lines mid-line
- Includes clear comments explaining the iterative truncation approach
The truncation math differs from the past review suggestion but is reasonable and will converge via the while loop.
Based on past review comments.
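As a rough sketch of the tail-keeping idea described above (illustrative only, not the PR's actual implementation):

```python
def keep_log_tail(text: str, max_chars: int) -> str:
    # Keep the most recent logs by slicing from the tail, then advance past
    # the first newline so no log line is left half-cut at the start.
    if len(text) <= max_chars:
        return text
    tail = text[-max_chars:]
    first_newline = tail.find("\n")
    return tail[first_newline + 1 :] if first_newline != -1 else tail
```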
249-249: LGTM: Signature updated to accept ToolInvokeContext.

The signature change correctly threads the invocation context through the tool, aligning with the PR objectives.
264-268: Consider skipping truncation for non-success results.

The truncation is applied regardless of the result status. Should error results also be truncated? If errors are typically short and important for debugging, consider applying truncation only when result.status == StructuredToolResultStatus.SUCCESS.

```python
# Example conditional truncation
if result.status == StructuredToolResultStatus.SUCCESS:
    truncate_logs(
        logging_structured_tool_result=result,
        llm=context.llm,
        token_limit=context.max_token_count,
    )
```

tests/plugins/toolsets/test_logging_api.py (2)
3-3: LGTM: Test imports properly added.

All new imports follow coding guidelines and are correctly placed at the top of the file.
Also applies to: 6-7, 8-8, 10-10, 15-15, 18-19
44-44: LGTM: Test invocations updated correctly.

All test calls to _invoke have been properly updated to pass the ToolInvokeContext parameter using the helper function from conftest.

Also applies to: 83-83, 115-115, 140-140, 175-175
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
holmes/plugins/toolsets/newrelic/newrelic.py (1)
78-108: Implement token-aware truncation in NewRelicTool._invoke

The context parameter is unused—after api.execute_nrql_query(query), you must enforce context.max_token_count via token-aware truncation using context.llm. For example, call count_tool_response_tokens(llm=context.llm, structured_tool_result=result) and trim result.data in a loop (mirroring the logic in truncate_logs in holmes/plugins/toolsets/logging_utils/logging_api.py:90–140) until the token count ≤ context.max_token_count. Extract this into a shared helper in holmes/core/tools_utils if needed.
♻️ Duplicate comments (1)
tests/llm/utils/test_mock_toolset.py (1)
480-480: Move local ToolExecutor imports to module top.

The local imports of ToolExecutor inside test functions violate the coding guideline requiring all Python imports at the top of the file. This issue was flagged in previous reviews but remains unresolved.

As per coding guidelines, add to the top of the file with other imports:

```diff
+from holmes.core.tools_utils.tool_executor import ToolExecutor
```

Then remove the local import statements at lines 480, 538, 598, and 645:

```diff
-    from holmes.core.tools_utils.tool_executor import ToolExecutor
```

Also applies to: 538-538, 598-598, 645-645
🧹 Nitpick comments (1)
tests/llm/utils/mock_toolset.py (1)
440-492: Consider documenting the intentionally unused `context` parameter.

The context parameter is accepted for signature consistency with _call_live_invoke but is not used in mock mode (since mocks are pre-recorded data). While this is reasonable for maintaining a uniform interface, consider adding a brief docstring or suppressing the Ruff warning to clarify the intent.

Apply this diff to suppress the Ruff warning:

```diff
-    def _call_mock_invoke(self, params: Dict, context: ToolInvokeContext):
+    def _call_mock_invoke(
+        self, params: Dict, context: ToolInvokeContext  # noqa: ARG002
+    ):
```

Or add a docstring:

```diff
     def _call_mock_invoke(self, params: Dict, context: ToolInvokeContext):
+        """Read mock data from disk. Context is unused in mock mode but accepted for interface consistency."""
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
holmes/core/models.py (2 hunks)
holmes/plugins/toolsets/atlas_mongodb/mongodb_atlas.py (7 hunks)
holmes/plugins/toolsets/investigator/core_investigation.py (2 hunks)
holmes/plugins/toolsets/newrelic/newrelic.py (2 hunks)
holmes/plugins/toolsets/runbook/runbook_fetcher.py (2 hunks)
holmes/plugins/toolsets/servicenow/servicenow.py (4 hunks)
tests/llm/utils/mock_toolset.py (4 hunks)
tests/llm/utils/test_mock_toolset.py (9 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- holmes/core/models.py
🧰 Additional context used
📓 Path-based instructions (3)
{holmes,tests}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
{holmes,tests}/**/*.py: Use Ruff for formatting and linting for all Python code
Require type hints (code must be type-checkable with mypy)
ALWAYS place Python imports at the top of the file; do not place imports inside functions or methods
Files:
holmes/plugins/toolsets/atlas_mongodb/mongodb_atlas.py
holmes/plugins/toolsets/servicenow/servicenow.py
holmes/plugins/toolsets/newrelic/newrelic.py
holmes/plugins/toolsets/runbook/runbook_fetcher.py
tests/llm/utils/test_mock_toolset.py
tests/llm/utils/mock_toolset.py
holmes/plugins/toolsets/investigator/core_investigation.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Only use pytest markers that are declared in pyproject.toml; never introduce undeclared markers/tags
Files:
tests/llm/utils/test_mock_toolset.py
tests/llm/utils/mock_toolset.py
tests/**
📄 CodeRabbit inference engine (CLAUDE.md)
Test files should mirror the source structure under tests/
Files:
tests/llm/utils/test_mock_toolset.py
tests/llm/utils/mock_toolset.py
🧬 Code graph analysis (7)
holmes/plugins/toolsets/atlas_mongodb/mongodb_atlas.py (2)
holmes/core/tools.py (2)
ToolInvokeContext (163-169)
StructuredToolResult (79-103)
holmes/plugins/toolsets/runbook/runbook_fetcher.py (1)
_invoke (69-190)
holmes/plugins/toolsets/servicenow/servicenow.py (1)
holmes/core/tools.py (2)
ToolInvokeContext (163-169)
StructuredToolResult (79-103)
holmes/plugins/toolsets/newrelic/newrelic.py (2)
holmes/core/tools.py (2)
ToolInvokeContext (163-169)
StructuredToolResult (79-103)
holmes/plugins/toolsets/logging_utils/logging_api.py (1)
_invoke (249-270)
holmes/plugins/toolsets/runbook/runbook_fetcher.py (1)
holmes/core/tools.py (2)
ToolInvokeContext (163-169)
StructuredToolResult (79-103)
tests/llm/utils/test_mock_toolset.py (3)
tests/conftest.py (1)
create_mock_tool_invoke_context (107-133)
holmes/core/tools_utils/tool_executor.py (1)
invoke (50-63)
holmes/core/tools.py (1)
invoke (235-261)
tests/llm/utils/mock_toolset.py (1)
holmes/core/tools.py (5)
ToolInvokeContext (163-169)
StructuredToolResult (79-103)
invoke (235-261)
_invoke (351-360)
_invoke (413-445)
holmes/plugins/toolsets/investigator/core_investigation.py (2)
holmes/core/tools.py (4)
ToolInvokeContext (163-169)
_invoke (351-360)
_invoke (413-445)
StructuredToolResult (79-103)
holmes/plugins/toolsets/atlas_mongodb/mongodb_atlas.py (6)
_invoke (122-135)
_invoke (147-160)
_invoke (180-194)
_invoke (207-244)
_invoke (264-298)
_invoke (318-337)
🪛 Ruff (0.13.1)
holmes/plugins/toolsets/atlas_mongodb/mongodb_atlas.py
122-122: Unused method argument: context
(ARG002)
147-147: Unused method argument: context
(ARG002)
180-180: Unused method argument: context
(ARG002)
207-207: Unused method argument: context
(ARG002)
264-264: Unused method argument: context
(ARG002)
318-318: Unused method argument: context
(ARG002)
holmes/plugins/toolsets/servicenow/servicenow.py
119-119: Unused method argument: context
(ARG002)
162-162: Unused method argument: context
(ARG002)
194-194: Unused method argument: context
(ARG002)
holmes/plugins/toolsets/newrelic/newrelic.py
78-78: Unused method argument: context
(ARG002)
holmes/plugins/toolsets/runbook/runbook_fetcher.py
69-69: Unused method argument: context
(ARG002)
tests/llm/utils/mock_toolset.py
440-440: Unused method argument: context
(ARG002)
holmes/plugins/toolsets/investigator/core_investigation.py
78-78: Unused method argument: context
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Pre-commit checks
- GitHub Check: llm_evals
🔇 Additional comments (9)
holmes/plugins/toolsets/servicenow/servicenow.py (1)
8-8: LGTM! Signature updates align with PR refactor.

The addition of the ToolInvokeContext import and the signature changes to _invoke methods are consistent with the broader PR objective of introducing context-based tool invocation. The unused context parameter is acceptable for this incremental refactor stage.

Also applies to: 119-119, 162-162, 194-194

holmes/plugins/toolsets/atlas_mongodb/mongodb_atlas.py (1)

7-7: LGTM! Consistent signature updates across all MongoDB Atlas tools.

The ToolInvokeContext import and signature updates across all six _invoke methods follow the same pattern as other toolsets in this PR. The unused context parameter is expected at this stage of the refactor.

Also applies to: 122-122, 147-147, 180-180, 207-207, 264-264, 318-318

holmes/plugins/toolsets/runbook/runbook_fetcher.py (1)

9-9: LGTM! Signature updated consistently with other tools.

The ToolInvokeContext import and signature change align with the PR's refactor pattern. The unused context parameter is acceptable for this incremental change.

Also applies to: 69-69

holmes/plugins/toolsets/investigator/core_investigation.py (1)

11-11: LGTM! TodoWriteTool signature updated consistently.

The ToolInvokeContext import and signature change are consistent with the broader refactor. The unused context parameter is expected at this stage.

Also applies to: 78-78

tests/llm/utils/test_mock_toolset.py (1)

14-14: LGTM! Test updates correctly propagate ToolInvokeContext.

The addition of the create_mock_tool_invoke_context import and its usage throughout the tests correctly exercises the new context-based tool invocation API. All tool and executor invocations now properly create and pass context objects.

Also applies to: 248-249, 282-283, 291-291, 306-306, 334-335, 484-485, 541-542, 603-606, 662-664

tests/llm/utils/mock_toolset.py (3)

18-18: LGTM!

The ToolInvokeContext import is correctly added and follows the coding guidelines for placing imports at the top of the file.

433-438: LGTM!

The signature update correctly threads ToolInvokeContext through to the underlying tool's invoke method. The context is properly used at line 438.

494-527: LGTM!

The _invoke method signature is correctly updated to accept ToolInvokeContext, and all internal call sites (lines 499, 503, 524, 526) properly propagate the context to both _call_mock_invoke and _call_live_invoke.

holmes/plugins/toolsets/newrelic/newrelic.py (1)

7-7: LGTM: Import addition aligns with PR changes.

The import of ToolInvokeContext is consistent with the broader refactoring across the codebase.
Actionable comments posted: 4
♻️ Duplicate comments (1)
tests/plugins/toolsets/test_logging_api.py (1)
225-230: Typo fix confirmed (“without” variable).

The rename to truncated_log_data_without_prefix is correct and consistent in the slice and assertion.
🧹 Nitpick comments (5)
tests/plugins/toolsets/test_logging_api.py (2)
199-203: Avoid env-coupled DefaultLLM in tests; use MockLLM to keep tests hermetic.

Relying on MODEL and DefaultLLM can make tests flaky and slow. Use the mock LLM from create_mock_tool_invoke_context() instead.

```diff
-    model = os.environ.get("MODEL")
-    if not model:
-        pytest.skip("Missing MODEL env var.")
-    llm = DefaultLLM(model=model)
+    # Use a mock LLM for deterministic token counting in tests
+    llm = create_mock_tool_invoke_context().llm
```
238-239: Tight 95% threshold may be brittle across tokenizers.

Tokenization varies by model/version. Consider relaxing to ~90% or gating via a small epsilon to reduce false negatives.
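For example, the assertion could carry an explicit epsilon instead of a hard 95% floor (a sketch; the 10% value is an arbitrary choice):

```python
epsilon = 0.10  # allow up to 10% headroom below the limit
assert truncated_token_count <= token_limit
assert truncated_token_count >= token_limit * (1 - epsilon)
```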
holmes/core/tools_utils/tool_context_window_limiter.py (1)
12-16: Clamp invalid percentages to sane bounds.

If percent_of_total_context_window <= 0 or > 100, returning the full context window may defeat the limit. Prefer clamping to [1, 100].

```diff
-    if 0 < percent_of_total_context_window and percent_of_total_context_window <= 100:
-        return int(context_window_size * percent_of_total_context_window // 100)
-    else:
-        return context_window_size
+    pct = max(1.0, min(100.0, percent_of_total_context_window))
+    return int(context_window_size * pct // 100)
```

holmes/plugins/toolsets/prometheus/prometheus.py (2)
442-455: Silence Ruff ARG002: explicitly mark unused context.

These _invoke methods accept context but don’t use it. Add a single del context to satisfy Ruff without altering the signature (callers pass context by name).

```diff
 def _invoke(self, params: dict, context: ToolInvokeContext) -> StructuredToolResult:
+    del context  # context is required by interface but unused here
```
- ListPrometheusRules._invoke
- GetMetricNames._invoke
- GetLabelValues._invoke
- GetAllLabels._invoke
- GetSeries._invoke
- GetMetricMetadata._invoke
Also applies to: 560-575, 679-694, 793-806, 898-913, 1000-1013
422-430: Consider aligning StructuredToolResult.status with response outcome.
create_structured_tool_result always sets SUCCESS even when response.status is "Failed" with error_message. Upstream logic (e.g., limiters, UI) may rely on tool-level status.
Would you like a patch to map:
- success + data -> SUCCESS
- success + no data -> NO_DATA
- non-success/error_message -> ERROR?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 1279783854e18c712d22074e81493201068864ab and 515b33c82e13d100663370045bcf096d1967f8cd.
📒 Files selected for processing (3)
tests/plugins/toolsets/test_logging_api.py (7 hunks)
holmes/core/tools_utils/tool_context_window_limiter.py (1 hunks)
holmes/plugins/toolsets/prometheus/prometheus.py (15 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
{holmes,tests}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
{holmes,tests}/**/*.py: Use Ruff for formatting and linting for all Python code
Require type hints (code must be type-checkable with mypy)
ALWAYS place Python imports at the top of the file; do not place imports inside functions or methods
Files:
tests/plugins/toolsets/test_logging_api.py
holmes/core/tools_utils/tool_context_window_limiter.py
holmes/plugins/toolsets/prometheus/prometheus.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Only use pytest markers that are declared in pyproject.toml; never introduce undeclared markers/tags
Files:
tests/plugins/toolsets/test_logging_api.py
tests/**
📄 CodeRabbit inference engine (CLAUDE.md)
Test files should mirror the source structure under tests/
Files:
tests/plugins/toolsets/test_logging_api.py
🧬 Code graph analysis (3)
tests/plugins/toolsets/test_logging_api.py (4)
holmes/core/tools_utils/token_counting.py (1)
count_tool_response_tokens (6-13)
holmes/plugins/toolsets/logging_utils/logging_api.py (2)
truncate_logs (90-145)
_invoke (249-270)
holmes/core/llm.py (1)
DefaultLLM (70-379)
tests/conftest.py (1)
create_mock_tool_invoke_context (107-133)
holmes/core/tools_utils/tool_context_window_limiter.py (4)
holmes/core/llm.py (1)
LLM (39-67)
holmes/core/tools.py (1)
StructuredToolResultStatus (52-76)
holmes/core/models.py (2)
ToolCallResult (23-60)
as_tool_call_message (30-36)
holmes/utils/sentry_helper.py (1)
capture_toolcall_contains_too_many_tokens (22-34)
holmes/plugins/toolsets/prometheus/prometheus.py (4)
holmes/core/tools.py (5)
ToolInvokeContext (163-169)
StructuredToolResultStatus (52-76)
StructuredToolResult (79-103)
_invoke (351-360)
_invoke (413-445)
holmes/core/tools_utils/token_counting.py (1)
count_tool_response_tokens (6-13)
holmes/core/tools_utils/tool_context_window_limiter.py (1)
get_pct_token_count (9-15)
holmes/utils/keygen_utils.py (1)
generate_random_key (5-6)
🪛 Ruff (0.13.1)
holmes/plugins/toolsets/prometheus/prometheus.py
442-442: Unused method argument: context
(ARG002)
560-560: Unused method argument: context
(ARG002)
679-679: Unused method argument: context
(ARG002)
793-793: Unused method argument: context
(ARG002)
898-898: Unused method argument: context
(ARG002)
1000-1000: Unused method argument: context
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Pre-commit checks
- GitHub Check: llm_evals
🔇 Additional comments (1)
holmes/plugins/toolsets/prometheus/prometheus.py (1)
1149-1167: Token cap logic LGTM; respects global limit and optional pct override.
Good use of context.max_token_count with get_pct_token_count to optionally tighten limits.
Also applies to: 1392-1411
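Regarding the 422-430 status-mapping suggestion above, a minimal sketch of the proposed mapping (map_response_status is a hypothetical helper name; the fields follow the MetricsBasedResponse model quoted later in this review):

def map_response_status(response: MetricsBasedResponse) -> StructuredToolResultStatus:
    # Non-success status or an explicit error message -> ERROR
    if response.status != "success" or response.error_message:
        return StructuredToolResultStatus.ERROR
    # Success but nothing to return -> NO_DATA
    if response.data is None and response.data_summary is None:
        return StructuredToolResultStatus.NO_DATA
    # Success with data -> SUCCESS
    return StructuredToolResultStatus.SUCCESS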
Actionable comments posted: 1
♻️ Duplicate comments (2)
holmes/core/tools_utils/tool_context_window_limiter.py (1)
37-55: Preserve NO_DATA status instead of coercing to ERROR. Lines 37-39 handle the NO_DATA case by setting error_message = None, but line 49 unconditionally overwrites the status to ERROR, which loses the semantic distinction between "no data found" and "error occurred." This was flagged in a previous review but remains unaddressed. When a tool returns NO_DATA (a valid, non-error state), the status should be preserved.
Apply this diff to preserve NO_DATA status:
    if tool_call_result.result.status == StructuredToolResultStatus.NO_DATA:
        error_message = None
-        # tool_call_result.result.data is set to None below which is expected to fix the issue
    elif tool_call_result.result.status == StructuredToolResultStatus.ERROR:
        original_error = (
            tool_call_result.result.error
            or tool_call_result.result.data
            or "Unknown error"
        )
        truncated_error = str(original_error)[:100]
        error_message = f"The tool call returned an error it is too large to return\nThe following original error is truncated:\n{truncated_error}"

-    tool_call_result.result.status = StructuredToolResultStatus.ERROR
+    # Only set status to ERROR when it wasn't NO_DATA
+    if tool_call_result.result.status != StructuredToolResultStatus.NO_DATA:
+        tool_call_result.result.status = StructuredToolResultStatus.ERROR
    tool_call_result.result.data = None
    tool_call_result.result.error = error_message
407-420: Type mismatch: data field should accept dict, not str. Line 410 declares data: Optional[str] = None, but the code at lines 1147, 1391, 1187, and 1428 assigns result_data (a dict from Prometheus responses) directly to response_data.data. This violates the type annotation and will fail mypy type checking as required by the coding guidelines. This was flagged in a previous review but remains unaddressed.
As per coding guidelines.
Apply this diff to fix the type mismatch:
 class MetricsBasedResponse(BaseModel):
     status: str
     error_message: Optional[str] = None
-    data: Optional[str] = None
+    data: Optional[dict[str, Any]] = None
     random_key: str
     tool_name: str
     description: str
     query: str
     start: Optional[str] = None
     end: Optional[str] = None
     step: Optional[float] = None
     output_type: Optional[str] = None
     data_summary: Optional[dict[str, Any]] = None
🧹 Nitpick comments (1)
holmes/core/tools_utils/tool_context_window_limiter.py (1)
9-21: Edge case handling could be improved. The condition on line 12 allows values between 0 and 100, but falls back to returning the full context window for values ≤ 0 or > 100. This could lead to unexpected behavior:
- If percent_of_total_context_window is 0, returning the full context window size seems counterintuitive.
- If percent_of_total_context_window is > 100, capping at 100% would be clearer than returning the full window.
Consider adding explicit guards or logging for these edge cases to make the behavior more predictable.
 def get_pct_token_count(percent_of_total_context_window: float, llm: LLM) -> int:
     context_window_size = llm.get_context_window_size()
-    if 0 < percent_of_total_context_window and percent_of_total_context_window <= 100:
+    if 0 < percent_of_total_context_window <= 100:
         return int(context_window_size * percent_of_total_context_window // 100)
+    elif percent_of_total_context_window > 100:
+        # Cap at 100% instead of returning full window
+        return context_window_size
     else:
+        # For 0 or negative values, log warning and return 0 or raise error
         return context_window_size
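To make the edge cases concrete, here is how the current fallback behaves, assuming a hypothetical 128,000-token context window (values follow directly from the function above):

# assuming llm.get_context_window_size() == 128_000
get_pct_token_count(2, llm)    # 2_560 tokens (2% of the window)
get_pct_token_count(0, llm)    # 128_000 -- falls through to the full window
get_pct_token_count(150, llm)  # 128_000 -- also falls through to the full window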
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
holmes/core/tools_utils/tool_context_window_limiter.py (1 hunks)
holmes/plugins/toolsets/prometheus/prometheus.py (15 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
{holmes,tests}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
{holmes,tests}/**/*.py: Use Ruff for formatting and linting for all Python code
Require type hints (code must be type-checkable with mypy)
ALWAYS place Python imports at the top of the file; do not place imports inside functions or methods
Files:
holmes/core/tools_utils/tool_context_window_limiter.py
holmes/plugins/toolsets/prometheus/prometheus.py
🧬 Code graph analysis (2)
holmes/core/tools_utils/tool_context_window_limiter.py (5)
holmes/core/llm.py (1)
LLM (39-67)
holmes/core/tools.py (1)
StructuredToolResultStatus (52-76)
holmes/core/models.py (2)
ToolCallResult (23-60)
as_tool_call_message (30-36)
tests/conftest.py (2)
get_context_window_size (74-75)
count_tokens_for_message (80-83)
holmes/utils/sentry_helper.py (1)
capture_toolcall_contains_too_many_tokens (22-34)
holmes/plugins/toolsets/prometheus/prometheus.py (4)
holmes/core/tools.py (3)
ToolInvokeContext (163-169)
StructuredToolResultStatus (52-76)
StructuredToolResult (79-103)
_invoke (351-360)
_invoke (413-445)
holmes/core/tools_utils/token_counting.py (1)
count_tool_response_tokens (6-13)
holmes/core/tools_utils/tool_context_window_limiter.py (1)
get_pct_token_count (9-15)
holmes/utils/keygen_utils.py (1)
generate_random_key (5-6)
🪛 Ruff (0.13.1)
holmes/plugins/toolsets/prometheus/prometheus.py
442-442: Unused method argument: context
(ARG002)
560-560: Unused method argument: context
(ARG002)
679-679: Unused method argument: context
(ARG002)
793-793: Unused method argument: context
(ARG002)
898-898: Unused method argument: context
(ARG002)
1000-1000: Unused method argument: context
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: llm_evals
- GitHub Check: Pre-commit checks
🔇 Additional comments (3)
holmes/core/tools_utils/tool_context_window_limiter.py (1)
41-47: LGTM: TypeError risk addressed. Line 46 now wraps original_error with str() before slicing, which prevents the TypeError that would occur if result.data were a dict or other non-string type. This addresses the previous review comment.
holmes/plugins/toolsets/prometheus/prometheus.py (2)
1386-1434: Uninitialized variable when tool_calls_return_data is False. Line 1386 declares structured_tool_result: StructuredToolResult without initialization. It's only assigned inside the if self.toolset.config.tool_calls_return_data: block (lines 1392-1394), but is then used unconditionally at lines 1430-1434. If tool_calls_return_data is False, structured_tool_result will be uninitialized when accessed, causing an UnboundLocalError at runtime. Apply this diff to ensure the variable is always initialized:
response_data = MetricsBasedResponse(
    status=status,
    error_message=error_message,
    random_key=generate_random_key(),
    tool_name=self.name,
    description=description,
    query=query,
    start=start,
    end=end,
    step=step,
    output_type=output_type,
)
-
-structured_tool_result: StructuredToolResult

# Check if data should be included based on size
if self.toolset.config.tool_calls_return_data:
    result_data = data.get("data", {})
    response_data.data = result_data
    structured_tool_result = create_structured_tool_result(
        params=params, response=response_data
    )
    token_count = count_tool_response_tokens(
        llm=context.llm, structured_tool_result=structured_tool_result
    )

    token_limit = context.max_token_count
    if self.toolset.config.query_response_size_limit_pct:
        custom_token_limit = get_pct_token_count(
            percent_of_total_context_window=self.toolset.config.query_response_size_limit_pct,
            llm=context.llm,
        )
        if custom_token_limit < token_limit:
            token_limit = custom_token_limit

    # Provide summary if data is too large
    if token_count > token_limit:
        response_data.data = None
        response_data.data_summary = (
            create_data_summary_for_large_result(
                result_data, query, token_count, is_range_query=True
            )
        )
        logging.info(
            f"Prometheus range query returned large dataset: "
            f"{response_data.data_summary.get('series_count', 0)} series, "
            f"{token_count:,} tokens (limit: {token_limit:,}). "
            f"Returning summary instead of full data."
        )
        # Also add character info to the summary for debugging
        response_data.data_summary["_debug_info"] = (
            f"Data size: {token_count:,} tokens exceeded limit of {token_limit:,} tokens"
        )
    else:
        response_data.data = result_data

structured_tool_result = create_structured_tool_result(
    params=params, response=response_data
)
return structured_tool_result
Likely an incorrect or invalid review comment.
1143-1192: Uninitialized variable when tool_calls_return_data is False. Line 1143 declares structured_tool_result: StructuredToolResult without initialization. It's only assigned inside the if self.toolset.config.tool_calls_return_data: block (lines 1149-1151), but is then used unconditionally at lines 1189-1192. If tool_calls_return_data is False, structured_tool_result will be uninitialized when accessed, causing an UnboundLocalError at runtime. Apply this diff to ensure the variable is always initialized:
response_data = MetricsBasedResponse(
    status=status,
    error_message=error_message,
    random_key=generate_random_key(),
    tool_name=self.name,
    description=description,
    query=query,
)
-structured_tool_result: StructuredToolResult

# Check if data should be included based on size
if self.toolset.config.tool_calls_return_data:
    result_data = data.get("data", {})
    response_data.data = result_data
    structured_tool_result = create_structured_tool_result(
        params=params, response=response_data
    )
    token_count = count_tool_response_tokens(
        llm=context.llm, structured_tool_result=structured_tool_result
    )

    token_limit = context.max_token_count
    if self.toolset.config.query_response_size_limit_pct:
        custom_token_limit = get_pct_token_count(
            percent_of_total_context_window=self.toolset.config.query_response_size_limit_pct,
            llm=context.llm,
        )
        if custom_token_limit < token_limit:
            token_limit = custom_token_limit

    # Provide summary if data is too large
    if token_count > token_limit:
        response_data.data = None
        response_data.data_summary = (
            create_data_summary_for_large_result(
                result_data,
                query,
                token_count,
                is_range_query=False,
            )
        )
        logging.info(
            f"Prometheus instant query returned large dataset: "
            f"{response_data.data_summary.get('result_count', 0)} results, "
            f"{token_count:,} tokens (limit: {token_limit:,}). "
            f"Returning summary instead of full data."
        )
        # Also add token info to the summary for debugging
        response_data.data_summary["_debug_info"] = (
            f"Data size: {token_count:,} tokens exceeded limit of {token_limit:,} tokens"
        )
    else:
        response_data.data = result_data

structured_tool_result = create_structured_tool_result(
    params=params, response=response_data
)
return structured_tool_result
Likely an incorrect or invalid review comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py (1)
198-198: Consider implementing token-based truncation for large responses. All eight _invoke methods now accept a ToolInvokeContext but don't use it. Given the PR's objective to link tool limits to global truncation limits, and that several tools (especially FetchTracesSimpleComparison, QueryTraceById, and QueryMetricsRange) can return very large YAML-serialized datasets, you should consider implementing token counting and truncation similar to the Prometheus toolset.
Reference implementation from holmes/plugins/toolsets/prometheus/prometheus.py (lines 1088-1229):
- Use count_tool_response_tokens(llm=context.llm, structured_tool_result=...) to measure response size
- Compare against context.max_token_count
- Provide summaries or truncate data when limits are exceeded
This would prevent context window overflow when trace data or time series results are large.
If token limiting is intentionally deferred, you may want to add TODO comments or track this work separately.
Also applies to: 412-412, 492-492, 554-554, 629-629, 705-705, 783-783, 867-867
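Building on the reference list above, a minimal sketch of what such a Tempo _invoke could look like. The helpers and context fields are the ones named in this review; _fetch_traces, the yaml.dump serialization, and the truncation marker are illustrative assumptions, not the plugin's actual code:

import yaml

from holmes.core.tools import (
    StructuredToolResult,
    StructuredToolResultStatus,
    ToolInvokeContext,
)
from holmes.core.tools_utils.token_counting import count_tool_response_tokens


def _invoke(self, params: dict, context: ToolInvokeContext) -> StructuredToolResult:
    traces = self._fetch_traces(params)  # hypothetical: stands in for the existing Tempo fetch logic
    result = StructuredToolResult(
        status=StructuredToolResultStatus.SUCCESS,
        data=yaml.dump(traces),
        params=params,
    )
    # Measure the serialized response against the per-tool budget from the context
    token_count = count_tool_response_tokens(
        llm=context.llm, structured_tool_result=result
    )
    if token_count > context.max_token_count:
        # Crude fallback: keep a proportional prefix; a real fix would summarize like Prometheus does
        keep = int(len(result.data) * context.max_token_count / token_count)
        result.data = result.data[:keep] + "\n# [truncated: response exceeded token limit]"
    return result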
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py (9 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
{holmes,tests}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
{holmes,tests}/**/*.py: Use Ruff for formatting and linting for all Python code
Require type hints (code must be type-checkable with mypy)
ALWAYS place Python imports at the top of the file; do not place imports inside functions or methods
Files:
holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py
🧬 Code graph analysis (1)
holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py (2)
holmes/core/tools.py (4)
ToolInvokeContext (163-169)
_invoke (351-360)
_invoke (413-445)
StructuredToolResult (79-103)
holmes/plugins/toolsets/prometheus/prometheus.py (8)
_invoke (442-512)
_invoke (560-632)
_invoke (679-750)
_invoke (793-854)
_invoke (898-968)
_invoke (1000-1047)
_invoke (1089-1230)
_invoke (1303-1470)
🪛 Ruff (0.13.1)
holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py
198-198: Unused method argument: context
(ARG002)
412-412: Unused method argument: context
(ARG002)
492-492: Unused method argument: context
(ARG002)
554-554: Unused method argument: context
(ARG002)
629-629: Unused method argument: context
(ARG002)
705-705: Unused method argument: context
(ARG002)
783-783: Unused method argument: context
(ARG002)
867-867: Unused method argument: context
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: llm_evals
- GitHub Check: Pre-commit checks
🔇 Additional comments (1)
holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py (1)
10-10: LGTM! The ToolInvokeContext import is necessary for the updated method signatures throughout this file.
Actionable comments posted: 0
♻️ Duplicate comments (4)
holmes/plugins/toolsets/prometheus/prometheus.py (4)
321-404: LGTM! Token terminology is now consistent. The function correctly uses "tokens" throughout both the messages and field names. The previous issue about inconsistent units has been resolved.
422-435: LGTM! Status mapping correctly implemented. The function properly maps response status to StructuredToolResultStatus, handling error conditions and missing data appropriately. The previous concern about always returning SUCCESS has been addressed.
407-420: Fix type mismatch: data field should be dict, not str. The data field is typed as Optional[str] but is assigned dictionary values at lines 1153, 1173, 1193, 1397, 1417, and 1434. This breaks mypy type checking. Apply this diff:
 class MetricsBasedResponse(BaseModel):
     status: str
     error_message: Optional[str] = None
-    data: Optional[str] = None
+    data: Optional[dict[str, Any]] = None
     random_key: str
     tool_name: str
     description: str
     query: str
     start: Optional[str] = None
     end: Optional[str] = None
     step: Optional[float] = None
     output_type: Optional[str] = None
     data_summary: Optional[dict[str, Any]] = None
95-97: Set a default value equivalent to 2% as previously discussed. Per the previous review comment, this field should have a default value equivalent to 2% to maintain backward compatibility with the previously used threshold.
Apply this diff:
-    query_response_size_limit_pct: Optional[int] = (
-        None  # Limit the max number of tokens that a query result can take to proactively prevent token limit issues. Expressed in % of the model's context window
-    )
+    query_response_size_limit_pct: Optional[int] = 2  # Limit the max number of tokens that a query result can take to proactively prevent token limit issues. Expressed in % of the model's context window
🧹 Nitpick comments (2)
holmes/plugins/toolsets/prometheus/prometheus.py (2)
1095-1198: Token-based sizing logic correctly implemented. The implementation properly:
- Accepts a ToolInvokeContext parameter
- Uses token counting via count_tool_response_tokens
- Respects both the global context.max_token_count and the per-tool query_response_size_limit_pct
- Takes the minimum of the two limits when both are configured (condensed in the snippet below)
- Generates data summaries when limits are exceeded
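In code, the limit selection reduces to a few lines (condensed from the diffs earlier in this review; min() is an equivalent restatement of the if-comparison):

token_limit = context.max_token_count  # global truncation limit
if self.toolset.config.query_response_size_limit_pct:
    # Optional per-tool override, expressed as % of the model's context window
    custom_token_limit = get_pct_token_count(
        percent_of_total_context_window=self.toolset.config.query_response_size_limit_pct,
        llm=context.llm,
    )
    # Whichever limit is stricter wins
    token_limit = min(token_limit, custom_token_limit)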
Optional refactor: Consider eliminating duplicate structured result creation.
The code creates structured_tool_result twice (lines 1155-1160 and 1195-1197). The first creation is solely for token counting. While functionally correct, this could be optimized.
Consider extracting the response serialization for counting:
# After line 1153
response_data.data = result_data

# Count tokens directly from response_data
token_count = count_tool_response_tokens(
    llm=context.llm,
    structured_tool_result=StructuredToolResult(
        status=StructuredToolResultStatus.SUCCESS,
        data=response_data.model_dump_json(indent=2),
        params=params,
    )
)
token_limit = context.max_token_count
# ... rest of limit logic ...

# Single final creation
structured_tool_result = create_structured_tool_result(
    params=params, response=response_data
)
return structured_tool_result
1309-1440: Token-based sizing logic correctly implemented. The range query implementation correctly mirrors the instant query pattern:
- Accepts a ToolInvokeContext parameter
- Uses token counting and respects both global and per-tool limits
- Generates data summaries when limits are exceeded
- Properly handles range-specific fields (start, end, step, output_type)
Optional refactor: Same double-creation optimization applies.
Similar to the instant query, this method creates structured_tool_result twice (lines 1398-1400 and 1436-1438). The same optimization suggested for ExecuteInstantQuery could be applied here.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
holmes/plugins/toolsets/prometheus/prometheus.py (15 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
{holmes,tests}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
{holmes,tests}/**/*.py: Use Ruff for formatting and linting for all Python code
Require type hints (code must be type-checkable with mypy)
ALWAYS place Python imports at the top of the file; do not place imports inside functions or methods
Files:
holmes/plugins/toolsets/prometheus/prometheus.py
🧬 Code graph analysis (1)
holmes/plugins/toolsets/prometheus/prometheus.py (4)
holmes/core/tools.py (5)
ToolInvokeContext (163-169)
StructuredToolResultStatus (52-76)
Toolset (533-748)
ToolsetTag (143-146)
StructuredToolResult (79-103)
holmes/core/tools_utils/token_counting.py (1)
count_tool_response_tokens (6-13)
holmes/core/tools_utils/tool_context_window_limiter.py (1)
get_pct_token_count (9-15)
holmes/utils/keygen_utils.py (1)
generate_random_key (5-6)
🪛 Ruff (0.13.2)
holmes/plugins/toolsets/prometheus/prometheus.py
448-448: Unused method argument: context
(ARG002)
566-566: Unused method argument: context
(ARG002)
685-685: Unused method argument: context
(ARG002)
799-799: Unused method argument: context
(ARG002)
904-904: Unused method argument: context
(ARG002)
1006-1006: Unused method argument: context
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Pre-commit checks
- GitHub Check: llm_evals
🔇 Additional comments (3)
holmes/plugins/toolsets/prometheus/prometheus.py (3)
18-18: LGTM! Import additions are correct. The new imports for ToolInvokeContext, count_tool_response_tokens, and get_pct_token_count are properly placed at the top of the file and support the token-based sizing functionality.
Also applies to: 24-25
448-448: LGTM! Context parameter added for interface consistency. The context parameter is required by the base class interface even though it's not used in this specific method. This is acceptable for maintaining a consistent interface across all tool implementations.
566-566: LGTM! Context parameter added for interface consistency. These methods (GetMetricNames, GetLabelValues, GetAllLabels, GetSeries, GetMetricMetadata) all accept the context parameter to maintain interface consistency, even though they don't use token counting. This is acceptable design.
Also applies to: 685-685, 799-799, 904-904, 1006-1006
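To illustrate the convention, the shared signature looks roughly like this (a sketch, not code from the PR; the params annotation is an assumption and the body is elided):

class GetMetricNames(Tool):
    def _invoke(
        self,
        params: dict,  # parameter type assumed for illustration
        context: ToolInvokeContext,  # unused in metadata tools; kept so every Tool shares one invoke signature
    ) -> StructuredToolResult:
        ...  # existing metadata-fetching logic, unchanged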
Actionable comments posted: 0
♻️ Duplicate comments (2)
holmes/plugins/toolsets/prometheus/prometheus.py (1)
407-420: Type mismatch: MetricsBasedResponse.data should not be Optional[str]. The data field is typed as Optional[str] (line 410), but the code assigns dictionaries to it (lines 1153, 1193, 1397, 1434). This breaks type checking with mypy and can cause issues with Pydantic serialization. Apply this diff to fix the type:
 class MetricsBasedResponse(BaseModel):
     status: str
     error_message: Optional[str] = None
-    data: Optional[str] = None
+    data: Optional[dict[str, Any]] = None
     random_key: str
     tool_name: str
     description: str
     query: str
     start: Optional[str] = None
     end: Optional[str] = None
     step: Optional[float] = None
     output_type: Optional[str] = None
     data_summary: Optional[dict[str, Any]] = None
86-91: Add return type annotation. The function signature is missing a return type annotation. Per coding guidelines requiring type hints for all code, add -> None to the signature.
 def truncate_logs(
     logging_structured_tool_result: StructuredToolResult,
     llm: LLM,
     token_limit: int,
     structured_params: FetchPodLogsParams,
-):
+) -> None:
🧹 Nitpick comments (1)
holmes/plugins/toolsets/logging_utils/logging_api.py (1)
29-33: Consider moving these constants up to group with other module constants. While the constants are appropriately defined at module level, they would be more discoverable if placed immediately after line 27 (POD_LOGGING_TOOL_NAME) to group all module constants together before class definitions.
Based on past review comment from aantn requesting constants at top of file.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
holmes/plugins/toolsets/logging_utils/logging_api.py (5 hunks)
holmes/plugins/toolsets/prometheus/prometheus.py (15 hunks)
tests/plugins/toolsets/test_logging_api.py (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/plugins/toolsets/test_logging_api.py
🧰 Additional context used
📓 Path-based instructions (1)
{holmes,tests}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
{holmes,tests}/**/*.py: Use Ruff for formatting and linting for all Python code
Require type hints (code must be type-checkable with mypy)
ALWAYS place Python imports at the top of the file; do not place imports inside functions or methods
Files:
holmes/plugins/toolsets/prometheus/prometheus.py
holmes/plugins/toolsets/logging_utils/logging_api.py
🧬 Code graph analysis (2)
holmes/plugins/toolsets/prometheus/prometheus.py (4)
holmes/core/tools.py (3)
ToolInvokeContext (163-169)
StructuredToolResultStatus (52-76)
StructuredToolResult (79-103)
holmes/core/tools_utils/token_counting.py (1)
count_tool_response_tokens (6-13)
holmes/core/tools_utils/tool_context_window_limiter.py (1)
get_pct_token_count (9-15)
holmes/utils/keygen_utils.py (1)
generate_random_key (5-6)
holmes/plugins/toolsets/logging_utils/logging_api.py (3)
holmes/core/llm.py (1)
LLM (39-67)
holmes/core/tools.py (4)
StructuredToolResult (79-103)
Tool (172-364)
ToolInvokeContext (163-169)
get_stringified_data (90-103)
holmes/core/tools_utils/token_counting.py (1)
count_tool_response_tokens (6-13)
🪛 Ruff (0.13.2)
holmes/plugins/toolsets/prometheus/prometheus.py
448-448: Unused method argument: context
(ARG002)
566-566: Unused method argument: context
(ARG002)
685-685: Unused method argument: context
(ARG002)
799-799: Unused method argument: context
(ARG002)
904-904: Unused method argument: context
(ARG002)
1006-1006: Unused method argument: context
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: llm_evals
- GitHub Check: Pre-commit checks
🔇 Additional comments (11)
holmes/plugins/toolsets/prometheus/prometheus.py (7)
18-18: LGTM! The new imports are correctly placed at the top of the file and are used in the token-counting logic throughout the file.
Also applies to: 24-25
95-97: LGTM! The config field is properly renamed to indicate percentage-based limiting, and the 2% default aligns with the reviewer's request and previous threshold behavior.
322-322: LGTM! The function has been updated to use token-based sizing consistently. The parameter name, return values, and messages all correctly reference "tokens" instead of "characters", which aligns with the token-counting approach introduced in this PR.
Also applies to: 365-365, 368-368, 398-398, 401-401
422-435: LGTM! The create_structured_tool_result function now correctly maps response status to appropriate StructuredToolResultStatus values, addressing the previous concern about always returning SUCCESS. The logic properly handles ERROR, NO_DATA, and SUCCESS cases.
448-448: LGTM! The _invoke method signatures have been updated to accept context: ToolInvokeContext, maintaining interface consistency across all tools. While the context parameter is not used in these metadata-focused tools (static analysis warning ARG002), it is intentionally included for a uniform tool invocation API and is actively used in query tools (ExecuteInstantQuery, ExecuteRangeQuery).
1095-1198: LGTM! The token-counting and truncation logic is well-structured and correctly integrates with the ToolInvokeContext. The implementation:
- Properly uses context.llm for token counting via count_tool_response_tokens
- Correctly applies the token limit from context.max_token_count
- Supports optional per-tool override via query_response_size_limit_pct (choosing the smaller limit)
- Replaces large responses with data_summary when exceeding the limit
- Includes debug information in the summary for troubleshooting
The flow is logical and handles both the normal and truncated-response paths appropriately.
1309-1440: LGTM! The token-counting and truncation logic for range queries mirrors the instant query implementation and is equally well-structured. The implementation correctly:
- Integrates with ToolInvokeContext for token counting
- Applies token limits with optional per-tool override
- Distinguishes range queries by passing is_range_query=True to create_data_summary_for_large_result
- Replaces large responses with detailed summaries including series count and debug info
The consistency between instant and range query implementations ensures maintainability.
holmes/plugins/toolsets/logging_utils/logging_api.py (4)
4-4: LGTM! The new imports are correctly placed at the top of the file and properly support the truncation functionality.
Also applies to: 10-10, 14-14, 18-18
92-145: LGTM! The truncation logic correctly addresses previous review feedback:
- Uses get_stringified_data() to avoid stringifying None
- Keeps most recent logs by slicing from the tail
- Preserves line boundaries with linefeed detection
- Logs a single informative message only when truncation occurs
- Handles edge cases appropriately
The iterative approach compensates for the character-to-token ratio approximation.
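A rough sketch of that shape, assuming the ratio is refined by re-counting after each cut (truncate_to_token_limit is a hypothetical stand-in, not the actual truncate_logs body):

from holmes.core.llm import LLM
from holmes.core.tools import StructuredToolResult
from holmes.core.tools_utils.token_counting import count_tool_response_tokens


def truncate_to_token_limit(result: StructuredToolResult, llm: LLM, token_limit: int) -> None:
    data = result.get_stringified_data()
    tokens = count_tool_response_tokens(llm=llm, structured_tool_result=result)
    while data and tokens > token_limit:
        # Keep the tail (most recent logs), scaled by how far we overshot
        keep = int(len(data) * token_limit / tokens)
        data = data[len(data) - keep:]
        # Preserve line boundaries: drop any partial first line
        newline = data.find("\n")
        if newline != -1:
            data = data[newline + 1:]
        result.data = data
        # Re-count: the char-to-token ratio is only an approximation
        tokens = count_tool_response_tokens(llm=llm, structured_tool_result=result)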
249-249: LGTM! The signature change to accept ToolInvokeContext correctly aligns with the PR objectives to thread LLM and token limits through tool invocations.
264-269: LGTM! The integration of truncate_logs correctly uses the context to access the LLM and token limit, and appropriately passes the structured parameters for logging.
LGTM
No description provided.