Subtasks #851
Conversation
Commits:
- # Conflicts: holmes/core/tools.py, holmes/plugins/prompts/generic_ask.jinja2
- prompt fix
- # Conflicts: holmes/plugins/prompts/_general_instructions.jinja2, holmes/plugins/prompts/kubernetes_workload_ask.jinja2
- prompt fix
- improve prompts
Walkthrough

Adds per-investigation context (investigation_id) across prompts and LLM flows, introduces a thread-safe TodoListManager and TodoWrite toolset for task tracking, augments prompt templates with an investigation procedure, extends tool parameter schemas and OpenAI formatting, and raises max_steps from 10 to 40. Tests and docs updated.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant User
    participant CLI_App as CLI/App
    participant LLM as ToolCallingLLM
    participant Prompt as Prompt Renderer
    participant TodoMgr as TodoListManager
    participant TodoTool as TodoWriteTool
    User->>CLI_App: Ask question
    CLI_App->>LLM: Initialize (max_steps=40)
    LLM->>LLM: generate investigation_id (UUID)
    LLM->>TodoMgr: format_tasks_for_prompt(investigation_id)
    TodoMgr-->>LLM: todo_list (string)
    LLM->>Prompt: render system prompt (investigation_id, todo_list, runbooks)
    CLI_App->>Prompt: build_initial_ask_messages(..., investigation_id, ...)
    loop Investigation loop
        LLM->>TodoTool: TodoWrite todos[], investigation_id
        TodoTool->>TodoMgr: update_session_tasks(investigation_id, tasks)
        TodoTool-->>LLM: StructuredToolResult (formatted tasks)
        LLM->>Prompt: re-render with updated todo_list
    end
    LLM-->>User: Final answer (after tasks complete & final review)
```
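To make the flow concrete, here is a minimal, self-contained Python sketch of the session-scoped task tracking the diagram describes. It mirrors names used in this PR (TodoListManager, investigation_id, update_session_tasks, format_tasks_for_prompt) but is only an illustration, not the project code:

```python
import threading
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    content: str
    status: str = "pending"  # pending | in_progress | completed
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


class TodoListManager:
    """Thread-safe, per-investigation task store (simplified illustration)."""

    def __init__(self) -> None:
        self._sessions: Dict[str, List[Task]] = {}
        self._lock = threading.Lock()

    def update_session_tasks(self, session_id: str, tasks: List[Task]) -> None:
        with self._lock:
            self._sessions[session_id] = list(tasks)

    def format_tasks_for_prompt(self, session_id: str) -> str:
        with self._lock:
            tasks = list(self._sessions.get(session_id, []))
        marker = {"pending": "[ ]", "in_progress": "[~]", "completed": "[✓]"}
        return "\n".join(f"{marker.get(t.status, '[?]')} {t.content}" for t in tasks)


# Each ToolCallingLLM session gets one investigation_id; it keys the todo list.
investigation_id = str(uuid.uuid4())
manager = TodoListManager()
manager.update_session_tasks(investigation_id, [Task("Check pod logs", "in_progress")])
print(manager.format_tasks_for_prompt(investigation_id))
```

The actual implementation in this PR additionally sorts tasks by status and exposes a module-level get_todo_manager() singleton accessor.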
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs
Actionable comments posted: 9
🔭 Outside diff range comments (4)
tests/llm/fixtures/test_ask_holmes/63_fetch_error_logs_no_errors/test_case.yaml (1)
1-12: Rename test namespace to follow evals convention (app-<testid>)

The fixture currently uses staging-63, which violates our guideline that each LLM eval must run in its own app-<testid> namespace. Please update all occurrences to app-63, including:

- The user_prompt line: replace "…in staging-63 namespace" → "…in app-63 namespace"
- before_test block: kubectl create namespace staging-63 → kubectl create namespace app-63
- any -n staging-63 flags → -n app-63
- after_test block: kubectl delete namespace staging-63 → kubectl delete namespace app-63, and -n staging-63 → -n app-63 for secret deletion
- manifest.yaml (if it hardcodes the namespace) and any other files referenced by this test

This change is required to prevent cross-test collisions and adhere to our LLM test naming standard.
holmes/plugins/prompts/_general_instructions.jinja2 (1)
66-72: Align tool call limit in prompts with actual safeguards
- The tool name is correctly defined as “TodoWrite” (class TodoWriteTool, name="TodoWrite").
- We found no enforcement of a “maximum of 5 tool calls per specific tool” in prevent_overly_repeated_tool_call or elsewhere—only duplicate-call and redundant‐pod‐logs checks exist.
- Please either remove or update the “5 tool calls” text in _general_instructions.jinja2 to match current behavior, or implement a configurable max-calls threshold in the safeguards (one possible shape is sketched below).
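If a per-tool cap is the desired behavior, a safeguard along these lines could enforce it. This is a hypothetical sketch: the threshold constant, counter handling, and message text are assumptions, and the existing prevent_overly_repeated_tool_call may have a different signature:

```python
from collections import Counter
from typing import Optional

MAX_CALLS_PER_TOOL = 5  # assumed threshold; would be configurable in practice


def prevent_overly_repeated_tool_call(
    tool_name: str, call_counts: Counter, max_calls: int = MAX_CALLS_PER_TOOL
) -> Optional[str]:
    """Record one call and return an error message once the per-tool cap is exceeded."""
    call_counts[tool_name] += 1
    if call_counts[tool_name] > max_calls:
        return (
            f"Tool '{tool_name}' has already been called {call_counts[tool_name]} times "
            f"(limit {max_calls}); consolidate queries or conclude the investigation."
        )
    return None


# Usage: keep one Counter per investigation and check before dispatching each tool call.
counts: Counter = Counter()
for _ in range(6):
    error = prevent_overly_repeated_tool_call("kubectl_logs", counts)
print(error)  # the 6th call trips the limit
```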
holmes/main.py (1)
72-79: CLI default for max_steps violates the file’s own config-precedence rule

Lines 72–74 explicitly state that CLI option defaults that are also present in the config MUST be None, or the CLI will override config values. Setting a concrete default on Line 97 breaks this contract and will always override whatever is set in the config file.

Apply this diff to restore config precedence (and set the new 40-step default in the Config layer instead):

```diff
 opt_max_steps: Optional[int] = typer.Option(
-    40,
+    None,
     "--max-steps",
     help="Advanced. Maximum number of steps the LLM can take to investigate the issue",
 )
```

If you want 40 to be the new default, set it where Config.load_from_file applies defaults when the CLI passes None, or in the underlying LLM client defaults. I can help patch that if you point me to the Config default site; a sketch of the idea follows.
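For reference, one way to keep config precedence while making 40 the effective default is to resolve the value where CLI and config are merged. This is a sketch of the idea only; the helper name and the exact merge point inside Config.load_from_file are assumptions:

```python
from typing import Optional

DEFAULT_MAX_STEPS = 40


def resolve_max_steps(cli_value: Optional[int], config_value: Optional[int]) -> int:
    """CLI wins only when explicitly set; otherwise fall back to config, then the default."""
    if cli_value is not None:
        return cli_value
    if config_value is not None:
        return config_value
    return DEFAULT_MAX_STEPS


assert resolve_max_steps(None, None) == 40   # neither set: new default applies
assert resolve_max_steps(None, 10) == 10     # config file wins over the default
assert resolve_max_steps(25, 10) == 25       # explicit CLI flag wins over config
```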
tests/core/test_todo_write_tool.py (1)
105-127: Update test_openai_format to pass a target_model and assert investigation_id in the schema

The TodoWriteTool.get_openai_format() method requires a target_model: str argument (it’s defined in the base Tool class), so calling it without parameters will raise a missing-argument error. Also, since investigation_id is declared on the tool, the test should verify its presence in both properties and required.

Pinpointed location:

- tests/core/test_todo_write_tool.py, test_openai_format (around lines 105–127)

Suggested diff:

```diff
-        openai_format = tool.get_openai_format()
+        # Pass a valid OpenAI model identifier
+        openai_format = tool.get_openai_format("gpt-4")
         assert openai_format["type"] == "function"
         assert openai_format["function"]["name"] == "TodoWrite"
         assert "investigation tasks" in openai_format["function"]["description"]

         # Check parameters schema
         params = openai_format["function"]["parameters"]
         assert params["type"] == "object"
         assert "todos" in params["properties"]

         # Check array schema has items property
         todos_param = params["properties"]["todos"]
         assert todos_param["type"] == "array"
         assert "items" in todos_param
         assert todos_param["items"]["type"] == "object"

-        # Check required fields
-        assert "todos" in params["required"]
+        # Check required fields
+        assert "todos" in params["required"]
+
+        # Ensure investigation_id is included
+        assert "investigation_id" in params["properties"]
+        assert params["properties"]["investigation_id"]["type"] == "string"
+        assert "investigation_id" in params["required"]
```
🧹 Nitpick comments (30)
tests/llm/fixtures/test_ask_holmes/150_what_is_wrong_with_mimir/test_case.yaml (1)
3-9: Clarify matching semantics to avoid brittle/flaky assertions

Natural-language expectations are fine, but the example embed can be brittle if the harness matches exact JSON string order or the example random_key. Consider clarifying key-order independence and that random_key is arbitrary. Minor grammar fix included.

Proposed tweak:

```diff
 expected_output: |
-  Output must contain 1 or more embeds in the following format
+  Output must contain one or more embeds in the following format
   <<{"type": "datadogql", "tool_name": "query_datadog_metrics", "random_key": "iD8G"}>>
-  random_key may be different than the above example, but all other parameters (type and tool_name) must be as described
+  random_key may be different than the above example, but all other parameters (type and tool_name) must be as described.
+  The order of keys inside the JSON object is not important; assertions should match keys irrespective of order.
   Output must NOT tell the user it doesn't have access to metrics or that they should use another tool
```

Follow-up: Please confirm the eval harness checks the embed JSON by keys (order-insensitive) and does not require the literal random_key value shown in the example.

holmes/config.py (1)
75-75: Update documentation code samples to reflect new max_steps default

The quick scan found several occurrences of max_steps=10 in your Python installation guide. Please update these examples to use the new default of 40:

• docs/installation/python-installation.md:31
• docs/installation/python-installation.md:88
• docs/installation/python-installation.md:209
• docs/installation/python-installation.md:286
• docs/installation/python-installation.md:306

holmes/plugins/prompts/investigation_procedure.jinja2 (5)
1-4: Use consistent capitalization for Investigation IDMinor clarity/consistency tweak.
-# Investigation ID for this session -Investigation id: {{ investigation_id }} +# Investigation ID for this session +Investigation ID: {{ investigation_id }}
31-36: Avoid inline comments inside JSON-like examples to prevent copy/paste into tool paramsInline comments in example objects can be copied literally by the LLM and break schema validation. Move the note outside the JSON example.
TodoWrite(todos=[ {"id": "1", "content": "Check frontend pod logs", "status": "in_progress"}, {"id": "2", "content": "Check backend service config", "status": "in_progress"}, {"id": "3", "content": "Analyze network policies", "status": "in_progress"}, -{"id": "4", "content": "Compare logs from both pods", "status": "pending"} # Depends on 1,2 +{"id": "4", "content": "Compare logs from both pods", "status": "pending"} ]) + +# Note: Task 4 depends on tasks 1 and 2
47-50: Clarify concurrency phrasing to match runtime behaviorState concurrency as “preferred/where supported,” since orchestration depends on the runtime scheduler.
-MAXIMIZE PARALLEL TOOL CALLS: -- When executing multiple in_progress tasks, make ALL their tool calls at once +MAXIMIZE PARALLEL TOOL CALLS: +- When executing multiple in_progress tasks, prefer issuing tool calls concurrently where supported by the runtime - Example: If tasks 1,2,3 are in_progress, call kubectl_logs + kubectl_describe + kubectl_get simultaneously
71-76: Fix formatting in the Task Status Check exampleMissing newline merges Task 2 and Task 3.
[✓] completed - Task 1 -[✓] completed - Task 2[✓] completed - Task 3 +[✓] completed - Task 2 +[✓] completed - Task 3 [✓] completed - Investigation Verification
153-157: Typo: “accross” → “across”Small spelling fix.
-*Final Review Phase*: Validate that the chain of events, accross the different components, can lead to the investigated scenario. +*Final Review Phase*: Validate that the chain of events, across the different components, can lead to the investigated scenario.examples/custom_llm.py (1)
50-65: Add a return type hint to ask_holmes for mypy complianceMinor type-hint nit for examples.
-def ask_holmes(): +def ask_holmes() -> None: prompt = "what pods are unhealthy in my cluster?"server.py (1)
212-221: Add todo_list to the workload prompt context (consistency with investigation prompts).You added
investigation_idto the system prompt context here, but nottodo_list. Since the workload prompt now includesinvestigation_procedure.jinja2, it may render a task list when present. To avoid undefined var issues (if StrictUndefined) and to keep UX consistent, pass a formattedtodo_listtoo.Apply this diff within this block:
ai = config.create_toolcalling_llm(dal=dal, model=request.model) + # Provide current investigation tasks context to the prompt + todo_manager = get_todo_manager() + todo_context = todo_manager.format_tasks_for_prompt(ai.investigation_id) + system_prompt = load_and_render_prompt( request.prompt_template, context={ "alerts": workload_alerts, "toolsets": ai.tool_executor.toolsets, "response_format": workload_health_structured_output, "cluster_name": config.cluster_name, + "todo_list": todo_context, "investigation_id": ai.investigation_id, }, )And add this import at the top of the file (with other imports):
from holmes.core.todo_manager import get_todo_managerholmes/core/investigation.py (1)
137-139: Guard against missing or non-string investigation_id.In unusual cases (e.g., alt LLM impls),
ai.investigation_idmight be None or non-string. Defensive cast avoids unexpected keying in the todo manager.Apply:
- todo_manager = get_todo_manager() - todo_context = todo_manager.format_tasks_for_prompt(ai.investigation_id) + todo_manager = get_todo_manager() + session_id = str(getattr(ai, "investigation_id", "") or "") + todo_context = todo_manager.format_tasks_for_prompt(session_id)holmes/core/tools.py (1)
125-127: Support for nested schemas added; consider forward-ref rebuild and basic schema validation.The recursive
ToolParameterrefs look good. To avoid forward-ref resolution issues with Pydantic v2 in some environments, and to enforce consistency betweentypeandproperties/items, consider:
- Ensure forward refs are rebuilt after class definition:
# place near the end of the module or right after class ToolParameter ToolParameter.model_rebuild()
- Optionally add a validator to enforce that:
- when
propertiesis set,type == "object"- when
itemsis set,type == "array"
This prevents malformed schemas from reaching the formatter; a possible validator is sketched below.
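A model-level validator along these lines could enforce that consistency. This is a sketch assuming Pydantic v2 and the fields discussed in this PR (type, properties, items); the actual ToolParameter has additional fields and may differ:

```python
from typing import Dict, Optional

from pydantic import BaseModel, ValidationError, model_validator


class ToolParameter(BaseModel):
    type: str = "string"
    description: Optional[str] = None
    required: bool = False
    properties: Optional[Dict[str, "ToolParameter"]] = None
    items: Optional["ToolParameter"] = None

    @model_validator(mode="after")
    def _check_schema_consistency(self) -> "ToolParameter":
        if self.properties is not None and self.type != "object":
            raise ValueError("'properties' is only valid when type == 'object'")
        if self.items is not None and self.type != "array":
            raise ValueError("'items' is only valid when type == 'array'")
        return self


ToolParameter.model_rebuild()  # resolve the self-referencing forward refs

# Rejected: an array schema that declares nested object properties
try:
    ToolParameter(type="array", properties={"id": ToolParameter(type="string")})
except ValidationError as exc:
    print(exc)
```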
1-43: Solid coverage of the core toolset; tighten one assertion.The tests validate naming, defaults, tool presence, and prerequisite behavior. Consider asserting the example config is empty to codify the “no config required” contract.
Apply:
- assert isinstance(config, dict) - # Core toolset doesn't need configuration + assert isinstance(config, dict) + # Core toolset doesn't need configuration + assert config == {}tests/llm/test_ask_holmes.py (2)
299-301: Increase of max_steps to 40 is fine; consider making it configurable for test runsBumping to 40 aligns with the PR goals. To keep tests flexible and avoid long runs in constrained CI, consider reading from an env var with a sensible default.
ai = ToolCallingLLM( tool_executor=tool_executor, - max_steps=40, + max_steps=int(os.environ.get("HOLMES_MAX_STEPS", "40")), llm=DefaultLLM(os.environ.get("MODEL", "gpt-4o"), tracer=tracer), )
320-327: API change to build_initial_ask_messages: pass investigation_id by keywordPassing the new parameter is correct. To future-proof against signature changes and improve readability, prefer a keyword argument for investigation_id.
messages = build_initial_ask_messages( console, test_case.user_prompt, None, ai.tool_executor, - ai.investigation_id, + investigation_id=ai.investigation_id, runbooks, )holmes/interactive.py (1)
1000-1008: Pass investigation_id by keyword to avoid future signature driftThe new parameter is wired correctly. Minor nit: use a keyword for clarity and resilience to future parameter ordering changes.
if messages is None: messages = build_initial_ask_messages( console, user_input, include_files, ai.tool_executor, - ai.investigation_id, + investigation_id=ai.investigation_id, runbooks, system_prompt_additions, )docs/installation/python-installation.md (1)
45-59: Consider documenting session-scoped investigation_id behavior.

ai.investigation_id is created once per ToolCallingLLM instance. In the Tool Details example you reuse the same ai for multiple questions, so todos will persist across all questions in that session. Consider a short note showing how to force a fresh investigation_id per question (e.g., pass str(uuid.uuid4()) explicitly) if users want isolation, as in the sketch below.
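A hedged example of what that note could show, continuing the guide’s existing setup (console, ai, and runbooks are assumed to be defined as in the installation doc; the positional argument order follows build_initial_ask_messages as updated in this PR):

```python
import uuid

# ai.investigation_id is fixed for the lifetime of the ToolCallingLLM instance,
# so todos persist across questions asked on the same `ai`.
session_scoped_id = ai.investigation_id  # assumed: `ai` built as in the guide

# For per-question isolation, generate a fresh id and pass it instead:
fresh_id = str(uuid.uuid4())
messages = build_initial_ask_messages(
    console,                 # assumed: rich Console from the guide's setup
    "what pods are unhealthy in my cluster?",
    None,                    # no attached files
    ai.tool_executor,
    fresh_id,                # fresh UUID → a new, empty todo list for this question
    runbooks,                # assumed: runbook catalog from the guide's setup
)
```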
tests/core/test_prompt.py (2)
33-36: API update coverage looks good; consider deduping test UUID generation.Passing a UUID as the new 5th argument aligns tests with the updated signature. To reduce duplication, you could generate a single value at test start or via a fixture.
Also applies to: 54-57, 78-81, 103-106, 131-134
26-36: Verify tests also cover investigation_id propagation into prompts (optional).Current assertions don’t validate that investigation_id/todo_list reach the system prompt. If feasible, add a focused assertion in one test to ensure the system prompt contains investigation_id, guarding against regressions.
holmes/core/tool_calling_llm.py (2)
214-215: Add explicit type annotation for investigation_id to satisfy mypy.Minor improvement to make the attribute type explicit.
Apply:
- self.investigation_id = str(uuid.uuid4()) + self.investigation_id: str = str(uuid.uuid4())
788-801: Optional: avoid injecting empty todo_list.format_tasks_for_prompt returns "" when no tasks. You can omit the key entirely when empty to reduce prompt noise.
Example:
- todo_context = todo_manager.format_tasks_for_prompt(self.investigation_id) - system_prompt = load_and_render_prompt( + todo_context = todo_manager.format_tasks_for_prompt(self.investigation_id) + ctx = { prompt, - { + { "issue": issue, "sections": sections, "structured_output": request_structured_output_from_llm, "toolsets": self.tool_executor.toolsets, "cluster_name": self.cluster_name, - "todo_list": todo_context, "investigation_id": self.investigation_id, }, - ) + } + if todo_context: + ctx[1]["todo_list"] = todo_context + system_prompt = load_and_render_prompt(*ctx)holmes/core/openai_formatting.py (1)
28-36: Make nested object “required” precise (don’t force all properties unless strict_mode).Currently, in strict_mode you require all nested properties. That may be intended, but outside strict_mode you should only require those explicitly marked required. Also, only set “required” if the list is non-empty.
Apply:
- if strict_mode: - type_obj["required"] = list(param_attributes.properties.keys()) + # Only require explicitly-required props unless strict_mode is enabled + if strict_mode: + required_props = list(param_attributes.properties.keys()) + else: + required_props = [ + name + for name, prop in param_attributes.properties.items() + if getattr(prop, "required", False) + ] + if required_props: + type_obj["required"] = required_propsholmes/plugins/prompts/_general_instructions.jinja2 (1)
54-65: Revisit “FIRST tool call MUST be TodoWrite” rigidity.This may be overly prescriptive for trivial single-step queries and could conflict with earlier guidance (“run as many tools as you need, then respond”). Consider softening: “If the investigation requires multiple steps, first create a plan with TodoWrite,” or gate the block behind a condition in the template context.
tests/core/test_todo_manager.py (2)
76-79: Priority assertions likely inconsistent with current Task model/formatterThe Task model (and the formatter) in the provided context do not include a priority attribute, and the formatter spec doesn’t mention rendering priority markers. These assertions will likely fail unless the formatter was extended to include priority labels regardless of input.
Options:
- If priority isn’t supported in TodoListManager.format_tasks_for_prompt, remove these assertions:
- # Check priority - assert "(HIGH)" in prompt_context - assert "(MED)" in prompt_context - assert "(LOW)" in prompt_context
- If priority is intended, update the Task model and formatter to include and render priority, and extend the test tasks with priority fields to match expectations.
39-45: Order-sensitive checks are OK, but also consider verifying status indicators per taskYou verify the retrieval order and presence of status indicators elsewhere. For tighter coverage, consider asserting that each retrieved task’s line in the prompt contains the correct status indicator ([ ], [~], [✓]) alongside its id/content.
I can draft an additional test assertion block if you want to tighten this.
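For example, a tighter per-task assertion could look like the helper below. Sketch only: the task names are hypothetical, and it assumes the formatter renders one task per line with the [ ]/[~]/[✓] indicators shown above:

```python
def assert_tasks_rendered_with_status(prompt_context: str) -> None:
    """Each task line should carry the indicator that matches its status."""
    expected = {
        "Check pod status": "[ ]",          # pending
        "Analyze logs": "[~]",              # in_progress
        "Review deployment config": "[✓]",  # completed
    }
    lines = prompt_context.splitlines()
    for content, indicator in expected.items():
        matching = [line for line in lines if content in line]
        assert matching, f"task '{content}' missing from prompt context"
        assert indicator in matching[0], f"wrong status indicator for '{content}'"
```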
holmes/core/prompt.py (2)
33-36: Signature change: add investigation_id — make sure the docstring lists itThe function signature now includes investigation_id, but the docstring’s Args section doesn’t. Add it for clarity and mypy-oriented readability.
Apply this diff to document the new parameter:
def build_initial_ask_messages( console: Console, initial_user_prompt: str, file_paths: Optional[List[Path]], tool_executor: Any, # ToolExecutor type investigation_id: str, runbooks: Union[RunbookCatalog, Dict, None] = None, system_prompt_additions: Optional[str] = None, ) -> List[Dict]: """Build the initial messages for the AI call. Args: console: Rich console for output initial_user_prompt: The user's prompt file_paths: Optional list of files to include tool_executor: The tool executor with available toolsets + investigation_id: Unique identifier for the current investigation/session (used for scoping todos and context) runbooks: Optional runbook catalog system_prompt_additions: Optional additional system prompt content """
64-70: Fix grammar/numbering in the system reminder; remove an empty “3.” itemThere’s a stray “3.” with no content and minor grammar issues (“Ask your self”). Clean this up to avoid confusing the model.
Apply this diff:
- user_prompt_with_files += ( - "\n\n<system-reminder>\nIMPORTANT: You have access to the TodoWrite tool. It creates a TodoList, in order to track progress. It's very important. You MUST use it:\n1. FIRST: Ask your self which sub problems you need to solve in order to answer the question." - "Do this, BEFORE any other tools\n2. " - "AFTER EVERY TOOL CALL: If required, update the TodoList\n3. " - "\n\nFAILURE TO UPDATE TodoList = INCOMPLETE INVESTIGATION\n\n" - "Example flow:\n- Think and divide to sub problems → create TodoList → Perform each task on the list → Update list → Verify your solution\n</system-reminder>" - ) + user_prompt_with_files += ( + "\n\n<system-reminder>\n" + "IMPORTANT: You have access to the TodoWrite tool. It creates a TodoList to track progress. You MUST use it:\n" + "1. FIRST: Ask yourself which sub-problems you need to solve in order to answer the question. Do this BEFORE using any other tools.\n" + "2. AFTER EVERY TOOL CALL: If required, update the TodoList.\n" + "3. ON COMPLETION: Verify the TodoList reflects the final state of the investigation.\n" + "\nFAILURE TO UPDATE THE TodoList = INCOMPLETE INVESTIGATION\n\n" + "Example flow:\n- Think and divide into sub-problems → create TodoList → Perform each task on the list → Update list → Verify your solution\n" + "</system-reminder>" + )holmes/plugins/toolsets/investigator/investigator_instructions.jinja2 (1)
58-61: Overly strict response-length constraint may degrade UX“You MUST answer concisely with fewer than 4 lines” is very constraining for investigations and may cut off useful explanations. Consider softening or making it context-dependent.
I can propose a nuanced rule (e.g., concise by default; allow detail when asked or when summarizing findings with evidence).
tests/core/test_todo_write_tool.py (1)
24-43: Consider explicitly passing investigation_id in tests to reflect the tool contractIf TodoWriteTool requires investigation_id (as suggested elsewhere in the PR), calling _invoke without it will either fail or rely on hidden defaults. Tests should be explicit to avoid future flakiness.
Example (if required by the tool):
- params = { + params = { + "investigation_id": "test-session", "todos": [ { "id": "1", "content": "Check pod status", "status": "pending", "priority": "high", }, { "id": "2", "content": "Analyze logs", "status": "in_progress", "priority": "medium", }, ] }If the tool is designed to tolerate a missing investigation_id by using a session default, document that behavior in the tool and keep this test as-is.
holmes/core/todo_manager.py (2)
25-29: Slight simplification in clear_session.Use pop with a default to simplify and avoid the membership check.
def clear_session(self, session_id: str) -> None: with self._lock: - if session_id in self._sessions: - del self._sessions[session_id] + self._sessions.pop(session_id, None)
34-78: Prefer TaskStatus over raw strings to avoid drift.Use the enum directly for ordering and indicators to prevent string literal duplication.
- status_order = {"pending": 0, "in_progress": 1, "completed": 2} + status_order = { + TaskStatus.PENDING: 0, + TaskStatus.IN_PROGRESS: 1, + TaskStatus.COMPLETED: 2, + } sorted_tasks = sorted( tasks, - key=lambda t: (status_order.get(t.status.value, 3),), + key=lambda t: (status_order.get(t.status, 3),), ) @@ - status_indicator = { - "pending": "[ ]", - "in_progress": "[~]", - "completed": "[✓]", - }.get(task.status.value, "[?]") + status_indicator = { + TaskStatus.PENDING: "[ ]", + TaskStatus.IN_PROGRESS: "[~]", + TaskStatus.COMPLETED: "[✓]", + }.get(task.status, "[?]")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (29)
- docs/installation/python-installation.md (3 hunks)
- examples/custom_llm.py (1 hunks)
- holmes/config.py (1 hunks)
- holmes/core/conversations.py (11 hunks)
- holmes/core/investigation.py (3 hunks)
- holmes/core/openai_formatting.py (1 hunks)
- holmes/core/prompt.py (3 hunks)
- holmes/core/todo_manager.py (1 hunks)
- holmes/core/tool_calling_llm.py (5 hunks)
- holmes/core/tools.py (1 hunks)
- holmes/interactive.py (1 hunks)
- holmes/main.py (2 hunks)
- holmes/plugins/prompts/_general_instructions.jinja2 (2 hunks)
- holmes/plugins/prompts/generic_ask.jinja2 (1 hunks)
- holmes/plugins/prompts/investigation_procedure.jinja2 (1 hunks)
- holmes/plugins/prompts/kubernetes_workload_ask.jinja2 (1 hunks)
- holmes/plugins/toolsets/__init__.py (2 hunks)
- holmes/plugins/toolsets/investigator/core_investigation.py (1 hunks)
- holmes/plugins/toolsets/investigator/investigator_instructions.jinja2 (1 hunks)
- holmes/plugins/toolsets/investigator/model.py (1 hunks)
- server.py (1 hunks)
- tests/core/test_prompt.py (6 hunks)
- tests/core/test_todo_manager.py (1 hunks)
- tests/core/test_todo_write_tool.py (1 hunks)
- tests/llm/fixtures/test_ask_holmes/150_what_is_wrong_with_mimir/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/150_what_is_wrong_with_mimir/toolsets.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/63_fetch_error_logs_no_errors/test_case.yaml (1 hunks)
- tests/llm/test_ask_holmes.py (2 hunks)
- tests/plugins/toolsets/test_core_investigation.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (8)
**/*.py
📄 CodeRabbit Inference Engine (CLAUDE.md)
**/*.py: ALWAYS place Python imports at the top of the file, not inside functions or methods
Type hints are required (project is type-checked with mypy)
Use Ruff for formatting and linting (configured in pyproject.toml)
Files:
holmes/core/tools.pyexamples/custom_llm.pyserver.pytests/plugins/toolsets/test_core_investigation.pyholmes/plugins/toolsets/investigator/model.pyholmes/core/investigation.pyholmes/core/conversations.pyholmes/interactive.pytests/core/test_todo_write_tool.pyholmes/core/openai_formatting.pytests/core/test_prompt.pyholmes/core/todo_manager.pytests/core/test_todo_manager.pytests/llm/test_ask_holmes.pyholmes/plugins/toolsets/__init__.pyholmes/core/prompt.pyholmes/main.pyholmes/config.pyholmes/core/tool_calling_llm.pyholmes/plugins/toolsets/investigator/core_investigation.py
{tests/**/*.py,pyproject.toml}
📄 CodeRabbit Inference Engine (CLAUDE.md)
Only use pytest markers that are defined in pyproject.toml; never introduce undefined markers in tests
Files:
tests/plugins/toolsets/test_core_investigation.pytests/core/test_todo_write_tool.pytests/core/test_prompt.pytests/core/test_todo_manager.pytests/llm/test_ask_holmes.py
tests/llm/**/test_case.yaml
📄 CodeRabbit Inference Engine (CLAUDE.md)
Do NOT place toolset configuration inside test_case.yaml for evals
Files:
tests/llm/fixtures/test_ask_holmes/150_what_is_wrong_with_mimir/test_case.yamltests/llm/fixtures/test_ask_holmes/63_fetch_error_logs_no_errors/test_case.yaml
tests/llm/**/*.yaml
📄 CodeRabbit Inference Engine (CLAUDE.md)
tests/llm/**/*.yaml: In evals, ALWAYS use Kubernetes Secrets for scripts; do not embed scripts in inline manifests or ConfigMaps
Each eval test must use a dedicated Kubernetes namespace named app-
All pod names in evals must be unique and should not hint at the problem (use neutral names)
Files:
tests/llm/fixtures/test_ask_holmes/150_what_is_wrong_with_mimir/test_case.yamltests/llm/fixtures/test_ask_holmes/63_fetch_error_logs_no_errors/test_case.yamltests/llm/fixtures/test_ask_holmes/150_what_is_wrong_with_mimir/toolsets.yaml
tests/llm/**/toolsets.yaml
📄 CodeRabbit Inference Engine (CLAUDE.md)
tests/llm/**/toolsets.yaml: Eval-specific toolset configuration must be defined in a separate toolsets.yaml file in the test directory
In eval toolsets.yaml, all toolset-specific configuration must be under a config field
Files:
tests/llm/fixtures/test_ask_holmes/150_what_is_wrong_with_mimir/toolsets.yaml
{holmes/plugins/toolsets/**/*.yaml,tests/llm/**/toolsets.yaml}
📄 CodeRabbit Inference Engine (CLAUDE.md)
Allowed top-level fields for toolset YAMLs are limited to: enabled, name, description, additional_instructions, prerequisites, tools, docs_url, icon_url, installation_instructions, config, url (MCP only)
Files:
tests/llm/fixtures/test_ask_holmes/150_what_is_wrong_with_mimir/toolsets.yaml
holmes/plugins/prompts/*.jinja2
📄 CodeRabbit Inference Engine (CLAUDE.md)
Prompts must be Jinja2 templates named {name}.jinja2 under holmes/plugins/prompts/
Files:
holmes/plugins/prompts/kubernetes_workload_ask.jinja2holmes/plugins/prompts/investigation_procedure.jinja2holmes/plugins/prompts/_general_instructions.jinja2holmes/plugins/prompts/generic_ask.jinja2
docs/**/*.md
📄 CodeRabbit Inference Engine (CLAUDE.md)
In MkDocs docs, always add a blank line between a header/bold text and a following list
Files:
docs/installation/python-installation.md
🧠 Learnings (8)
📚 Learning: 2025-08-17T08:42:48.763Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T08:42:48.763Z
Learning: Applies to {holmes/plugins/toolsets/**/*.yaml,tests/llm/**/toolsets.yaml} : Allowed top-level fields for toolset YAMLs are limited to: enabled, name, description, additional_instructions, prerequisites, tools, docs_url, icon_url, installation_instructions, config, url (MCP only)
Applied to files:
tests/llm/fixtures/test_ask_holmes/150_what_is_wrong_with_mimir/toolsets.yaml
📚 Learning: 2025-08-17T08:42:48.763Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T08:42:48.763Z
Learning: Applies to tests/llm/**/toolsets.yaml : Eval-specific toolset configuration must be defined in a separate toolsets.yaml file in the test directory
Applied to files:
tests/llm/fixtures/test_ask_holmes/150_what_is_wrong_with_mimir/toolsets.yaml
📚 Learning: 2025-08-17T08:42:48.763Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T08:42:48.763Z
Learning: Applies to tests/llm/**/toolsets.yaml : In eval toolsets.yaml, all toolset-specific configuration must be under a config field
Applied to files:
tests/llm/fixtures/test_ask_holmes/150_what_is_wrong_with_mimir/toolsets.yaml
📚 Learning: 2025-08-17T08:42:48.763Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T08:42:48.763Z
Learning: Applies to holmes/plugins/toolsets/**/*.yaml : Toolsets must reside under holmes/plugins/toolsets/ as {name}.yaml or within a {name}/ directory
Applied to files:
tests/llm/fixtures/test_ask_holmes/150_what_is_wrong_with_mimir/toolsets.yamlholmes/plugins/toolsets/__init__.py
📚 Learning: 2025-08-17T08:42:48.763Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T08:42:48.763Z
Learning: Applies to tests/llm/**/test_case.yaml : Do NOT place toolset configuration inside test_case.yaml for evals
Applied to files:
tests/llm/fixtures/test_ask_holmes/150_what_is_wrong_with_mimir/toolsets.yaml
📚 Learning: 2025-08-05T00:42:23.792Z
Learnt from: vishiy
PR: robusta-dev/holmesgpt#782
File: config.example.yaml:31-49
Timestamp: 2025-08-05T00:42:23.792Z
Learning: In robusta-dev/holmesgpt config.example.yaml, the azuremonitorlogs toolset configuration shows "enabled: true" as an example of how to enable the toolset, not as a default setting. The toolset is disabled by default and requires explicit enablement in user configurations.
Applied to files:
tests/llm/fixtures/test_ask_holmes/150_what_is_wrong_with_mimir/toolsets.yaml
📚 Learning: 2025-08-13T05:57:40.420Z
Learnt from: mainred
PR: robusta-dev/holmesgpt#829
File: holmes/plugins/runbooks/runbook-format.prompt.md:11-22
Timestamp: 2025-08-13T05:57:40.420Z
Learning: In holmes/plugins/runbooks/runbook-format.prompt.md, the user (mainred) prefers to keep the runbook step specifications simple without detailed orchestration metadata like step IDs, dependencies, retry policies, timeouts, and storage variables. The current format with Action, Function Description, Parameters, Expected Output, and Success/Failure Criteria is sufficient for their AI agent troubleshooting use case.
Applied to files:
holmes/plugins/prompts/_general_instructions.jinja2
📚 Learning: 2025-08-17T08:42:48.763Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T08:42:48.763Z
Learning: Applies to holmes/plugins/prompts/*.jinja2 : Prompts must be Jinja2 templates named {name}.jinja2 under holmes/plugins/prompts/
Applied to files:
holmes/plugins/prompts/generic_ask.jinja2
🧬 Code Graph Analysis (9)
examples/custom_llm.py (1)
holmes/core/tool_calling_llm.py (1)
ToolCallingLLM(204-713)
tests/plugins/toolsets/test_core_investigation.py (2)
holmes/plugins/toolsets/investigator/core_investigation.py (3)
CoreInvestigationToolset(134-156)TodoWriteTool(22-131)get_example_config(149-150)holmes/core/tools.py (6)
ToolsetStatusEnum(103-106)ToolsetTag(109-112)check_prerequisites(418-473)get_example_config(476-477)get_example_config(495-496)get_example_config(528-529)
holmes/core/investigation.py (1)
holmes/core/todo_manager.py (2)
get_todo_manager(83-84)format_tasks_for_prompt(34-77)
tests/core/test_todo_write_tool.py (3)
holmes/core/tools.py (6)
ToolResultStatus(23-42)_invoke(174-175)_invoke(226-254)get_parameterized_one_liner(178-179)get_parameterized_one_liner(205-212)get_openai_format(142-148)holmes/plugins/toolsets/investigator/core_investigation.py (3)
TodoWriteTool(22-131)_invoke(83-127)get_parameterized_one_liner(129-131)holmes/plugins/toolsets/investigator/model.py (1)
TaskStatus(6-9)
holmes/core/todo_manager.py (1)
holmes/plugins/toolsets/investigator/model.py (2)
Task(12-15)TaskStatus(6-9)
tests/core/test_todo_manager.py (2)
holmes/core/todo_manager.py (7)
TodoListManager(7-77)get_todo_manager(83-84)get_session_count(30-32)get_session_tasks(17-19)format_tasks_for_prompt(34-77)update_session_tasks(21-23)clear_session(25-28)holmes/plugins/toolsets/investigator/model.py (2)
Task(12-15)TaskStatus(6-9)
holmes/plugins/toolsets/__init__.py (1)
holmes/plugins/toolsets/investigator/core_investigation.py (1)
CoreInvestigationToolset(134-156)
holmes/core/tool_calling_llm.py (1)
holmes/core/todo_manager.py (2)
get_todo_manager(83-84)format_tasks_for_prompt(34-77)
holmes/plugins/toolsets/investigator/core_investigation.py (4)
holmes/core/todo_manager.py (3)
get_todo_manager(83-84)update_session_tasks(21-23)format_tasks_for_prompt(34-77)holmes/core/tools.py (15)
Toolset(342-484)ToolsetTag(109-112)ToolsetStatusEnum(103-106)ToolParameter(121-126)Tool(129-179)StructuredToolResult(45-69)ToolResultStatus(23-42)_invoke(174-175)_invoke(226-254)get_parameterized_one_liner(178-179)get_parameterized_one_liner(205-212)get_example_config(476-477)get_example_config(495-496)get_example_config(528-529)_load_llm_instructions(479-484)holmes/plugins/toolsets/investigator/model.py (2)
Task(12-15)TaskStatus(6-9)holmes/utils/cache.py (1)
get(64-74)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Pre-commit checks
- GitHub Check: llm_evals
- GitHub Check: Pre-commit checks
🔇 Additional comments (31)
tests/llm/fixtures/test_ask_holmes/150_what_is_wrong_with_mimir/test_case.yaml (1)
1-2: Prompt fixture looks good and aligns with test intentThe user prompt is clear and minimal. No toolset configuration is embedded here, which complies with the testing guidelines.
tests/llm/fixtures/test_ask_holmes/63_fetch_error_logs_no_errors/test_case.yaml (1)
8-8: Readiness wait added — LGTMThe
kubectl wait --for=condition=availableaddition reduces test flakiness by ensuring the deployment is ready before assertions.holmes/plugins/toolsets/investigator/model.py (1)
6-15: Task and TaskStatus model — LGTMEnum values align with the prompt’s required statuses, and the
iddefault_factory ensures UUID strings. Good Pydantic usage.holmes/plugins/prompts/investigation_procedure.jinja2 (2)
9-17: Status values align with TaskStatus enum — LGTMThe template’s required statuses (“pending”, “in_progress”, “completed”) match the TaskStatus enum. Good consistency.
88-91: Todo list rendering — LGTMConditional rendering of
todo_listis correct and keeps the prompt focused when no list exists.examples/custom_llm.py (1)
58-58: Example updated to 40 steps — LGTMExample aligns with the new default and helps users mirror production behavior.
holmes/plugins/prompts/kubernetes_workload_ask.jinja2 (1)
9-10: Verify context variables forinvestigation_procedure.jinja2includeThe include path is correct and the template exists, but it references both
investigation_idandtodo_listwithout StrictUndefined, so missing values will render silently. Please confirm that your renderer forkubernetes_workload_ask.jinja2supplies these in its context:• File: holmes/plugins/prompts/kubernetes_workload_ask.jinja2 (line 9)
• Template: holmes/plugins/prompts/investigation_procedure.jinja2
– Usesinvestigation_idinside{% if investigation_id %}…{% endif %}
– Usestodo_listinside{% if todo_list %}…{% endif %}Ensure upstream callers include both keys (e.g., in your call to
render_templateor equivalent).server.py (1)
219-220: Confirmed:ai.investigation_idis always defined on LLM instances
Theconfig.create_toolcalling_llmfactory returns aToolCallingLLM, whose__init__unconditionally setsself.investigation_id = str(uuid.uuid4()). There are no alternative code paths returning a plainLLM, soai.investigation_idis guaranteed to exist.holmes/core/investigation.py (2)
13-13: LGTM: Import placed correctly per project guidelines.Importing
get_todo_managerat the top complies with the repo’s import policy.
148-150: Good: Propagating todo_list and investigation_id into the system prompt.This aligns the investigation prompt with the new TodoWrite flow and per-investigation context.
holmes/plugins/toolsets/__init__.py (2)
47-49: Importing CoreInvestigationToolset looks correctThe import path matches the provided toolset implementation and keeps imports at the top of the file.
74-76: No duplicate YAML-defined toolset namedcore_investigation
Verified there are no YAML files orname: core_investigationentries in the repo. PrependingCoreInvestigationToolsetposes no risk of duplicate definitions.holmes/plugins/prompts/generic_ask.jinja2 (1)
11-17: Approved: Required Jinja2 partials verifiedAll shared instruction blocks are present under holmes/plugins/prompts, so the includes will resolve correctly:
- holmes/plugins/prompts/_current_date_time.jinja2
- holmes/plugins/prompts/_general_instructions.jinja2
- holmes/plugins/prompts/_runbook_instructions.jinja2
Ready to merge.
holmes/core/conversations.py (3)
136-137: Propagation of investigation_id in issue chat prompts: LGTMinvestigation_id is consistently added to all template contexts (no-tools, truncated-tools, and history-update paths). This keeps system prompts aligned across flows.
Also applies to: 157-158, 191-192, 233-234, 257-258
474-475: Propagation of investigation_id in workload health prompts: LGTMThe investigation_id is added across all workload-health paths (no-tools, truncated-tools, and history-update). Consistent and correct.
Also applies to: 495-496, 529-530, 571-572, 595-596
282-283: Ensure ToolCallingLLM Always Initializes a Non-Emptyinvestigation_idI didn’t find any assignment of
self.investigation_idin theToolCallingLLMclass, so it’s unclear whetherai.investigation_idcan ever beNone. Please verify and, if needed, update the constructor to always set a valid ID (or provide a default/fallback) to prevent"None"from leaking into your prompts or breaking template assumptions.• Inspect
class ToolCallingLLMinholmes/.../tool_calling_llm.py(or wherever it’s defined) and confirm its__init__takes and assignsinvestigation_id.
• If no explicit assignment exists, add something like:def __init__(..., investigation_id: str, ...): self.investigation_id = investigation_id or generate_default_id()• Review all instantiations of
ToolCallingLLMto ensure they pass a non-emptyinvestigation_id.
• Optionally add a runtime assertion or fallback insideToolCallingLLMto guard against empty IDs.docs/installation/python-installation.md (1)
51-52: No changes needed:build_initial_ask_messagesusesinvestigation_id
The function signature in holmes/core/prompt.py clearly defines the fifth parameter asinvestigation_id, and all call sites (including tests) rely on positional or keywordinvestigation_id. There is norequest_idparameter—docs examples are correct as written.tests/core/test_prompt.py (1)
1-1: LGTM: import placed at top-level.uuid import placement complies with project import rules.
holmes/core/tool_calling_llm.py (2)
5-5: LGTM: imports are correctly placed at module top.uuid and get_todo_manager are imported at the top, satisfying the import placement rule.
Also applies to: 42-44
788-801: Good addition: per-investigation todo context in system prompt.Fetching todo_context via TodoManager and injecting both todo_list and investigation_id into the system prompt is a clean integration and keeps prompts session-aware.
holmes/core/openai_formatting.py (1)
37-47: Array handling looks solid; confirm strictness expectations.When items are provided, recursion will enforce nested strictness on object items. For the fallback branch, you add additionalProperties: False under strict_mode, which is good. No action needed.
holmes/plugins/prompts/_general_instructions.jinja2 (1)
1-1: Include path looks correct.investigation_procedure.jinja2 is under the same prompts directory; include should resolve in the current loader.
holmes/main.py (2)
300-308: Good: investigation_id is correctly propagated into the ask flowPassing ai.investigation_id to build_initial_ask_messages aligns with the updated signature and ensures session-scoped context reaches prompt construction.
286-297: Investigation ID propagation verifiedBoth the non-interactive and interactive paths pass
ai.investigation_idintobuild_initial_ask_messages, so the interactive flow already propagates the investigation ID as intended:
- holmes/main.py (non-interactive): call at lines 300–305 includes
ai.investigation_id- holmes/interactive.py (interactive): call at lines 1000–1005 includes
ai.investigation_idNo changes required.
holmes/core/prompt.py (1)
49-55: LGTM: investigation_id included in template contextThis makes investigation_id available to the system prompt template as intended.
tests/core/test_todo_write_tool.py (1)
68-86: Invalid enum handling: ensure both status and priority validations feed consistent errorsYou expect an error for invalid status/priority. Confirm the tool validates both and returns a stable error prefix like “Failed to process tasks” as asserted here.
If not already implemented, I can help add strict enum validation with clear error aggregation for per-task failures.
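If that validation is still missing, an aggregation helper along these lines would keep the stable "Failed to process tasks" prefix the test expects. Hypothetical sketch; the allowed values are taken from the statuses and priorities exercised in these tests, not from a confirmed schema:

```python
from typing import Any, Dict, List, Optional

ALLOWED_STATUSES = {"pending", "in_progress", "completed"}
ALLOWED_PRIORITIES = {"high", "medium", "low"}


def validate_todos(todos: List[Dict[str, Any]]) -> Optional[str]:
    """Return an aggregated error message, or None when every task is valid."""
    errors: List[str] = []
    for index, todo in enumerate(todos):
        if todo.get("status") not in ALLOWED_STATUSES:
            errors.append(f"task {index}: invalid status {todo.get('status')!r}")
        if "priority" in todo and todo["priority"] not in ALLOWED_PRIORITIES:
            errors.append(f"task {index}: invalid priority {todo['priority']!r}")
    if errors:
        return "Failed to process tasks: " + "; ".join(errors)
    return None


print(validate_todos([{"status": "done", "priority": "urgent"}]))
```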
holmes/core/todo_manager.py (3)
17-24: Good defensive copies with locking.Returning a copy and storing a copied list under lock avoids external mutation and concurrent hazards. LGTM.
30-33: Session count under lock is correct.Straightforward and thread-safe.
80-85: Singleton accessor is fine.The module-level instance with a typed getter is appropriate for shared state.
holmes/plugins/toolsets/investigator/core_investigation.py (2)
109-120: Result construction looks good.Success path returns a clear message and includes the formatted tasks for immediate feedback. LGTM.
121-128: Error handling is appropriate.Exception is logged with stack trace and returned as a structured error result. LGTM.
Resolved (outdated) review comments on: holmes/plugins/toolsets/investigator/investigator_instructions.jinja2, tests/llm/fixtures/test_ask_holmes/150_what_is_wrong_with_mimir/toolsets.yaml
Actionable comments posted: 0
🧹 Nitpick comments (7)
holmes/core/prompt.py (3)
28-36: Polish the reminder text: fix grammar and a missing space (user-visible).The concatenated string currently yields “question.Do this” (missing space) and has a couple of typos. Suggest tightening grammar without changing meaning.
Apply this diff:
def get_tasks_management_system_reminder() -> str: return ( - "\n\n<system-reminder>\nIMPORTANT: You have access to the TodoWrite tool. It creates a TodoList, in order to track progress. It's very important. You MUST use it:\n1. FIRST: Ask your self which sub problems you need to solve in order to answer the question." - "Do this, BEFORE any other tools\n2. " + "\n\n<system-reminder>\nIMPORTANT: You have access to the TodoWrite tool. It creates a TodoList, in order to track progress. It's very important. You MUST use it:\n1. FIRST: Ask yourself which sub-problems you need to solve in order to answer the question. " + "Do this, BEFORE any other tools\n2. " "AFTER EVERY TOOL CALL: If required, update the TodoList\n3. " "\n\nFAILURE TO UPDATE TodoList = INCOMPLETE INVESTIGATION\n\n" - "Example flow:\n- Think and divide to sub problems → create TodoList → Perform each task on the list → Update list → Verify your solution\n</system-reminder>" + "Example flow:\n- Think and divide into sub-problems → create TodoList → Perform each task on the list → Update list → Verify your solution\n</system-reminder>" )
38-46: Tighten return typing for messages.Return type is too generic (List[Dict]). It’s better as List[Dict[str, str]] to help mypy and callers.
Apply this diff:
-) -> List[Dict]: +) -> List[Dict[str, str]]:Optional (stronger typing): define a TypedDict and use it as the return type.
# at top of file from typing import TypedDict, Literal class ChatMessage(TypedDict): role: Literal["system", "user"] content: strThen change the return annotation to List[ChatMessage].
74-75: Consider moving the reminder into the system prompt instead of the user message.System-level guidance carries higher priority in most chat stacks and is less likely to be overridden by user content. If you agree, append the reminder to system_prompt_rendered and drop it from the user content.
Suggested change:
@@ - user_prompt_with_files += get_tasks_management_system_reminder() + system_prompt_rendered += get_tasks_management_system_reminder()tests/core/test_prompt.py (4)
27-37: Solid update to pass investigation_id into the builder.Consider reducing duplication by introducing a fixture for investigation_id so you don’t repeat str(uuid.uuid4()) in every test.
Example fixture:
import uuid import pytest @pytest.fixture def investigation_id() -> str: return str(uuid.uuid4())Then use it in calls to build_initial_ask_messages.
72-86: Add an assertion that the reminder is present when attachments are used.Currently you check file markers and content; add a check for the appended reminder to cover this path as well.
Apply this diff:
assert "File content here" in messages[1]["content"] # Check for file attachment markers assert "<attached-file" in messages[1]["content"] assert "test.txt" in messages[1]["content"] assert "</attached-file>" in messages[1]["content"] + # The task management reminder should still be appended + assert get_tasks_management_system_reminder() in messages[1]["content"]
122-141: Also assert the reminder in the “all parameters” test.This ensures the reminder is appended regardless of optional args.
Apply this diff:
assert "Test prompt" in messages[1]["content"] assert "File content" in messages[1]["content"] + assert get_tasks_management_system_reminder() in messages[1]["content"]
27-37: Optional: add a focused test to verify investigation_id is passed into the template context.Right now we don’t assert that the template receives investigation_id. You can monkeypatch load_and_render_prompt to capture the context and assert it contains the key.
Prototype test:
def test_system_template_receives_investigation_id(monkeypatch, console, mock_tool_executor): captured = {} def fake_render(template, ctx): captured.update(ctx) return "SYSTEM" monkeypatch.setattr("holmes.core.prompt.load_and_render_prompt", fake_render) messages = build_initial_ask_messages( console, "Test prompt", None, mock_tool_executor, "inv-123", None, None, ) assert captured.get("investigation_id") == "inv-123" assert messages[0]["content"] == "SYSTEM"Also applies to: 47-61, 72-86, 103-111, 122-141
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- holmes/core/prompt.py (3 hunks)
- tests/core/test_prompt.py (9 hunks)
- tests/core/test_todo_manager.py (1 hunks)
- tests/core/test_todo_write_tool.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/core/test_todo_manager.py
- tests/core/test_todo_write_tool.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CLAUDE.md)
**/*.py: ALWAYS place Python imports at the top of the file, not inside functions or methods
Type hints are required (project is type-checked with mypy)
Use Ruff for formatting and linting (configured in pyproject.toml)
Files:
tests/core/test_prompt.pyholmes/core/prompt.py
{tests/**/*.py,pyproject.toml}
📄 CodeRabbit Inference Engine (CLAUDE.md)
Only use pytest markers that are defined in pyproject.toml; never introduce undefined markers in tests
Files:
tests/core/test_prompt.py
🧬 Code Graph Analysis (1)
tests/core/test_prompt.py (1)
holmes/core/prompt.py (1)
get_tasks_management_system_reminder(28-35)
🔇 Additional comments (5)
holmes/core/prompt.py (2)
63-64: Injecting investigation_id into template context is spot-on.This enables templates to reference the current investigation, aligning with the broader per-investigation context changes.
43-43: All call sites updated to passinvestigation_id.
Ran the AST-based verification script across the repo; no calls tobuild_initial_ask_messagesare missing the new parameter. No further action required.tests/core/test_prompt.py (3)
1-2: LGTM on the UUID import.Using a fresh investigation_id per test is appropriate.
7-12: Importing the new helper keeps expectations stable.Asserting against get_tasks_management_system_reminder() avoids brittle hard-coding.
42-45: Assertion of the appended reminder is good coverage.This verifies the new behavior explicitly for the basic path.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
holmes/core/openai_formatting.py (1)
70-73: Add explicit return type and generics for mypy complianceProject policy requires type hints; the function lacks a return type, and
tool_parametersis an unparameterizeddict. Adding types avoids mypy complaints.-def format_tool_to_open_ai_standard( - tool_name: str, tool_description: str, tool_parameters: dict, target_model: str -): +def format_tool_to_open_ai_standard( + tool_name: str, + tool_description: str, + tool_parameters: dict[str, Any], + target_model: str, +) -> dict[str, Any]: - tool_properties = {} + tool_properties: dict[str, Any] = {}
🧹 Nitpick comments (2)
holmes/core/openai_formatting.py (2)
28-36: Great recursion for nested objects; consider propagating property descriptionsWhen mapping nested object properties, we currently drop each property's description. Keeping descriptions helps LLMs adhere to constraints and improves dev ergonomics. Also, switching from a dict comprehension to a small loop increases clarity and lets us attach metadata.
Apply:
- if hasattr(param_attributes, "properties") and param_attributes.properties: - type_obj["properties"] = { - name: type_to_open_ai_schema(prop, strict_mode) - for name, prop in param_attributes.properties.items() - } - if strict_mode: - type_obj["required"] = list(param_attributes.properties.keys()) + if hasattr(param_attributes, "properties") and param_attributes.properties: + props: dict[str, Any] = {} + for name, prop in param_attributes.properties.items(): + schema = type_to_open_ai_schema(prop, strict_mode) + desc = getattr(prop, "description", None) + if desc: + schema["description"] = desc + props[name] = schema + type_obj["properties"] = props + if strict_mode: + type_obj["required"] = list(param_attributes.properties.keys())
37-47: Arrays: explicit items schema is good; also carry item descriptionsThe explicit items path is spot-on. Minor improvement: if the item parameter has a description, attach it onto the items schema as well.
- if hasattr(param_attributes, "items") and param_attributes.items: - items_schema = type_to_open_ai_schema(param_attributes.items, strict_mode) - type_obj = {"type": "array", "items": items_schema} + if hasattr(param_attributes, "items") and param_attributes.items: + items_schema = type_to_open_ai_schema(param_attributes.items, strict_mode) + item_desc = getattr(param_attributes.items, "description", None) + if item_desc: + items_schema["description"] = item_desc + type_obj = {"type": "array", "items": items_schema}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
holmes/core/openai_formatting.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit Inference Engine (CLAUDE.md)
**/*.py: ALWAYS place Python imports at the top of the file, not inside functions or methods
Type hints are required (project is type-checked with mypy)
Use Ruff for formatting and linting (configured in pyproject.toml)
Files:
holmes/core/openai_formatting.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Pre-commit checks
- GitHub Check: llm_evals
- GitHub Check: Pre-commit checks
🔇 Additional comments (1)
holmes/core/openai_formatting.py (1)
56-58: No legacyarray[object]usages detected — safe to mergeSearches across Python, YAML, and JSON files for
array[object](includingtype="array[object]") returned no matches. This breaking change can be merged without impacting existing tool definitions.
Actionable comments posted: 1
♻️ Duplicate comments (1)
holmes/core/todo_manager.py (1)
13-16: Add missing return annotation on init (mypy compliance).Constructor should be annotated with -> None. Lock is already typed, thanks.
Apply this diff:
- def __init__(self): + def __init__(self) -> None: self._sessions: Dict[str, List[Task]] = {} self._lock: Lock = Lock()
🧹 Nitpick comments (6)
holmes/core/todo_manager.py (3)
44-49: Key mappings with string literals are typo-prone; bind to TaskStatus values.Use TaskStatus constants to prevent accidental typos and keep things consistent with the model.
Apply this diff:
- status_order = {"pending": 0, "in_progress": 1, "completed": 2} + status_order = { + TaskStatus.PENDING.value: 0, + TaskStatus.IN_PROGRESS.value: 1, + TaskStatus.COMPLETED.value: 2, + } sorted_tasks = sorted( tasks, - key=lambda t: (status_order.get(t.status.value, 3),), + key=lambda t: (status_order.get(t.status.value, 3),), )
63-70: Same note for status indicator mapping: prefer TaskStatus-based keys.Keeps mapping resilient to typos and aligned to the enum.
Apply this diff:
- status_indicator = { - "pending": "[ ]", - "in_progress": "[~]", - "completed": "[✓]", - }.get(task.status.value, "[?]") + status_indicator = { + TaskStatus.PENDING.value: "[ ]", + TaskStatus.IN_PROGRESS.value: "[~]", + TaskStatus.COMPLETED.value: "[✓]", + }.get(task.status.value, "[?]")
80-84: Global singleton is convenient; consider a reset hook for tests.

A simple clear_all() method or a reset helper can reduce test coupling when using the module singleton.
I can add a `clear_all_sessions()` method and a pytest fixture to reset state between tests if desired.
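Something along these lines — a minimal sketch, assuming the `_sessions`/`_lock` attributes shown in the diffs above; the `clear_all_sessions` name and the fixture placement are suggestions, not existing API:

```python
# Hypothetical additions, for illustration only.

# holmes/core/todo_manager.py — new method on TodoListManager
def clear_all_sessions(self) -> None:
    """Drop every session's tasks; intended for test isolation."""
    with self._lock:
        self._sessions.clear()


# tests/conftest.py — autouse fixture resetting the module-level singleton
import pytest

from holmes.core.todo_manager import get_todo_manager


@pytest.fixture(autouse=True)
def reset_todo_manager():
    yield
    get_todo_manager().clear_all_sessions()  # relies on the method sketched above
```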
holmes/plugins/toolsets/investigator/investigator_instructions.jinja2 (3)
151-151: Remove stray closing tag `</assistant>`.

Looks like a copy artifact that could confuse downstream processing.
Apply this diff:
```diff
-Let's start by analyzing the detailed CPU metrics and correlating with request patterns.</assistant>
+Let's start by analyzing the detailed CPU metrics and correlating with request patterns.
```
169-172: Fix code block formatting in the Python example.

Use a fenced code block for clarity and consistent rendering.
Apply this diff:
````diff
-python
-print("Hello World")
+```python
+print("Hello World")
+```
````
60-66: Hard constraint “fewer than 4 lines” and “Always use TodoWrite” may be too rigid.

These absolutes can degrade UX in nuanced investigations. Consider softening or gating behind a template variable (e.g., {{ concise_mode|default(false) }}).
Proposed wording:
- “Answer concisely. Prefer under 4 lines unless the user asks for detail or context requires more.”
- “Use TodoWrite proactively for multi-step tasks; skip it for trivial single-step requests.”
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- holmes/config.py (1 hunks)
- holmes/core/todo_manager.py (1 hunks)
- holmes/plugins/toolsets/investigator/core_investigation.py (1 hunks)
- holmes/plugins/toolsets/investigator/investigator_instructions.jinja2 (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- holmes/config.py
- holmes/plugins/toolsets/investigator/core_investigation.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit Inference Engine (CLAUDE.md)
**/*.py: ALWAYS place Python imports at the top of the file, not inside functions or methods
Type hints are required (project is type-checked with mypy)
Use Ruff for formatting and linting (configured in pyproject.toml)
Files:
holmes/core/todo_manager.py
🧬 Code Graph Analysis (1)
holmes/core/todo_manager.py (1)
holmes/plugins/toolsets/investigator/model.py (2)
Task (12-15), TaskStatus (6-9)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: build (3.12)
- GitHub Check: build (3.10)
- GitHub Check: build (3.11)
- GitHub Check: build (3.10)
- GitHub Check: build (3.11)
- GitHub Check: build (3.12)
🔇 Additional comments (2)
holmes/plugins/toolsets/investigator/investigator_instructions.jinja2 (2)
1-28: Strong addition: clear guidance and good worked examples.

The task management rationale and examples will help drive better, auditable investigations.
243-253: Ignore removal of the `priority` field

The `priority` property is part of the Task schema and is used throughout the code and tests. No change is required to the Jinja template:

- tests/core/test_todo_write_tool.py defines and exercises `priority` in multiple test cases
- The TodoWriteTool model and OpenAI schema generation include `priority` as a valid enum
- The Jinja2 snippet correctly mirrors the supported task properties

Please disregard the earlier suggestion to remove `priority`.

Likely an incorrect or invalid review comment.
Actionable comments posted: 0
♻️ Duplicate comments (6)
holmes/config.py (1)
77-77: Default max_steps increased to 40 — acknowledged

Rationale was discussed earlier (“todo tools add more steps”). No further concerns from this file’s side.
holmes/core/todo_manager.py (2)
13-16: Add constructor return annotation (mypy compliance)

Constructor should explicitly return None.
```diff
-    def __init__(self):
+    def __init__(self) -> None:
```
17-20: Avoid mutable default in dict.get and tighten typing of return

Using dict.get(..., []) returns a mutable default and can confuse mypy; also needlessly copies a fresh list.
```diff
-    def get_session_tasks(self, session_id: str) -> List[Task]:
-        with self._lock:
-            return self._sessions.get(session_id, []).copy()
+    def get_session_tasks(self, session_id: str) -> List[Task]:
+        with self._lock:
+            tasks = self._sessions.get(session_id)
+            return tasks.copy() if tasks is not None else []
```

holmes/plugins/toolsets/investigator/core_investigation.py (3)
139-149: Annotate constructor return type (mypy compliance)

Constructor should explicitly return None.
```diff
-    def __init__(self):
+    def __init__(self) -> None:
```
153-158: Annotate _reload_instructions return type (mypy compliance)

Explicit None return keeps type checking clean.
```diff
-    def _reload_instructions(self):
+    def _reload_instructions(self) -> None:
```
102-106: Validate investigation_id to avoid cross-session contamination

Empty/whitespace IDs lead to storing tasks under an empty key.
```diff
-        session_id = params.get("investigation_id", "")
-        todo_manager.update_session_tasks(session_id, tasks)
+        session_id = str(params.get("investigation_id", "")).strip()
+        if not session_id:
+            return StructuredToolResult(
+                status=ToolResultStatus.ERROR,
+                error="investigation_id is required but was missing or empty",
+                params=params,
+            )
+        todo_manager.update_session_tasks(session_id, tasks)
```
🧹 Nitpick comments (5)
holmes/core/todo_manager.py (1)
84-88: Optional: Mark singleton as Final to prevent reassignment

Minor typing hardening for the module-level singleton.
```diff
-_todo_manager = TodoListManager()
+_todo_manager: Final[TodoListManager] = TodoListManager()
```

Add this import at the top of the file: `from typing import Final`

holmes/plugins/toolsets/investigator/core_investigation.py (4)
3-6: Import List for typing and drop uuid4 (rely on Task default id)

Preps for type hints and removes an unnecessary dependency.
```diff
-from typing import Any, Dict
+from typing import Any, Dict, List
-
-from uuid import uuid4
```
25-37: Relax id requirement; document allowed status values in schema

Runtime generates IDs when missing; schema should reflect that. Also clarify allowed statuses.
"todos": ToolParameter( description="COMPLETE list of ALL tasks on the task list. Each task should have: id (string), content (string), status (pending/in_progress/completed)", type="array", required=True, items=ToolParameter( type="object", properties={ - "id": ToolParameter(type="string", required=True), + "id": ToolParameter(type="string", required=False), "content": ToolParameter(type="string", required=True), - "status": ToolParameter(type="string", required=True), + "status": ToolParameter( + type="string", + required=True, + description="one of: pending, in_progress, completed", + ), }, ), ),
45-49: Add type hints to print_tasks_table (mypy requirement)

Annotate parameters and return type.
```diff
-    def print_tasks_table(self, tasks):
+    def print_tasks_table(self, tasks: List[Task]) -> None:
```
89-99: Leverage Task’s default id; tighten local typing and input handling

Removes ad-hoc uuid, uses Task’s default id, and adds types.
```diff
-        tasks = []
+        tasks: List[Task] = []
         for todo_item in todos_data:
             if isinstance(todo_item, dict):
-                task = Task(
-                    id=todo_item.get("id", str(uuid4())),
-                    content=todo_item.get("content", ""),
-                    status=TaskStatus(todo_item.get("status", "pending")),
-                )
-                tasks.append(task)
+                status = TaskStatus(todo_item.get("status", "pending"))
+                content = str(todo_item.get("content", "")).strip()
+                id_val = todo_item.get("id")
+                if isinstance(id_val, str) and id_val:
+                    tasks.append(Task(id=id_val, content=content, status=status))
+                else:
+                    tasks.append(Task(content=content, status=status))
+            else:
+                logging.warning("Skipping non-dict todo item: %r", todo_item)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- holmes/config.py (1 hunks)
- holmes/core/todo_manager.py (1 hunks)
- holmes/plugins/toolsets/investigator/core_investigation.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit Inference Engine (CLAUDE.md)
**/*.py: ALWAYS place Python imports at the top of the file, not inside functions or methods
Type hints are required (project is type-checked with mypy)
Use Ruff for formatting and linting (configured in pyproject.toml)
Files:
holmes/config.py
holmes/core/todo_manager.py
holmes/plugins/toolsets/investigator/core_investigation.py
🧬 Code Graph Analysis (2)
holmes/core/todo_manager.py (1)
holmes/plugins/toolsets/investigator/model.py (2)
Task (12-15), TaskStatus (6-9)
holmes/plugins/toolsets/investigator/core_investigation.py (3)
holmes/core/todo_manager.py (3)
get_todo_manager (87-88), update_session_tasks (21-23), format_tasks_for_prompt (34-81)
holmes/core/tools.py (8)
ToolParameter (121-126), Tool (129-179), StructuredToolResult (45-69), ToolResultStatus (23-42), _invoke (174-175), _invoke (226-254), get_parameterized_one_liner (178-179), get_parameterized_one_liner (205-212)
holmes/plugins/toolsets/investigator/model.py (2)
Task (12-15), TaskStatus (6-9)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: build (3.10)
- GitHub Check: build (3.12)
- GitHub Check: build (3.11)
- GitHub Check: build (3.12)
- GitHub Check: build (3.11)
- GitHub Check: build (3.10)
- GitHub Check: llm_evals
Results of HolmesGPT evals
Legend
Sorry, my comments are late.
```python
class TodoWriteTool(Tool):
    name: str = "TodoWrite"
    description: str = "Save investigation tasks to break down complex problems into manageable sub-tasks. ALWAYS provide the COMPLETE list of all tasks, not just the ones being updated."
```
The list of calls should have no dependencies, so it's ok to run them in parallel.
```python
from holmes.plugins.toolsets.investigator.model import Task, TaskStatus


class TodoWriteTool(Tool):
```
Why not move it to tools.py?
```
@@ -0,0 +1,210 @@
{% if investigation_id %}
```
I am curious why investigation_id is so important. It's a UUID that spans one interactive or non-interactive run and marks the session.
Yes, it's important for tracking tasks running in parallel. There can be multiple sessions within one interactive-mode ask.
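To make that concrete, here is a small illustration of why keying tasks by investigation_id matters when sessions run in parallel; SimpleTodoManager is a simplified stand-in for the real TodoListManager, not the project's implementation:

```python
# Hypothetical illustration: two parallel sessions update their todo lists
# without clobbering each other because tasks are keyed by investigation_id.
from threading import Lock, Thread
from typing import Dict, List


class SimpleTodoManager:
    def __init__(self) -> None:
        self._sessions: Dict[str, List[str]] = {}
        self._lock = Lock()

    def update(self, investigation_id: str, tasks: List[str]) -> None:
        with self._lock:
            self._sessions[investigation_id] = list(tasks)

    def get(self, investigation_id: str) -> List[str]:
        with self._lock:
            return list(self._sessions.get(investigation_id, []))


manager = SimpleTodoManager()
t1 = Thread(target=manager.update, args=("inv-aaa", ["check pod logs"]))
t2 = Thread(target=manager.update, args=("inv-bbb", ["query prometheus"]))
t1.start(); t2.start(); t1.join(); t2.join()

print(manager.get("inv-aaa"))  # ['check pod logs']
print(manager.get("inv-bbb"))  # ['query prometheus']
```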
No description provided.