# Enhance prompt caching #916
## Conversation
**Walkthrough**

Adds Anthropic prompt caching and LLM cost/usage logging; removes the session-scoped Todo manager and related session plumbing in favor of a stateless task formatter.
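For orientation, Anthropic's prompt caching is driven by an ephemeral `cache_control` marker on a structured content block, which litellm passes through to the API. A minimal sketch of the message shape this feature produces (values are illustrative, not taken from the PR):

```python
# Illustrative message shape for Anthropic prompt caching via litellm.
# The block carrying cache_control marks the end of the cacheable prefix.
messages = [
    {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": "You are HolmesGPT...",  # long, stable instructions benefit most from caching
                "cache_control": {"type": "ephemeral"},
            }
        ],
    },
    {"role": "user", "content": "Why is pod foo-123 crash-looping?"},
]
```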
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    autonumber
    participant Caller
    participant DefaultLLM
    participant LiteLLM
    Caller->>DefaultLLM: completion(messages, ...)
    DefaultLLM->>DefaultLLM: _add_cache_control_to_last_message(messages)
    Note right of DefaultLLM #dfeffc: remove prior cache_control<br/>attach ephemeral cache_control to last non-user message
    DefaultLLM->>LiteLLM: completion(modified messages)
    LiteLLM-->>DefaultLLM: response
    DefaultLLM-->>Caller: response
```
```mermaid
sequenceDiagram
    autonumber
    participant UserFlow
    participant ToolCallingLLM
    participant LLM
    UserFlow->>ToolCallingLLM: call (stream or non-stream)
    alt non-stream
        ToolCallingLLM->>LLM: request
        LLM-->>ToolCallingLLM: full_response (may include usage/cost)
        ToolCallingLLM->>ToolCallingLLM: _process_cost_info(full_response, "LLM call")
        opt LOG_LLM_USAGE_RESPONSE=true
            ToolCallingLLM->>ToolCallingLLM: log usage details
        end
    else stream
        loop chunks
            ToolCallingLLM->>LLM: stream request/recv
            LLM-->>ToolCallingLLM: delta (may include usage/cost)
            ToolCallingLLM->>ToolCallingLLM: _process_cost_info(delta, "LLM iteration")
        end
    end
    ToolCallingLLM-->>UserFlow: result
    Note right of ToolCallingLLM #fff4e6: No investigation_id or session Todo manager used<br/>Uses stateless task formatter where needed
```
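The `_process_cost_info` helper itself is not shown in this excerpt. A rough sketch of what such a helper can look like, assuming litellm's convention of attaching the computed cost to `_hidden_params` on the response:

```python
import logging
from typing import Any

cost_logger = logging.getLogger("llm.costs")

def process_cost_info(response: Any, context: str) -> None:
    """Sketch only: log cost/usage from a litellm response or stream delta."""
    hidden = getattr(response, "_hidden_params", None) or {}
    cost = hidden.get("response_cost")  # litellm populates this for priced models
    if cost is not None:
        cost_logger.info(f"{context}: ${cost:.6f}")
    usage = getattr(response, "usage", None)
    if usage is not None:
        cost_logger.debug(f"{context} usage: {usage}")
```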
**Estimated code review effort:** 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 6
> [!CAUTION]
> Some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (8)
holmes/plugins/toolsets/investigator/investigator_instructions.jinja2 (1)
214-220: Conflicting guidance: "ONE in_progress task" vs. "parallel in_progress tasks"

Line 214 says to limit to ONE in_progress task, while line 220 instructs marking multiple tasks in_progress for parallel work. Please reconcile.
Apply:
```diff
- - in_progress: Currently working on (limit to ONE task at a time)
+ - in_progress: Currently working on. Prefer ONE at a time by default; you MAY have multiple in_progress tasks only when they are independent and you are running tool calls in parallel (cap at 2–3).
@@
- - If tasks are not dependent on one another, handle multiple tasks in parallel and mark them in_progress.
+ - If tasks are independent, you MAY run tools in parallel. Mark only those tasks as 'in_progress'; keep the rest 'pending'.
```

holmes/core/llm.py (1)
272-328: Harden cache_control injection (provider gate, safer removal, text-block checks, graceful fallback)
- Add provider check (Anthropic only).
- Remove cache_control from dict content too.
- Only set on a text block; if none exists, append a tiny text block to carry cache_control.
- Avoid touching user messages (kept).
- Improve logs.
```diff
     def _add_cache_control_to_last_message(
         self, messages: List[Dict[str, Any]]
     ) -> None:
@@
-        # First, remove any existing cache_control from all messages
+        # Provider gate: Anthropic-only feature
+        lookup = litellm.get_llm_provider(self.model)
+        if not lookup or lookup[1] != "anthropic":
+            return
+
+        # First, remove any existing cache_control from all messages
         for msg in messages:
             content = msg.get("content")
-            if isinstance(content, list):
+            if isinstance(content, list):
                 for block in content:
                     if isinstance(block, dict) and "cache_control" in block:
                         del block["cache_control"]
                         logging.debug(
                             f"Removed existing cache_control from {msg.get('role')} message"
                         )
+            elif isinstance(content, dict) and "cache_control" in content:
+                # Rare shape: single structured block as dict
+                del content["cache_control"]
+                logging.debug(
+                    f"Removed existing cache_control from {msg.get('role')} message (dict content)"
+                )
@@
-        if isinstance(content, str):
+        if isinstance(content, str):
             # Convert string to structured format with cache_control
             target_msg["content"] = [
                 {
                     "type": "text",
                     "text": content,
                     "cache_control": {"type": "ephemeral"},
                 }
             ]
             logging.debug(
                 f"Added cache_control to {target_msg.get('role')} message (converted from string)"
             )
-        elif isinstance(content, list) and content:
-            # Add cache_control to the last content block
-            last_block = content[-1]
-            if isinstance(last_block, dict) and "type" in last_block:
-                last_block["cache_control"] = {"type": "ephemeral"}
-                logging.debug(
-                    f"Added cache_control to {target_msg.get('role')} message (structured content)"
-                )
+        elif isinstance(content, list) and content:
+            # Prefer the last text block; fall back to appending a tiny text block
+            target_block = None
+            for block in reversed(content):
+                if isinstance(block, dict) and block.get("type") == "text":
+                    target_block = block
+                    break
+            if target_block is not None:
+                target_block["cache_control"] = {"type": "ephemeral"}
+                logging.debug(
+                    f"Added cache_control to {target_msg.get('role')} message (existing text block)"
+                )
+            else:
+                content.append(
+                    {
+                        "type": "text",
+                        "text": "",
+                        "cache_control": {"type": "ephemeral"},
+                    }
+                )
+                logging.debug(
+                    f"Added cache_control carrier block to {target_msg.get('role')} message (no prior text blocks)"
+                )
+        # If content is a dict without 'type' or None, skip silently.
```

I can add unit tests for: string content, list with/without text blocks, only user messages, non-Anthropic models.
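For instance, a pytest sketch of the first two cases (the `DefaultLLM` constructor signature here is an assumption; adapt to the real one):

```python
# Hypothetical tests; constructor arguments are assumptions, not the real API.
from holmes.core.llm import DefaultLLM

def test_string_content_converted_and_cached():
    llm = DefaultLLM(model="anthropic/claude-3-5-sonnet-20241022")  # assumed ctor
    messages = [{"role": "system", "content": "instructions"}]
    llm._add_cache_control_to_last_message(messages)
    block = messages[0]["content"][-1]
    assert block["type"] == "text"
    assert block["cache_control"] == {"type": "ephemeral"}

def test_only_user_messages_left_untouched():
    llm = DefaultLLM(model="anthropic/claude-3-5-sonnet-20241022")
    messages = [{"role": "user", "content": "hi"}]
    llm._add_cache_control_to_last_message(messages)
    assert messages[0]["content"] == "hi"  # user content stays a plain string
```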
holmes/plugins/prompts/investigation_procedure.jinja2 (1)
110-119: Logical contradiction in the phase "YES" checklist

You ask to continue until you can answer "YES" to "Are there gaps…?", which should be "NO".
Apply:
```diff
-3. **Continue Creating New Phases** until you can answer "YES" to:
-   - "Do I have enough information to completely answer the user's question?"
-   - "Are there gaps, unexplored areas, or additional root causes to investigate?"
+3. **Continue Creating New Phases** until you can answer:
+   - "Do I have enough information to completely answer the user's question?" → YES
+   - "Are there gaps, unexplored areas, or additional root causes to investigate?" → NO
    - "Have I followed the 'five whys' methodology to the actual root cause?"
-   - "Did my investigation reveal new questions or areas that need exploration?"
+   - "Did my investigation reveal new questions or areas that need exploration?" → NO
-   - "Are there any additional investigation steps I can perform, in order to provide a more accurate solution?"
+   - "Are there any additional investigation steps I can perform, in order to provide a more accurate solution?" → NO
    - "I have thoroughly investigated all aspects of this problem"
    - "I can provide a complete answer with specific, actionable information"
    - "No additional investigation would improve my answer"
```

holmes/plugins/toolsets/investigator/core_investigation.py (4)
83-91: Validate and normalize task status with clear errors

The current `TaskStatus(...)` cast will raise a ValueError without context. Validate per item and fail fast with a helpful message.
```diff
-        for todo_item in todos_data:
+        for idx, todo_item in enumerate(todos_data):
             if isinstance(todo_item, dict):
-                task = Task(
+                status_raw = str(todo_item.get("status", "pending"))
+                if status_raw not in {s.value for s in TaskStatus}:
+                    raise ValueError(f"Invalid status at todos[{idx}]: '{status_raw}'. Allowed: pending, in_progress, completed")
+                task = Task(
                     id=todo_item.get("id", str(uuid4())),
                     content=todo_item.get("content", ""),
-                    status=TaskStatus(todo_item.get("status", "pending")),
+                    status=TaskStatus(status_raw),
                 )
                 tasks.append(task)
```
95-102: Remove the stale "stored in session" claim (persistence was removed)

This tool is now stateless; don't promise session persistence.
```diff
-        response_data = f"✅ Investigation plan updated with {len(tasks)} tasks. Tasks are now stored in session and will appear in subsequent prompts.\n\n"
+        response_data = f"✅ Investigation plan updated with {len(tasks)} tasks.\n\n"
```

Optional: add "The formatted task list is included below and will be visible to the AI in subsequent turns as tool output."
139-144: Follow the prompt-location guideline: move the template under holmes/plugins/prompts/

Toolsets should not load prompts via file paths under the toolset directory. Use the shared prompts folder.
```diff
-        template_file_path = os.path.abspath(
-            os.path.join(os.path.dirname(__file__), "investigator_instructions.jinja2")
-        )
-        self._load_llm_instructions(jinja_template=f"file://{template_file_path}")
+        # After moving the template to holmes/plugins/prompts/investigator_instructions.jinja2
+        self._load_llm_instructions(jinja_template="builtin://investigator_instructions.jinja2")
```

Move investigator_instructions.jinja2 to holmes/plugins/prompts/ to comply with the repo convention.
21-35: Add enum support to ToolParameter and constrain status values
Extend the base model in holmes/core/tools.py to support an `enum` field, then apply it to the `status` parameter.

In holmes/core/tools.py, under `class ToolParameter(BaseModel)`, add:

```python
enum: Optional[List[Any]] = None
```

Then update the investigator parameters:
- "status": ToolParameter(type="string", required=True), + "status": ToolParameter( + type="string", + required=True, + enum=["pending", "in_progress", "completed"], + ),holmes/core/tool_calling_llm.py (1)
holmes/core/tool_calling_llm.py (1)

96-104: Harden usage parsing across SDKs (dict or object)

`usage` may be a dict or an object with attributes. Access it safely.
```diff
-        if usage:
-            if LOG_LLM_USAGE_RESPONSE:  # shows stats on token cache usage
-                logging.info(f"LLM usage response:\n{usage}\n")
-            prompt_toks = usage.get("prompt_tokens", 0)
-            completion_toks = usage.get("completion_tokens", 0)
-            total_toks = usage.get("total_tokens", 0)
+        if usage:
+            if LOG_LLM_USAGE_RESPONSE:
+                cost_logger.info(f"LLM usage response:\n{usage}\n")
+            getter = (lambda k, d=0: usage.get(k, d)) if isinstance(usage, dict) else (lambda k, d=0: getattr(usage, k, d))
+            prompt_toks = getter("prompt_tokens", 0)
+            completion_toks = getter("completion_tokens", 0)
+            total_toks = getter("total_tokens", 0)
```
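A quick standalone check that the accessor pattern above handles both shapes (stand-ins only, not the PR's code):

```python
from types import SimpleNamespace

def usage_getter(usage):
    """Return an accessor that works for dict- and attribute-style usage objects."""
    if isinstance(usage, dict):
        return lambda k, d=0: usage.get(k, d)
    return lambda k, d=0: getattr(usage, k, d)

assert usage_getter({"prompt_tokens": 10})("prompt_tokens") == 10
assert usage_getter(SimpleNamespace(prompt_tokens=10))("prompt_tokens") == 10
assert usage_getter({})("total_tokens") == 0  # missing fields fall back to 0
```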
🧹 Nitpick comments (8)
holmes/plugins/toolsets/aws.yaml (1)
72-76: Minor copy polish for the logs tool

Optional, but improves consistency.
```diff
-  description: "Describe all available logs for an AWS RDS instance. Runs 'aws rds describe-db-log-files' for a specific instance."
+  description: "List all available logs for an RDS instance. Runs 'aws rds describe-db-log-files' for the specified instance."
```

holmes/plugins/prompts/investigation_procedure.jinja2 (3)
69-70: Fix missing newline in the status checklist

"Task 2" and "Task 3" are concatenated.
Apply:
```diff
-[✓] completed - Task 2[✓] completed - Task 3
+[✓] completed - Task 2
+[✓] completed - Task 3
```
146-147: Typo: "accross" → "across"

```diff
-  *Final Review Phase*: Validate that the chain of events, accross the different components, can lead to the investigated scenario.
+  *Final Review Phase*: Validate that the chain of events, across the different components, can lead to the investigated scenario.
```
120-128: Consistent naming: "Final Review" vs "Verification Phase"

You refer to the "Final Review phase" and later the "Verification Phase." Pick one term to avoid ambiguity; suggest "Final Review phase" everywhere.
Also applies to: 186-195
holmes/common/env_vars.py (1)
71-71: Ensure boolean type at the definition site

`load_bool` returns `Optional[bool]`. Force a concrete bool for simpler typing downstream.
```diff
-LOG_LLM_USAGE_RESPONSE = load_bool("LOG_LLM_USAGE_RESPONSE", False)
+LOG_LLM_USAGE_RESPONSE: bool = bool(load_bool("LOG_LLM_USAGE_RESPONSE", False))
```

holmes/plugins/toolsets/investigator/core_investigation.py (1)
38-47: Add type hints to improve readability and mypy signal

```diff
-    def print_tasks_table(self, tasks):
+    def print_tasks_table(self, tasks: list[Task]) -> None:
```

holmes/core/todo_tasks_formatter.py (2)
6-11: Modern typing and docstring precision

```diff
-def format_tasks(tasks: List[Task]) -> str:
-    """
-    Format tasks for tool response
-    Returns empty string if no tasks exist.
-    """
+def format_tasks(tasks: list[_TaskLike]) -> str:
+    """Format tasks into a markdown block. Returns '' if no tasks."""
```

Note: adjust imports if you accept the Protocol suggestion above.
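The Protocol suggestion referenced as "above" is not included in this excerpt; a minimal sketch of what `_TaskLike` might look like, with attributes inferred from the `Task` model:

```python
from typing import Protocol

class _TaskLike(Protocol):
    """Structural stand-in: anything with these attributes can be formatted.
    Attribute names are inferred from Task; sketch only."""
    id: str
    content: str
    status: object  # TaskStatus enum or plain string
```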
20-24: Stable ordering within the same status

Add a secondary key (id) for deterministic output.
```diff
-    sorted_tasks = sorted(
-        tasks,
-        key=lambda t: (status_order.get(t.status, 3),),
-    )
+    def _status_key(s: object) -> int:
+        v = getattr(s, "value", s)  # enum or str
+        return status_order.get(v, 3)
+
+    sorted_tasks = sorted(tasks, key=lambda t: (_status_key(t.status), str(t.id)))
```
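A small self-contained check of the resulting ordering (`FakeTask` is a stand-in for the real model):

```python
from dataclasses import dataclass

@dataclass
class FakeTask:
    id: str
    status: str

status_order = {"in_progress": 0, "pending": 1, "completed": 2}

def _status_key(s: object) -> int:
    v = getattr(s, "value", s)  # accept enum or plain string
    return status_order.get(v, 3)

tasks = [FakeTask("b", "pending"), FakeTask("a", "pending"), FakeTask("c", "in_progress")]
ordered = sorted(tasks, key=lambda t: (_status_key(t.status), str(t.id)))
assert [t.id for t in ordered] == ["c", "a", "b"]  # ties within a status break by id
```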
📜 Review details
📒 Files selected for processing (20)
- docs/ai-providers/anthropic.md (1 hunks)
- docs/installation/python-installation.md (0 hunks)
- holmes/common/env_vars.py (1 hunks)
- holmes/core/conversations.py (0 hunks)
- holmes/core/investigation.py (0 hunks)
- holmes/core/llm.py (2 hunks)
- holmes/core/prompt.py (0 hunks)
- holmes/core/todo_manager.py (0 hunks)
- holmes/core/todo_tasks_formatter.py (1 hunks)
- holmes/core/tool_calling_llm.py (2 hunks)
- holmes/interactive.py (0 hunks)
- holmes/main.py (0 hunks)
- holmes/plugins/prompts/_general_instructions.jinja2 (1 hunks)
- holmes/plugins/prompts/investigation_procedure.jinja2 (2 hunks)
- holmes/plugins/toolsets/aws.yaml (3 hunks)
- holmes/plugins/toolsets/investigator/core_investigation.py (2 hunks)
- holmes/plugins/toolsets/investigator/investigator_instructions.jinja2 (1 hunks)
- server.py (0 hunks)
- tests/core/test_todo_manager.py (0 hunks)
- tests/llm/test_ask_holmes.py (0 hunks)
💤 Files with no reviewable changes (10)
- holmes/main.py
- server.py
- holmes/interactive.py
- holmes/core/investigation.py
- holmes/core/prompt.py
- tests/llm/test_ask_holmes.py
- tests/core/test_todo_manager.py
- holmes/core/conversations.py
- docs/installation/python-installation.md
- holmes/core/todo_manager.py
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always place Python imports at the top of the file, not inside functions or methods
Type hints are required (mypy is configured in pyproject.toml)
Files:
- holmes/core/todo_tasks_formatter.py
- holmes/common/env_vars.py
- holmes/plugins/toolsets/investigator/core_investigation.py
- holmes/core/tool_calling_llm.py
- holmes/core/llm.py
holmes/plugins/toolsets/**
📄 CodeRabbit inference engine (CLAUDE.md)
Toolsets must live under holmes/plugins/toolsets as either {name}.yaml or a {name}/ directory
Files:
- holmes/plugins/toolsets/aws.yaml
- holmes/plugins/toolsets/investigator/investigator_instructions.jinja2
- holmes/plugins/toolsets/investigator/core_investigation.py
docs/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
In MkDocs content, add a blank line between a header or bold text and the following list so lists render correctly
Files:
docs/ai-providers/anthropic.md
holmes/plugins/prompts/*.jinja2
📄 CodeRabbit inference engine (CLAUDE.md)
Prompt templates must be placed at holmes/plugins/prompts/{name}.jinja2
Files:
- holmes/plugins/prompts/_general_instructions.jinja2
- holmes/plugins/prompts/investigation_procedure.jinja2
🧬 Code graph analysis (2)
holmes/core/todo_tasks_formatter.py (1)
holmes/plugins/toolsets/investigator/model.py (2)
- Task (12-15)
- TaskStatus (6-9)
holmes/plugins/toolsets/investigator/core_investigation.py (1)
holmes/core/todo_tasks_formatter.py (1)
- format_tasks (6-51)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: llm_evals
- GitHub Check: Pre-commit checks
- GitHub Check: Pre-commit checks
🔇 Additional comments (3)
holmes/plugins/prompts/investigation_procedure.jinja2 (2)
13-16: Parallelism guidance tightened — LGTM

Clearer rule of "parallel when independent, sequential when dependent." Matches our executor's concurrent tool calls.
87-90: Immediate TodoWrite + in_progress semantics — LGTM

This aligns with the tool contract and makes agent behavior explicit.
holmes/core/tool_calling_llm.py (1)
15-19: Importing LOG_LLM_USAGE_RESPONSE — LGTM

Keeps logging concerns configurable via env.