Add max output tokens to holmes response #972
Conversation
Walkthrough
Adds metadata["max_output_tokens"] (sourced from llm.get_maximum_output_token()) to the final outputs of ToolCallingLLM.call and ToolCallingLLM.call_stream when a response completes (see the sketch below).

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
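A minimal sketch of what the walkthrough describes, in Python. The stub class, the helper name build_final_metadata, and the usage dict shape are assumptions for illustration; get_maximum_output_token() and the max_output_tokens key come from the summary above.

```python
from typing import Any, Dict


class _StubLLM:
    """Stand-in for the project's LLM wrapper; only the one method used here is modelled."""

    def get_maximum_output_token(self) -> int:
        # Assumed to return the provider's maximum completion-token budget.
        return 4096


def build_final_metadata(llm: _StubLLM, usage: Dict[str, Any]) -> Dict[str, Any]:
    """Assemble response metadata the way the PR describes: existing fields plus
    the model's output token budget under "max_output_tokens"."""
    metadata: Dict[str, Any] = {"usage": usage}
    metadata["max_output_tokens"] = llm.get_maximum_output_token()  # the field this PR adds
    return metadata


if __name__ == "__main__":
    print(build_final_metadata(_StubLLM(), {"prompt_tokens": 120, "completion_tokens": 80}))
```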
Pre-merge checks and finishing touches
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests
Tip
👮 Agentic pre-merge checks are now available in preview! Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs. Please see the documentation for more information.

Example:

  reviews:
    pre_merge_checks:
      custom_checks:
        - name: "Undocumented Breaking Changes"
          mode: "warning"
          instructions: |
            Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
holmes/core/tool_calling_llm.py (1)
425-431: Set metadata["max_output_tokens"] for the non-post-processing return
metadata["max_output_tokens"] is only set in the post-processing branch; add the same assignment before the direct LLMResult return (holmes/core/tool_calling_llm.py — before the return at ~line 441).
  perf_timing.end(f"- completed in {i} iterations -")
+ # Keep metadata consistent with post-processing and streaming responses
+ metadata["max_output_tokens"] = maximum_output_token
  return LLMResult(
      result=text_response,
      tool_calls=tool_calls,
      prompt=json.dumps(messages, indent=2),
      messages=messages,
      **costs.model_dump(),  # Include all cost fields
      metadata=metadata,
  )
🧹 Nitpick comments (1)
holmes/core/tool_calling_llm.py (1)
428-429: Nit: Clarify metadata naming for context size vs. output tokens
metadata["max_tokens"] holds the model context window size (not a “max completion tokens” parameter). Consider renaming it to metadata["context_window_tokens"] (or adding that as an alias) to avoid confusion now that metadata also contains max_output_tokens.
Also applies to: 872-873
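A hedged sketch of the aliasing option suggested above; the helper name and the example values are hypothetical, and context_window_tokens is only the proposed alias, not an existing key.

```python
from typing import Any, Dict


def add_token_limit_fields(metadata: Dict[str, Any], context_window: int, max_output: int) -> Dict[str, Any]:
    """Hypothetical helper illustrating the aliasing option: keep the existing
    "max_tokens" key, expose the same value under a clearer name, and carry the
    new "max_output_tokens" field alongside it."""
    metadata["max_tokens"] = context_window             # existing key: model context window size
    metadata["context_window_tokens"] = context_window  # proposed clearer alias
    metadata["max_output_tokens"] = max_output          # field added by this PR
    return metadata


print(add_token_limit_fields({}, context_window=128_000, max_output=4_096))
```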
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
holmes/core/tool_calling_llm.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff for formatting and linting
Type hints are required (checked by mypy)
Always place Python imports at the top of the file, not inside functions or methods
Files:
holmes/core/tool_calling_llm.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Pre-commit checks
- GitHub Check: llm_evals
🔇 Additional comments (1)
holmes/core/tool_calling_llm.py (1)
869-876: LGTM: max_output_tokens included in the streaming final message
The streaming path now populates metadata["max_output_tokens"] alongside usage and max_tokens. Matches the PR intent.
If clients consume this field, confirm no downstream schema validation breaks when the field appears in streaming but was previously absent.
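If that is a concern, a defensive consumer can treat the field as optional. The event shape below is an assumption for illustration, not the project's actual streaming schema.

```python
from typing import Any, Dict


def read_final_stream_event(event: Dict[str, Any]) -> None:
    """Read the final streaming message without assuming "max_output_tokens" exists,
    so responses from servers without this change still parse cleanly."""
    metadata = event.get("metadata", {})
    max_output = metadata.get("max_output_tokens")  # absent before this PR
    if max_output is not None:
        print(f"server-side output token budget: {max_output}")


read_final_stream_event(
    {"metadata": {"usage": {"total_tokens": 200}, "max_tokens": 128_000, "max_output_tokens": 4_096}}
)
```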
No description provided.