-
Notifications
You must be signed in to change notification settings - Fork 181
Improvements to loki and datadog logging #1016
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
WalkthroughAdds wildcard-aware pod-name handling and centralized query-building for Datadog and Loki log tools. Introduces new PodLoggingTool subclasses for both providers, updates healthcheck behavior and error context, and adds Loki-specific instructions template. Refactors Datadog query construction into a helper and enhances Loki label matching to support wildcards. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant User
participant DatadogToolset as DatadogLogsToolset
participant PodTool as DatadogPodLoggingTool
participant DD as Datadog API
rect rgba(230,240,255,0.5)
note right of PodTool: Centralized query construction
User->>DatadogToolset: fetch_pod_logs(params)
DatadogToolset->>DatadogToolset: build_datadog_query(params, dd_config)
DatadogToolset->>DD: fetch_paginated_logs(query, time range)
DD-->>DatadogToolset: logs / NO_DATA
DatadogToolset-->>User: logs / empty result + context
end
rect rgba(240,255,230,0.5)
note right of DatadogToolset: Healthcheck treats NO_DATA as reachable
User->>DatadogToolset: healthcheck()
DatadogToolset->>DatadogToolset: build_datadog_query(wildcard-capable)
DatadogToolset->>DD: query (short time range)
alt Data or NO_DATA
DD-->>DatadogToolset: response
DatadogToolset-->>User: success (with URL, time range, query)
else Error
DD-->>DatadogToolset: error
DatadogToolset-->>User: failure (with context)
end
end
sequenceDiagram
autonumber
participant User
participant LokiToolset as GrafanaLokiToolset
participant PodTool as LokiPodLoggingTool
participant Loki as Loki API
rect rgba(255,245,230,0.6)
User->>LokiToolset: fetch_pod_logs(params)
LokiToolset->>LokiToolset: _reload_instructions()
note over PodTool,LokiToolset: pod_name supports wildcards (exact, "*", simple patterns)
LokiToolset->>Loki: query_loki_logs_by_label(pod=pattern)
alt Exact label
Loki-->>LokiToolset: logs
else Wildcard pattern
LokiToolset->>Loki: regex-matched query
Loki-->>LokiToolset: logs
end
LokiToolset-->>User: logs or error (with context)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🧪 Early access (Sonnet 4.5): enabledWe are currently testing the Sonnet 4.5 model, which is expected to improve code review quality. However, this model may lead to increased noise levels in the review comments. Please disable the early access features if the noise level causes any inconvenience. Note:
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (4)
holmes/plugins/toolsets/datadog/toolset_datadog_logs.py (2)
438-469
: Consider usinglogging.debug()
for debug messages.The healthcheck correctly uses wildcards and the timestamp calculations are accurate. However, lines 459-468 and 471-473 use
logging.info()
for messages prefixed with "DEBUG:". These should uselogging.debug()
for proper log level semantics.Apply this diff:
- logging.info( - f"DEBUG: Running Datadog healthcheck with params: namespace={healthcheck_params.namespace}, pod_name={healthcheck_params.pod_name}, start_time={healthcheck_params.start_time}" - ) - logging.info( - f"DEBUG: Healthcheck time range: from {from_dt.isoformat()} to {to_dt.isoformat()}" - ) + logging.debug( + f"Running Datadog healthcheck with params: namespace={healthcheck_params.namespace}, pod_name={healthcheck_params.pod_name}, start_time={healthcheck_params.start_time}" + ) + logging.debug( + f"Healthcheck time range: from {from_dt.isoformat()} to {to_dt.isoformat()}" + ) if self.dd_config: - logging.info( - f"DEBUG: Healthcheck query will be sent to: {self.dd_config.site_api_url}" - ) + logging.debug( + f"Healthcheck query will be sent to: {self.dd_config.site_api_url}" + )
470-485
: LGTM! Sensible healthcheck semantics.The change to treat
NO_DATA
as a successful healthcheck is correct—the healthcheck validates API connectivity and credentials, not data availability. The additional debug logging (though consider usinglogging.debug()
instead oflogging.info()
for lines 471-473 and 482-484) provides helpful diagnostic context.Consider applying this diff for consistent log levels:
- logging.info( - f"DEBUG: Healthcheck result status: {result.status}, error: {result.error}, data length: {len(result.data or '')}" - ) + logging.debug( + f"Healthcheck result status: {result.status}, error: {result.error}, data length: {len(result.data or '')}" + )And:
- logging.info( - "Datadog healthcheck completed successfully (no data found, but API is accessible)" - ) + logging.debug( + "Datadog healthcheck completed successfully (no data found, but API is accessible)" + )holmes/plugins/toolsets/grafana/loki_api.py (1)
86-96
: LGTM! Wildcard support correctly implemented.The query construction properly handles three cases:
"*"
→ regex match any non-empty value- Contains
"*"
→ convert to regex pattern (e.g.,"payment-*"
→"payment-.*"
)- No wildcards → exact match
The implementation aligns with the Loki instructions template and will work correctly for typical Kubernetes pod names.
Optional edge case consideration:
Iflabel_value
contains other regex special characters (.
,+
,?
, etc.), they'll be interpreted as regex metacharacters. For example,"my.pod"
would match"myzpod"
. While Kubernetes pod names rarely have such characters, you could escape them withre.escape()
before replacing"*"
if stricter matching is desired.holmes/plugins/toolsets/grafana/toolset_grafana_loki.py (1)
46-58
: Consider more specific return type for better type safety.The return type
dict
is correct and matches the parent class signature, butdict[str, ToolParameter]
would provide better type hints for mypy. However, this would require updating the parent class signature inlogging_api.py
as well.The implementation logic is sound and properly extends the base class to add wildcard support.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
holmes/plugins/toolsets/datadog/toolset_datadog_logs.py
(7 hunks)holmes/plugins/toolsets/grafana/loki_api.py
(2 hunks)holmes/plugins/toolsets/grafana/loki_instructions.jinja2
(1 hunks)holmes/plugins/toolsets/grafana/toolset_grafana_loki.py
(5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
{holmes,tests}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
{holmes,tests}/**/*.py
: Use Ruff for formatting and linting for all Python code
Require type hints (code must be type-checkable with mypy)
ALWAYS place Python imports at the top of the file; do not place imports inside functions or methods
Files:
holmes/plugins/toolsets/grafana/loki_api.py
holmes/plugins/toolsets/grafana/toolset_grafana_loki.py
holmes/plugins/toolsets/datadog/toolset_datadog_logs.py
🧬 Code graph analysis (2)
holmes/plugins/toolsets/grafana/toolset_grafana_loki.py (2)
holmes/core/tools.py (3)
StructuredToolResult
(78-102)ToolParameter
(154-159)_load_llm_instructions
(730-735)holmes/plugins/toolsets/logging_utils/logging_api.py (3)
PodLoggingTool
(77-205)_get_tool_parameters
(113-176)BasePodLoggingToolset
(60-74)
holmes/plugins/toolsets/datadog/toolset_datadog_logs.py (4)
holmes/core/tools.py (2)
ToolParameter
(154-159)StructuredToolResultStatus
(51-75)holmes/plugins/toolsets/logging_utils/logging_api.py (4)
FetchPodLogsParams
(42-57)PodLoggingTool
(77-205)_get_tool_parameters
(113-176)BasePodLoggingToolset
(60-74)holmes/plugins/toolsets/grafana/toolset_grafana_loki.py (1)
_get_tool_parameters
(46-58)holmes/plugins/toolsets/utils.py (1)
process_timestamps_to_int
(90-136)
🪛 Ruff (0.13.1)
holmes/plugins/toolsets/grafana/loki_api.py
68-68: Use explicit conversion flag
Replace with conversion flag
(RUF010)
70-70: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
70-70: Create your own exception
(TRY002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Pre-commit checks
- GitHub Check: llm_evals
🔇 Additional comments (8)
holmes/plugins/toolsets/grafana/loki_instructions.jinja2 (1)
1-76
: LGTM! Comprehensive Loki usage guide.This template provides clear, actionable guidance for querying Loki logs with appropriate emphasis on wildcard usage for pod names. The examples and investigation patterns are well-structured and should help users effectively retrieve logs.
holmes/plugins/toolsets/datadog/toolset_datadog_logs.py (6)
31-31
: LGTM!Import properly placed at the top with other imports and is used by
DatadogPodLoggingTool
.
72-94
: LGTM! Good refactor centralizing query construction.The helper function properly handles wildcards by excluding single "" from filters, correctly escapes quotes in filter strings, and provides a sensible fallback to "" for empty queries. This centralization reduces duplication across multiple call sites.
226-241
: LGTM! Consistent wildcard support implementation.The
DatadogPodLoggingTool
class correctly extendsPodLoggingTool
and adds wildcard support to thepod_name
parameter, mirroring the pattern used inLokiPodLoggingTool
.
113-114
: LGTM! Successful refactor to use centralized query builder.Good use of the new
build_datadog_query
helper to eliminate duplication.
195-196
: LGTM! Consistent use of centralized query builder.Properly uses
build_datadog_query
to maintain consistency withfetch_paginated_logs
.
265-266
: LGTM! Enables wildcard support for Datadog logs.Correctly switches to
DatadogPodLoggingTool
to provide wildcard-aware parameter descriptions.holmes/plugins/toolsets/grafana/toolset_grafana_loki.py (1)
77-79
: LGTM! Clean integration of custom tool and instruction loading.The instantiation of
LokiPodLoggingTool
properly replaces the base tool with wildcard-aware functionality, and calling_reload_instructions()
at the end of__init__
ensures Loki-specific guidance is loaded.
error_details = ( | ||
f"Failed to query Loki logs:\n" | ||
f" URL: {url}\n" | ||
f" Query: {query}\n" | ||
f" Time range: {start} to {end}\n" | ||
f" Limit: {limit}\n" | ||
f" Error: {str(e)}" | ||
) | ||
raise Exception(error_details) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add exception chaining to preserve context.
The enhanced error message with query details is excellent for debugging. However, line 70 should use raise ... from e
to preserve the exception chain, which is important for debugging and traceback clarity.
Apply this diff:
error_details = (
f"Failed to query Loki logs:\n"
f" URL: {url}\n"
f" Query: {query}\n"
f" Time range: {start} to {end}\n"
f" Limit: {limit}\n"
- f" Error: {str(e)}"
+ f" Error: {e!s}"
)
- raise Exception(error_details)
+ raise Exception(error_details) from e
Optional improvements:
- Line 68: Use
{e!s}
instead of{str(e)}
for cleaner f-string formatting. - Line 70: Consider creating a custom
LokiQueryError
exception class instead of genericException
for better error handling downstream.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
error_details = ( | |
f"Failed to query Loki logs:\n" | |
f" URL: {url}\n" | |
f" Query: {query}\n" | |
f" Time range: {start} to {end}\n" | |
f" Limit: {limit}\n" | |
f" Error: {str(e)}" | |
) | |
raise Exception(error_details) | |
error_details = ( | |
f"Failed to query Loki logs:\n" | |
f" URL: {url}\n" | |
f" Query: {query}\n" | |
f" Time range: {start} to {end}\n" | |
f" Limit: {limit}\n" | |
f" Error: {e!s}" | |
) | |
raise Exception(error_details) from e |
🧰 Tools
🪛 Ruff (0.13.1)
68-68: Use explicit conversion flag
Replace with conversion flag
(RUF010)
70-70: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
70-70: Create your own exception
(TRY002)
🤖 Prompt for AI Agents
In holmes/plugins/toolsets/grafana/loki_api.py around lines 62 to 70, the code
builds a detailed error_details string and raises a plain Exception without
chaining; update the raise to preserve the original exception by using "raise
Exception(error_details) from e". While editing, replace "{str(e)}" with "{e!s}"
in the f-string for cleaner formatting, and optionally consider defining and
raising a custom LokiQueryError instead of a generic Exception for clearer
downstream handling.
def _reload_instructions(self): | ||
"""Load Loki specific instructions.""" | ||
template_file_path = os.path.abspath( | ||
os.path.join(os.path.dirname(__file__), "loki_instructions.jinja2") | ||
) | ||
self._load_llm_instructions(jinja_template=f"file://{template_file_path}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Add return type hint for mypy compliance.
The method is missing a return type hint, which is required by the coding guidelines. Add -> None
to the method signature.
Apply this diff:
- def _reload_instructions(self):
+ def _reload_instructions(self) -> None:
"""Load Loki specific instructions."""
Verify the loki_instructions.jinja2 template file exists.
Ensure the template file is present in the same directory as this file, as the path construction expects it to be co-located.
Run this script to verify the template file exists:
🏁 Script executed:
#!/bin/bash
# Verify that loki_instructions.jinja2 exists in the same directory
fd -t f "loki_instructions.jinja2" holmes/plugins/toolsets/grafana/
Length of output: 56
Add missing return type hint for mypy compliance.
Apply this diff:
- def _reload_instructions(self):
+ def _reload_instructions(self) -> None:
"""Load Loki specific instructions."""
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def _reload_instructions(self): | |
"""Load Loki specific instructions.""" | |
template_file_path = os.path.abspath( | |
os.path.join(os.path.dirname(__file__), "loki_instructions.jinja2") | |
) | |
self._load_llm_instructions(jinja_template=f"file://{template_file_path}") | |
def _reload_instructions(self) -> None: | |
"""Load Loki specific instructions.""" | |
template_file_path = os.path.abspath( | |
os.path.join(os.path.dirname(__file__), "loki_instructions.jinja2") | |
) | |
self._load_llm_instructions(jinja_template=f"file://{template_file_path}") |
🤖 Prompt for AI Agents
In holmes/plugins/toolsets/grafana/toolset_grafana_loki.py around lines 138 to
143, the _reload_instructions method is missing a return type hint which fails
mypy; add an explicit return type hint (-> None) to the def signature so it
reads def _reload_instructions(self) -> None:, leaving the body unchanged.
No description provided.