
Conversation

@aantn aantn (Contributor) commented Sep 10, 2025

No description provided.

@coderabbitai coderabbitai bot (Contributor) commented Sep 10, 2025

Walkthrough

Reduces default MAX_GRAPH_POINTS from 300 to 200. Enhances Prometheus range query step handling with optional step, auto-calculation, and duration parsing utility. Adds large-result summarization path for Prometheus queries. Updates instructions for high-cardinality handling. Introduces new high-cardinality test fixtures, adds localhost passthrough in tests, and adjusts related tests.

Changes

Cohort / File(s) — Summary

  • Env vars default change — holmes/common/env_vars.py: Changed default MAX_GRAPH_POINTS from 300 to 200.
  • Prometheus toolset logic — holmes/plugins/toolsets/prometheus/prometheus.py: Step handling is now optional with auto-calculation; added large-result summarization and a size-based summary return path; adjusted error logging; the ExecuteRangeQuery step parameter is no longer required.
  • Prometheus utils — holmes/plugins/toolsets/prometheus/utils.py: Added parse_duration_to_seconds to normalize duration inputs to seconds.
  • Prometheus instructions — holmes/plugins/toolsets/prometheus/prometheus_instructions.jinja2: Added guidance for high-cardinality handling, topk usage, time-range extension, and graph embedding constraints.
  • High-cardinality test fixtures — tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/manifest.yaml, .../test_case.yaml, .../toolsets.yaml: Added a namespace and workload manifest to generate varied CPU loads; test case and toolset config for Prometheus via localhost port-forward; setup/teardown sequencing.
  • Test utils passthrough — conftest.py: Allowed http://localhost passthrough in the responses fixture.
  • Grafana Tempo tests updates — tests/plugins/toolsets/grafana/test_grafana_tempo_tools.py: Updated expected auto-calculated steps to reflect the MAX_GRAPH_POINTS reduction.
  • Prometheus utils tests — tests/plugins/toolsets/prometheus/test_prometheus_utils.py: Added comprehensive tests for parse_duration_to_seconds, covering numeric, unit-suffixed, partial formats, and invalid inputs.
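The parse_duration_to_seconds helper itself is not shown in this walkthrough; a minimal sketch of what such a normalizer typically looks like (a hypothetical illustration, not the actual code in holmes/plugins/toolsets/prometheus/utils.py) is:

```python
import re
from typing import Optional, Union

# Prometheus-style duration units, expressed in seconds
_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def parse_duration_to_seconds(value: Optional[Union[str, int, float]]) -> Optional[float]:
    """Normalize a duration ("5m", "1h30m", 60, "60") to seconds; None passes through."""
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return float(value)
    text = value.strip()
    # Bare numeric strings are already seconds
    try:
        return float(text)
    except ValueError:
        pass
    total = 0.0
    matched = False
    # "ms" must precede "m" in the alternation so it is matched greedily
    for num, unit in re.findall(r"(\d+(?:\.\d+)?)(ms|s|m|h|d|w|y)", text):
        total += float(num) * _UNITS[unit]
        matched = True
    if not matched:
        raise ValueError(f"Unrecognized duration: {value!r}")
    return total
```

The real helper may differ in accepted formats and error handling; the tests described above (numeric, unit-suffixed, partial, and invalid inputs) suggest roughly this surface.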

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor U as User
  participant PT as PrometheusToolset
  participant PU as Utils (parse_duration_to_seconds)
  participant P as Prometheus API

  U->>PT: ExecuteRangeQuery(start, end, step?, query)
  PT->>PU: parse_duration_to_seconds(step)
  PU-->>PT: step_seconds or None
  PT->>PT: adjust_step_for_max_points(start, end, step_seconds?)<br/>- auto-calc if None<br/>- cap by MAX_GRAPH_POINTS (200)
  PT->>P: GET /api/v1/query_range?query...&step=adjusted
  P-->>PT: data (result, size)
  alt data_size_chars > 80k
    PT->>PT: create_data_summary_for_large_result(is_range_query=True)
    PT-->>U: summary (series_count, est_points, sample_data, suggestion)
  else
    PT-->>U: full data
  end
sequenceDiagram
  autonumber
  actor U as User
  participant PT as PrometheusToolset
  participant P as Prometheus API

  U->>PT: ExecuteInstantQuery(query)
  PT->>P: GET /api/v1/query?query
  P-->>PT: data (result, size)
  alt data_size_chars > 80k
    PT->>PT: create_data_summary_for_large_result(is_range_query=False)
    PT-->>U: summary (result_count, type, sample_data, suggestion)
  else
    PT-->>U: full data
  end
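The step adjustment in the first diagram can be sketched as follows. This is an illustrative sketch based on the walkthrough, assuming the names adjust_step_for_max_points and MAX_GRAPH_POINTS; the actual logic lives in holmes/plugins/toolsets/prometheus/prometheus.py:

```python
import logging
from typing import Optional

MAX_GRAPH_POINTS = 200  # new default in this PR (previously 300)


def adjust_step_for_max_points(
    time_range_seconds: float, step: Optional[float] = None
) -> float:
    """Pick a step so a range query returns at most MAX_GRAPH_POINTS samples."""
    if step is None:
        # Auto-calculate: aim for ~60 points across the range
        # (one per minute for an hourly window, and so on)
        step = max(1.0, time_range_seconds / 60.0)
    if time_range_seconds / step > MAX_GRAPH_POINTS:
        # Cap the point count by widening the step
        step = time_range_seconds / MAX_GRAPH_POINTS
        logging.debug("Capping points at %d with step=%ss", MAX_GRAPH_POINTS, step)
    return step
```

For example, a one-day range with a 60s step would yield 1440 points, so the step is widened to 432s to stay under the 200-point cap.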

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs

Suggested reviewers

  • moshemorad

Pre-merge checks (1 passed, 1 warning, 1 inconclusive)

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 25.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
  • Description Check — ❓ Inconclusive: No pull request description text was provided in the supplied context, so it cannot be determined whether the description accurately reflects the changeset or is on-topic. Please add a concise PR description summarizing the key changes (large-result summarization/truncation and threshold, MAX_GRAPH_POINTS default change, optional auto-step calculation and the parse_duration_to_seconds helper, instruction/template updates for high-cardinality queries, and updated tests/fixtures) and the rationale/impact of those changes so reviewers can quickly assess intent and risk.
✅ Passed checks (1 passed)
  • Title Check — ✅ Passed: The title "Improve prometheus truncation" is concise and directly related to a primary change in this PR: adding large-result summarization/truncation for Prometheus query responses. It is clear and specific enough for a quick scan. It does not mention other notable changes (auto-step calculation, the duration-parsing utility, tests, and docs), which is acceptable because a title need not list every change.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
holmes/plugins/toolsets/prometheus/prometheus.py (2)

931-935: Make step optional in the tool schema to match auto-step behavior.

Runtime accepts missing step, but the schema still requires it. Aligning avoids conflicting signals to the LLM/UI.

-                "step": ToolParameter(
-                    description="Query resolution step width in duration format or float number of seconds",
-                    type="number",
-                    required=True,
-                ),
+                "step": ToolParameter(
+                    description="Optional: resolution step in seconds (float). Omit to auto-calculate.",
+                    type="number",
+                    required=False,
+                ),

33-33: Correct docstring for env var and default
In holmes/plugins/toolsets/prometheus/prometheus.py (line 291), replace

Max points is controlled by the PROMETHEUS_MAX_GRAPH_POINTS environment variable (default: 300).  

with

Max points is controlled by the MAX_GRAPH_POINTS environment variable (default: 200).  
🧹 Nitpick comments (12)
holmes/common/env_vars.py (1)

76-76: Use integer type and consistent default parsing for MAX_GRAPH_POINTS.

Points are a count; store as int and use a string default for consistency with other envs.

-MAX_GRAPH_POINTS = float(os.environ.get("MAX_GRAPH_POINTS", 200))
+MAX_GRAPH_POINTS: int = int(os.environ.get("MAX_GRAPH_POINTS", "200"))
holmes/plugins/toolsets/prometheus/prometheus_instructions.jinja2 (1)

24-29: Prefer topk over an aggregated expression to rank by the intended label.

Recommend demonstrating sum by (<label>) (...) inside topk so users pick “top pods/containers,” not arbitrary series.

-* Standard pattern for high-cardinality queries:
-  - Use `topk(5, <your_query>)` to get the top 5 series
-  - Example: `topk(5, rate(container_cpu_usage_seconds_total{namespace="default"}[5m]))`
+* Standard pattern for high-cardinality queries:
+  - Use `topk(5, sum by (pod) (<your_query>))` (replace `pod` with the label you care about)
+  - Example: `topk(5, sum by (pod) (rate(container_cpu_usage_seconds_total{namespace="default"}[5m])))`
holmes/plugins/toolsets/prometheus/prometheus.py (5)

291-293: Docstring: wrong env var name and default.

Update to reflect MAX_GRAPH_POINTS (default 200).

-    Max points is controlled by the PROMETHEUS_MAX_GRAPH_POINTS environment variable (default: 300).
+    Max points are controlled by the MAX_GRAPH_POINTS environment variable (default: 200).

335-406: Make topk/“other” suggestions accurate and label-agnostic.

The current suggestion strings alternate between pod and instance and use label_replace patterns that may be invalid. Provide a generic, correct template that nudges users to aggregate before topk and compute “other” via sum without(<label>) (...).

-            "suggestion": f'Consider using topk({min(5, num_items)}, {query}) to limit results to the top {min(5, num_items)} series. To also capture remaining data as \'other\': topk({min(5, num_items)}, {query}) or label_replace((sum({query}) - sum(topk({min(5, num_items)}, {query}))), "pod", "other", "", "")',
+            "suggestion": (
+                "Consider limiting series with topk(5, sum by (<label>) (<expr>)) "
+                "where <label> is your grouping (e.g., pod). "
+                "To include an 'other' bucket: "
+                "sum without(<label>) (sum by (<label>) (<expr>)) - "
+                "sum without(<label>) (topk(5, sum by (<label>) (<expr>)))."
+            ),
-            "suggestion": f'Consider using topk({min(5, num_items)}, {query}) to limit results. To also capture remaining data as \'other\': topk({min(5, num_items)}, {query}) or label_replace((sum({query}) - sum(topk({min(5, num_items)}, {query}))), "instance", "other", "", "")',
+            "suggestion": (
+                "Consider limiting series with topk(5, sum by (<label>) (<expr>)) "
+                "(pick <label> relevant to your query). "
+                "For an 'other' bucket, use the same pattern shown above."
+            ),

827-852: Avoid JSON re-serialization to measure payload size.

Use the HTTP payload size to decide summarization, then parse for samples only when needed. Saves CPU/memory on huge responses.

-                    # Estimate the size of the data
-                    data_str_preview = json.dumps(result_data)
-                    data_size_chars = len(data_str_preview)
+                    # Estimate size from HTTP payload
+                    data_size_chars = int(response.headers.get("Content-Length", 0)) or len(response.content)

1014-1036: Same optimization as instant query: use HTTP size, not JSON dump, for thresholding.

-                    # Estimate the size of the data
-                    data_str_preview = json.dumps(result_data)
-                    data_size_chars = len(data_str_preview)
+                    # Estimate size from HTTP payload
+                    data_size_chars = int(response.headers.get("Content-Length", 0)) or len(response.content)

284-301: Optional: early-return guard for degenerate ranges.

If end <= start, return step=1 to avoid division anomalies.

     time_range_seconds = (end_dt - start_dt).total_seconds()
+    if time_range_seconds <= 0:
+        logging.debug("Non-positive time range; defaulting step=1s")
+        return 1.0
tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/test_case.yaml (2)

31-36: Increase readiness wait to reduce flakiness (45+5 pods may take >120s).

Bump to 240–300s.

-  kubectl wait --for=condition=Ready pod -l test=high-cardinality -n app-159 --timeout=120s
+  kubectl wait --for=condition=Ready pod -l test=high-cardinality -n app-159 --timeout=300s

24-27: Don’t mask Prometheus readiness failures unless intentional.

|| true hides real readiness issues. If that’s desired, leave it; otherwise, remove to fail fast.

-  kubectl wait --for=condition=Ready pod -l app.kubernetes.io/name=prometheus -n default --timeout=60s || true
+  kubectl wait --for=condition=Ready pod -l app.kubernetes.io/name=prometheus -n default --timeout=60s
tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/manifest.yaml (3)

53-71: Security context hardening (test fixture, optional).

Add non-root and no-priv-esc to address Checkov findings (CKV_K8S_20, CKV_K8S_23).

       - name: app
         image: alpine:latest
+        securityContext:
+          allowPrivilegeEscalation: false
+          runAsNonRoot: true
+          readOnlyRootFilesystem: true

88-108: Apply same securityContext to high-CPU pods (example for z-cpu-heavy-1).

Replicate to z-cpu-heavy-2..5 as well.

   - name: app
     image: alpine:latest
+    securityContext:
+      allowPrivilegeEscalation: false
+      runAsNonRoot: true
+      readOnlyRootFilesystem: true

53-53: Pin image tag (avoid “latest” for determinism).

Use a fixed Alpine (e.g., 3.20) to reduce pull drift across CI runs.

-        image: alpine:latest
+        image: alpine:3.20

Also applies to: 90-90, 121-121, 152-152, 183-183, 214-214

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9ca849e and c850f46.

📒 Files selected for processing (6)
  • holmes/common/env_vars.py (1 hunks)
  • holmes/plugins/toolsets/prometheus/prometheus.py (7 hunks)
  • holmes/plugins/toolsets/prometheus/prometheus_instructions.jinja2 (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/manifest.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/toolsets.yaml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
tests/llm/**

📄 CodeRabbit inference engine (CLAUDE.md)

Place LLM evaluation tests under tests/llm/

Files:

  • tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/toolsets.yaml
  • tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/manifest.yaml
tests/llm/fixtures/**/*.yaml

📄 CodeRabbit inference engine (CLAUDE.md)

LLM test scenarios should use YAML configs under tests/llm/fixtures/

Files:

  • tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/toolsets.yaml
  • tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/manifest.yaml
tests/llm/**/toolsets.yaml

📄 CodeRabbit inference engine (CLAUDE.md)

tests/llm/**/toolsets.yaml: In evals, all toolset-specific configuration must be under a config field in toolsets.yaml
Only the allowed top-level fields are permitted in toolsets.yaml (enabled, name, description, additional_instructions, prerequisites, tools, docs_url, icon_url, installation_instructions, config, url for MCP only)

Files:

  • tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/toolsets.yaml
tests/**

📄 CodeRabbit inference engine (CLAUDE.md)

Tests should mirror the source structure under tests/

Files:

  • tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/toolsets.yaml
  • tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/manifest.yaml
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Ruff for formatting and linting
Type hints are required (checked by mypy)
Always place Python imports at the top of the file, not inside functions or methods

Files:

  • holmes/plugins/toolsets/prometheus/prometheus.py
  • holmes/common/env_vars.py
tests/llm/**/test_case.yaml

📄 CodeRabbit inference engine (CLAUDE.md)

tests/llm/**/test_case.yaml: Each LLM test must use a dedicated namespace app- defined in test_case.yaml
Do not put toolsets configuration in test_case.yaml; it must be in a separate toolsets.yaml file

Files:

  • tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/test_case.yaml
🧠 Learnings (3)
📚 Learning: 2025-09-08T11:12:28.846Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-09-08T11:12:28.846Z
Learning: Applies to tests/llm/**/test_case.yaml : Do not put toolsets configuration in test_case.yaml; it must be in a separate toolsets.yaml file

Applied to files:

  • tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/toolsets.yaml
📚 Learning: 2025-09-08T11:12:28.846Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-09-08T11:12:28.846Z
Learning: Applies to tests/llm/**/toolsets.yaml : Only the allowed top-level fields are permitted in toolsets.yaml (enabled, name, description, additional_instructions, prerequisites, tools, docs_url, icon_url, installation_instructions, config, url for MCP only)

Applied to files:

  • tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/toolsets.yaml
📚 Learning: 2025-08-05T00:42:23.792Z
Learnt from: vishiy
PR: robusta-dev/holmesgpt#782
File: config.example.yaml:31-49
Timestamp: 2025-08-05T00:42:23.792Z
Learning: In robusta-dev/holmesgpt config.example.yaml, the azuremonitorlogs toolset configuration shows "enabled: true" as an example of how to enable the toolset, not as a default setting. The toolset is disabled by default and requires explicit enablement in user configurations.

Applied to files:

  • tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/toolsets.yaml
🪛 Checkov (3.2.334)
tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/manifest.yaml

[medium] 34-76: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 34-76: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 78-108: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 78-108: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 109-139: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 109-139: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 140-170: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 140-170: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 171-201: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 171-201: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 202-231: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 202-231: Minimize the admission of root containers

(CKV_K8S_23)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Pre-commit checks
  • GitHub Check: llm_evals
  • GitHub Check: Pre-commit checks
🔇 Additional comments (7)
holmes/plugins/toolsets/prometheus/prometheus_instructions.jinja2 (1)

23-30: Great addition: clear high-cardinality guidance and topk pattern.

This will materially reduce result sizes and keep graphs readable.

holmes/plugins/toolsets/prometheus/prometheus.py (1)

307-314: Smart default step sizing — LGTM.

Auto-targeting ~60 points keeps charts dense enough without overfetching.

tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/toolsets.yaml (4)

4-4: Verify toolsets.yaml schema (top-level “toolsets” key).

Guidelines say only specific top-level fields are allowed in toolsets.yaml. “toolsets:” isn’t in that list. If the test runner expects “toolsets:”, ignore this; otherwise, restructure accordingly.


8-15: Prometheus config looks correct for the fixture.

Port-forward URL, metadata/labels toggle, and returning data align with the scenario.


5-7: Enabling kubernetes/core for this test is appropriate.

Nothing to change.


16-17: Enabling kubernetes/logs is fine.

No action needed.

tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/test_case.yaml (1)

1-8: Ensure graphing won’t be summarized away.

With many series, results might cross the summarization threshold added in this PR. If flakes appear, consider explicitly guiding topk()/aggregation in the prompt or accepting summarized output in expectations.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
tests/plugins/toolsets/grafana/test_grafana_tempo_tools.py (1)

577-580: Fix comment: MAX_GRAPH_POINTS value typo.

The comment says 300 but computes with 200. Adjust to avoid confusion.

-            # With 86400 seconds and MAX_GRAPH_POINTS=300, min step = ceil(86400/200) = 432
+            # With 86400 seconds and MAX_GRAPH_POINTS=200, min step = ceil(86400/200) = 432
holmes/plugins/toolsets/prometheus/prometheus.py (3)

284-326: Unify step adjustment with utils, fix default-points doc, and return an int step.

There’s already a canonical adjust_step_for_max_points in holmes.plugins.toolsets.utils. Re-implementing it here risks divergence. Recommend delegating to the utils function after computing time_range_seconds and your default step, fixing the doc (default is 200 now), and returning an integer step (Prometheus accepts seconds as integer; avoiding floats improves determinism).

Apply:

 def adjust_step_for_max_points(
     start_timestamp: str,
     end_timestamp: str,
-    step: Optional[float] = None,
+    step: Optional[float] = None,
 ) -> float:
     """
     Adjusts the step parameter to ensure the number of data points doesn't exceed max_points.
-    Max points is controlled by the PROMETHEUS_MAX_GRAPH_POINTS environment variable (default: 300).
+    Max points is controlled by the PROMETHEUS_MAX_GRAPH_POINTS environment variable (default: 200).
@@
-    # If no step provided, calculate a reasonable default
-    # Aim for ~60 data points across the time range (1 per minute for hourly, etc)
-    if step is None:
-        step = max(1, time_range_seconds / 60)
-        logging.debug(
-            f"No step provided, defaulting to {step}s for {time_range_seconds}s range"
-        )
-
-    current_points = time_range_seconds / step
-
-    # If current points exceed max, adjust the step
-    if current_points > MAX_GRAPH_POINTS:
-        adjusted_step = time_range_seconds / MAX_GRAPH_POINTS
-        logging.info(
-            f"Adjusting step from {step}s to {adjusted_step}s to limit points from {current_points:.0f} to {MAX_GRAPH_POINTS}"
-        )
-        return adjusted_step
-
-    return step
+    # If no step provided, calculate a reasonable default (~1/min resolution)
+    if step is None:
+        step = max(1.0, time_range_seconds / 60.0)
+        logging.debug(
+            f"No step provided, defaulting to {step}s for {time_range_seconds}s range"
+        )
+
+    # Delegate final adjustment to the shared helper to enforce the cap
+    from holmes.plugins.toolsets import utils as _utils  # local import to avoid cycles
+    adjusted = _utils.adjust_step_for_max_points(
+        time_range_seconds=int(time_range_seconds),
+        max_points=MAX_GRAPH_POINTS,
+        step=int(step) if step is not None else None,
+    )
+    return float(adjusted)

335-406: Large-result summary: good guardrail; refine suggestion text and bound work.

  • The topk/label_replace suggestion assumes specific labels (“pod”, “instance”). That can mislead on other datasets. Suggest making the example label-agnostic or omitting the label_replace pattern.
  • Consider limiting JSON work further: you already pass data_size_chars in; keep sampling to metadata only (as you do) and avoid any full-data serialization in this helper.
-            "suggestion": f'Consider using topk({min(5, num_items)}, {query}) to limit results to the top {min(5, num_items)} series. To also capture remaining data as \'other\': topk({min(5, num_items)}, {query}) or label_replace((sum({query}) - sum(topk({min(5, num_items)}, {query}))), "pod", "other", "", "")',
+            "suggestion": f'Consider using topk({min(5, num_items)}, {query}) to limit results. If you need totals, combine with sum()/avg() by the labels you care about.',
@@
-            "suggestion": f'Consider using topk({min(5, num_items)}, {query}) to limit results. To also capture remaining data as \'other\': topk({min(5, num_items)}, {query}) or label_replace((sum({query}) - sum(topk({min(5, num_items)}, {query}))), "instance", "other", "", "")',
+            "suggestion": f'Consider using topk({min(5, num_items)}, {query}) to limit results. If you need totals, combine with sum()/avg() by the labels you care about.',

1190-1194: Nit: redundant exception context.

logging.exception already includes the traceback; exc_info=True is redundant.

-            logging.exception("Failed to initialize Prometheus", exc_info=True)
+            logging.exception("Failed to initialize Prometheus")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c850f46 and f1ef282.

📒 Files selected for processing (3)
  • conftest.py (1 hunks)
  • holmes/plugins/toolsets/prometheus/prometheus.py (8 hunks)
  • tests/plugins/toolsets/grafana/test_grafana_tempo_tools.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Ruff for formatting and linting
Type hints are required (checked by mypy)
Always place Python imports at the top of the file, not inside functions or methods

Files:

  • conftest.py
  • tests/plugins/toolsets/grafana/test_grafana_tempo_tools.py
  • holmes/plugins/toolsets/prometheus/prometheus.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Do not use pytest markers/tags that are not declared; only use markers listed in pyproject.toml

Files:

  • tests/plugins/toolsets/grafana/test_grafana_tempo_tools.py
tests/**

📄 CodeRabbit inference engine (CLAUDE.md)

Tests should mirror the source structure under tests/

Files:

  • tests/plugins/toolsets/grafana/test_grafana_tempo_tools.py
🧬 Code graph analysis (1)
holmes/plugins/toolsets/prometheus/prometheus.py (1)
holmes/plugins/toolsets/utils.py (1)
  • adjust_step_for_max_points (199-222)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: llm_evals
  • GitHub Check: Pre-commit checks
🔇 Additional comments (4)
conftest.py (1)

167-168: Scope localhost passthrough to an explicit allowlist of localhost ports used by tests (don't restrict to :3000).

File: conftest.py Lines: 167-168 — tests reference many localhost ports (found: 15672, 3000, 3003, 3004, 3100, 3101, 3200, 5000, 8000, 8080, 9093, 9099, 9153, 9159, 9200), so changing passthru to only :3000 will break other tests.

Apply this patch instead (or maintain an allowlist derived from tests / environment):

-        rsps.add_passthru("http://localhost")
+        # Allow only explicitly required localhost ports (enumerated from tests)
+        import re  # noqa: PLC0415 - test helper import
+        rsps.add_passthru(
+            re.compile(
+                r"^http://localhost:(?:15672|3000|3003|3004|3100|3101|3200|5000|8000|8080|9093|9099|9153|9159|9200)(?:/|$)"
+            )
+        )

If the fixture is meant only for Grafana (:3000), move other tests to use their own fixtures or explicitly add per-test passthru entries.

Likely an incorrect or invalid review comment.

tests/plugins/toolsets/grafana/test_grafana_tempo_tools.py (3)

333-346: LGTM: step updated to match MAX_GRAPH_POINTS=200 auto-step behavior.

The explicit "8m" expectation aligns with the new cap and auto-step math.

Also applies to: 339-346


552-555: Comment explains the 18s derivation clearly.

Accurately documents ceil(3600/200)=18s.


635-637: LGTM: expectations for 1h and 1w ranges reflect 200-point cap.

The 18s and 50m24s expectations are consistent.
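The 18s and 50m24s expectations follow directly from the 200-point cap; a quick arithmetic check (plain math, not project code):

```python
import math

MAX_GRAPH_POINTS = 200

# 1h range: ceil(3600 / 200) = 18s step
assert math.ceil(3600 / MAX_GRAPH_POINTS) == 18
# 1d range: ceil(86400 / 200) = 432s step
assert math.ceil(86400 / MAX_GRAPH_POINTS) == 432
# 1w range: ceil(604800 / 200) = 3024s step, i.e. 50m24s
assert math.ceil(604800 / MAX_GRAPH_POINTS) == 3024
assert 3024 == 50 * 60 + 24
```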

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
holmes/plugins/toolsets/prometheus/prometheus.py (1)

287-326: Unify auto-step logic with MAX_GRAPH_POINTS and fix docstring.

Current defaulting to ~60 points diverges from the repo’s shared policy (limit to MAX_GRAPH_POINTS). Also, docstring still mentions 300. Recommend computing the smallest allowed step directly from MAX_GRAPH_POINTS (ceil) and using integer seconds to keep payloads stable.

Apply this diff:

 def adjust_step_for_max_points(
     start_timestamp: str,
     end_timestamp: str,
-    step: Optional[float] = None,
+    step: Optional[float] = None,
 ) -> float:
     """
     Adjusts the step parameter to ensure the number of data points doesn't exceed max_points.
-    Max points is controlled by the PROMETHEUS_MAX_GRAPH_POINTS environment variable (default: 300).
+    Max points is controlled by the PROMETHEUS_MAX_GRAPH_POINTS environment variable (see MAX_GRAPH_POINTS).
@@
-    start_dt = dateutil.parser.parse(start_timestamp)
-    end_dt = dateutil.parser.parse(end_timestamp)
-
-    time_range_seconds = (end_dt - start_dt).total_seconds()
-
-    # If no step provided, calculate a reasonable default
-    # Aim for ~60 data points across the time range (1 per minute for hourly, etc)
-    if step is None:
-        step = max(1, time_range_seconds / 60)
-        logging.debug(
-            f"No step provided, defaulting to {step}s for {time_range_seconds}s range"
-        )
-
-    current_points = time_range_seconds / step
-
-    # If current points exceed max, adjust the step
-    if current_points > MAX_GRAPH_POINTS:
-        adjusted_step = time_range_seconds / MAX_GRAPH_POINTS
-        logging.info(
-            f"Adjusting step from {step}s to {adjusted_step}s to limit points from {current_points:.0f} to {MAX_GRAPH_POINTS}"
-        )
-        return adjusted_step
-
-    return step
+    start_dt = dateutil.parser.parse(start_timestamp)
+    end_dt = dateutil.parser.parse(end_timestamp)
+    time_range_seconds = int((end_dt - start_dt).total_seconds())
+    # smallest allowed to keep points <= MAX_GRAPH_POINTS
+    min_step = max(1, (time_range_seconds + MAX_GRAPH_POINTS - 1) // MAX_GRAPH_POINTS)
+    if step is None:
+        return float(min_step)
+    return float(max(min_step, int(step)))

Also consider returning a Prometheus duration string (e.g., "18s") for readability and parity with other toolsets. See next comment.

🧹 Nitpick comments (3)
conftest.py (1)

167-168: Constrain localhost passthrough to avoid non-hermetic tests.

File: conftest.py Lines: 167-168 — Replace the blanket passthru with a tight regex that matches localhost and 127.0.0.1 (optional port).

-        rsps.add_passthru("http://localhost")
+        rsps.add_passthru(re.compile(r"^http://(localhost|127\.0\.0\.1)(:\d+)?(/|$)"))

Add at top of file:

import re

If a blanket passthru is intentional, gate it behind an env var (e.g., ENABLE_LOCAL_PASSTHRU) so CI remains hermetic.
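To illustrate what the tighter pattern does and does not match (a standalone check of the regex itself, not the responses API):

```python
import re

# Matches only localhost/127.0.0.1, with an optional port, followed by a path or end-of-string.
LOCAL_PASSTHRU = re.compile(r"^http://(localhost|127\.0\.0\.1)(:\d+)?(/|$)")

allowed = [
    "http://localhost:9090/api/v1/query",
    "http://127.0.0.1/metrics",
    "http://localhost",
]
blocked = [
    "http://localhost.evil.com/",  # blanket "http://localhost" prefix would let this through
    "https://localhost:9090/",
    "http://example.com/",
]

for url in allowed:
    assert LOCAL_PASSTHRU.match(url), url
for url in blocked:
    assert not LOCAL_PASSTHRU.match(url), url
print("pattern behaves as intended")
```

Note the first blocked URL: a plain `rsps.add_passthru("http://localhost")` prefix check would pass it through, which is exactly the hermeticity gap the regex closes.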

tests/plugins/toolsets/grafana/test_grafana_tempo_tools.py (1)

577-580: Fix typo in comment: MAX_GRAPH_POINTS value.

The comment says MAX_GRAPH_POINTS=300 but computes using 200. Update the comment to avoid confusion.

Apply this diff:

-            # With 86400 seconds and MAX_GRAPH_POINTS=300, min step = ceil(86400/200) = 432
+            # With 86400 seconds and MAX_GRAPH_POINTS=200, min step = ceil(86400/200) = 432
holmes/plugins/toolsets/prometheus/prometheus.py (1)

335-406: Large-result summarization — solid start; expose threshold and avoid label assumptions.

  • The 80K char cutoff is hardcoded. Consider promoting it to a constant or env var to tune per-deployment.
  • Suggestions assume labels like “pod”/“instance” exist; mention “e.g., pod/instance” or detect a common label from sample_metrics to avoid misleading advice.

Example tweak:

-    if is_range_query:
+    if is_range_query:
         ...
-        return {
+        common_label = next(iter(sample_metrics[0].keys()), "instance") if sample_metrics else "instance"
+        return {
             "message": ...,
             "suggestion": f'Consider using topk({min(5, num_items)}, {query}) ... label_replace((sum({query}) - sum(topk({min(5, num_items)}, {query}))), "{common_label}", "other", "", "")',
         }

Optionally add a small unit test covering the summary path with synthetic large results.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c850f46 and 291aba5.

📒 Files selected for processing (3)
  • conftest.py (1 hunks)
  • holmes/plugins/toolsets/prometheus/prometheus.py (8 hunks)
  • tests/plugins/toolsets/grafana/test_grafana_tempo_tools.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Ruff for formatting and linting
Type hints are required (checked by mypy)
Always place Python imports at the top of the file, not inside functions or methods

Files:

  • conftest.py
  • tests/plugins/toolsets/grafana/test_grafana_tempo_tools.py
  • holmes/plugins/toolsets/prometheus/prometheus.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Do not use pytest markers/tags that are not declared; only use markers listed in pyproject.toml

Files:

  • tests/plugins/toolsets/grafana/test_grafana_tempo_tools.py
tests/**

📄 CodeRabbit inference engine (CLAUDE.md)

Tests should mirror the source structure under tests/

Files:

  • tests/plugins/toolsets/grafana/test_grafana_tempo_tools.py
🧬 Code graph analysis (2)
tests/plugins/toolsets/grafana/test_grafana_tempo_tools.py (1)
holmes/core/tools.py (1)
  • ToolResultStatus (51-75)
holmes/plugins/toolsets/prometheus/prometheus.py (1)
holmes/plugins/toolsets/utils.py (1)
  • adjust_step_for_max_points (199-222)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Pre-commit checks
  • GitHub Check: llm_evals
🔇 Additional comments (5)
tests/plugins/toolsets/grafana/test_grafana_tempo_tools.py (3)

333-342: Updated expected step to 8m — LGTM.

Matches the reduced MAX_GRAPH_POINTS behavior.


552-555: Comment explains new min-step math — LGTM.

The rationale for 18s on a 1h range with MAX_GRAPH_POINTS=200 is correct.


635-637: Adjusted expectations — LGTM.

The new expected steps ("18s", "50m24s") align with MAX_GRAPH_POINTS=200.

holmes/plugins/toolsets/prometheus/prometheus.py (2)

1189-1194: Broadened exception handling in health check — LGTM.

Catching all and logging with exc_info avoids masking non-RequestException failures during detection.


975-980: No change required — /api/v1/query_range accepts numeric seconds or duration strings for "step".

Prometheus accepts step as either a duration string (e.g. "15s") or a float number of seconds in POST bodies, so keep "step": step as-is; optionally serialize to f"{step}s" for readability (note: converting to int will drop sub-second precision).
File: holmes/plugins/toolsets/prometheus/prometheus.py Lines: 975-980

Likely an incorrect or invalid review comment.

@nherment nherment requested a review from moshemorad September 11, 2025 10:55
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
holmes/plugins/toolsets/prometheus/prometheus.py (1)

285-327: Avoid duplicating step-adjust logic; reuse the shared utility.

There’s already adjust_step_for_max_points(time_range_seconds, max_points, step) in holmes/plugins/toolsets/utils.py. Duplicating here risks drift and inconsistent behavior/logging.

-from holmes.plugins.toolsets.utils import (
+from holmes.plugins.toolsets.utils import (
     get_param_or_raise,
     process_timestamps_to_rfc3339,
     standard_start_datetime_tool_param_description,
     toolset_name_for_one_liner,
+    adjust_step_for_max_points as _adjust_step_secs,
 )
@@
-def adjust_step_for_max_points(
-    start_timestamp: str,
-    end_timestamp: str,
-    step: Optional[float] = None,
-) -> float:
+def adjust_step_for_max_points(
+    start_timestamp: str,
+    end_timestamp: str,
+    step: Optional[float] = None,
+) -> float:
@@
-    time_range_seconds = (end_dt - start_dt).total_seconds()
+    time_range_seconds = (end_dt - start_dt).total_seconds()
@@
-    if step is None:
-        step = max(1, time_range_seconds / 60)
-        logging.debug(
-            f"No step provided, defaulting to {step}s for {time_range_seconds}s range"
-        )
+    if step is None:
+        step = max(1.0, time_range_seconds / 60.0)
+        logging.debug(
+            "No step provided, defaulting to %.3fs for %.0fs range",
+            step,
+            time_range_seconds,
+        )
@@
-    current_points = time_range_seconds / step
-
-    # If current points exceed max, adjust the step
-    if current_points > MAX_GRAPH_POINTS:
-        adjusted_step = time_range_seconds / MAX_GRAPH_POINTS
-        logging.info(
-            f"Adjusting step from {step}s to {adjusted_step}s to limit points from {current_points:.0f} to {MAX_GRAPH_POINTS}"
-        )
-        return adjusted_step
-
-    return step
+    # Delegate to shared util (returns int seconds)
+    adjusted = _adjust_step_secs(
+        time_range_seconds=int(time_range_seconds),
+        max_points=MAX_GRAPH_POINTS,
+        step=int(step) if step else None,
+    )
+    return float(adjusted)
♻️ Duplicate comments (1)
holmes/plugins/toolsets/prometheus/prometheus.py (1)

932-936: Schema mismatch: step accepts duration strings but type=number.

Clients will reject "5m" while the code now supports it. Make type="string" and keep required=False.

-                "step": ToolParameter(
-                    description="Query resolution step width in duration format or float number of seconds",
-                    type="number",
-                    required=False,
-                ),
+                "step": ToolParameter(
+                    description="Query resolution step width (e.g., 30s, 5m, 1h, or numeric seconds). Omit to auto-calculate.",
+                    type="string",
+                    required=False,
+                ),
🧹 Nitpick comments (9)
holmes/plugins/toolsets/prometheus/utils.py (2)

7-15: Tighten typing guard for mypy.

After excluding int/float, assert string before calling strip() to keep mypy happy.

 def parse_duration_to_seconds(v: Optional[Union[str, float, int]]) -> Optional[float]:
     if v is None:
         return None
     if isinstance(v, (int, float)):
         return float(v)
-    s = v.strip().lower()
+    assert isinstance(v, str)
+    s = v.strip().lower()

19-30: Option: reject trailing garbage in partial formats.

Right now "1h30mx" would be accepted (ignores trailing x). Consider full-match validation so only unit tokens are allowed.

-    pattern = r"(\d+(?:\.\d+)?)(d|h|m|s)"
-    matches = re.findall(pattern, s)
+    token = r"(\d+(?:\.\d+)?)(d|h|m|s)"
+    pattern = rf"^{token}+$"
+    matches = re.findall(token, s) if re.fullmatch(pattern, s) else []
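A self-contained sketch of the full-match approach (re-implemented here for illustration; the real helper lives in holmes/plugins/toolsets/prometheus/utils.py and raises on invalid input rather than returning None):

```python
import re
from typing import Optional

UNITS = {"d": 86400.0, "h": 3600.0, "m": 60.0, "s": 1.0}
TOKEN = r"(\d+(?:\.\d+)?)(d|h|m|s)"

def parse_strict(s: str) -> Optional[float]:
    """Return seconds for a duration like '1h30m', or None if any part of the string is invalid."""
    if not re.fullmatch(rf"(?:{TOKEN})+", s):
        return None
    return float(int(sum(float(v) * UNITS[u] for v, u in re.findall(TOKEN, s))))

print(parse_strict("1h30m"))   # 5400.0
print(parse_strict("1h30mx"))  # None -- trailing garbage is rejected, not silently ignored
print(parse_strict("90s"))     # 90.0
```

The key difference from bare findall is the fullmatch gate: every character must belong to some value/unit token before any token is counted.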
tests/plugins/toolsets/prometheus/test_utils.py (1)

65-77: Add a couple of edge cases.

Consider adding:

  • Mixed tokens with spaces: "1h 30m" → 5400.0
  • Trailing junk should fail: "1h30mx" → ValueError
  • Negative durations (decide on behavior): "-5m"
holmes/plugins/toolsets/prometheus/prometheus.py (6)

308-315: Optional: ceil default step to avoid overshooting max points.

time_range/60 can be fractional; ceiling reduces the risk of exceeding MAX_GRAPH_POINTS by a few points.

-    if step is None:
-        step = max(1, time_range_seconds / 60)
+    if step is None:
+        step = max(1.0, math.ceil(time_range_seconds / 60.0))

375-382: Make guidance label-agnostic.

Hard-coding labels like pod/instance may not exist for many metrics. Suggest generic hints without specific label names.

-            "suggestion": f'Consider using topk({min(5, num_items)}, {query}) to limit results to the top {min(5, num_items)} series. To also capture remaining data as \'other\': topk({min(5, num_items)}, {query}) or label_replace((sum({query}) - sum(topk({min(5, num_items)}, {query}))), "pod", "other", "", "")',
+            "suggestion": f'Consider using topk({min(5, num_items)}, {query}) to limit results. To capture the remainder as "other", aggregate by your primary label, e.g., sum by (<label>) ({query}) - sum by (<label>)(topk({min(5, num_items)}, {query})).',

(and similarly in the instant-query branch)


828-853: Compute size from the raw response to avoid re-serialization cost.

json.dumps(result_data) can be expensive. Use len(response.text) as a quick threshold, then parse or summarize as needed.

-                    data_str_preview = json.dumps(result_data)
-                    data_size_chars = len(data_str_preview)
+                    data_size_chars = len(response.text)

1015-1037: Reuse raw response length here too.

Same optimization as instant query: prefer len(response.text) over serializing result_data.

-                    data_str_preview = json.dumps(result_data)
-                    data_size_chars = len(data_str_preview)
+                    data_size_chars = len(response.text)

737-752: Use logging.warning instead of deprecated logging.warn.

Minor cleanup for consistency and to satisfy linters.

-            logging.warn("Timeout while fetching prometheus metrics", exc_info=True)
+            logging.warning("Timeout while fetching prometheus metrics", exc_info=True)
@@
-            logging.warn("Failed to fetch prometheus metrics", exc_info=True)
+            logging.warning("Failed to fetch prometheus metrics", exc_info=True)
@@
-            logging.warn("Failed to process prometheus metrics", exc_info=True)
+            logging.warning("Failed to process prometheus metrics", exc_info=True)

1191-1195: Nit: logging.exception already includes exc_info.

logging.exception(..., exc_info=True) is redundant. Drop exc_info=True.

-            logging.exception("Failed to initialize Prometheus", exc_info=True)
+            logging.exception("Failed to initialize Prometheus")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 291aba5 and 8a9aa62.

📒 Files selected for processing (3)
  • holmes/plugins/toolsets/prometheus/prometheus.py (10 hunks)
  • holmes/plugins/toolsets/prometheus/utils.py (1 hunks)
  • tests/plugins/toolsets/prometheus/test_utils.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Ruff for formatting and linting
Type hints are required (checked by mypy)
Always place Python imports at the top of the file, not inside functions or methods

Files:

  • tests/plugins/toolsets/prometheus/test_utils.py
  • holmes/plugins/toolsets/prometheus/utils.py
  • holmes/plugins/toolsets/prometheus/prometheus.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Do not use pytest markers/tags that are not declared; only use markers listed in pyproject.toml

Files:

  • tests/plugins/toolsets/prometheus/test_utils.py
tests/**

📄 CodeRabbit inference engine (CLAUDE.md)

Tests should mirror the source structure under tests/

Files:

  • tests/plugins/toolsets/prometheus/test_utils.py
🧬 Code graph analysis (2)
tests/plugins/toolsets/prometheus/test_utils.py (1)
holmes/plugins/toolsets/prometheus/utils.py (1)
  • parse_duration_to_seconds (7-30)
holmes/plugins/toolsets/prometheus/prometheus.py (2)
holmes/plugins/toolsets/prometheus/utils.py (1)
  • parse_duration_to_seconds (7-30)
holmes/plugins/toolsets/utils.py (1)
  • adjust_step_for_max_points (199-222)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Pre-commit checks
  • GitHub Check: llm_evals
🔇 Additional comments (5)
tests/plugins/toolsets/prometheus/test_utils.py (1)

1-78: Great coverage and clear expectations.

The tests thoroughly cover None, numerics, units, partials, casing, whitespace, truncation, and invalids. LGTM.

holmes/plugins/toolsets/prometheus/prometheus.py (4)

25-25: Good: centralizes duration parsing.

Importing the shared parser keeps range-query step handling consistent.


336-407: Helpful large-result summarization.

Good metadata-only sampling, point estimation, and clear messaging.


965-972: LGTM: robust step parsing + auto-adjust.

Parsing flexible inputs and letting the adjuster cap points is the right flow.


162-172: AMP path: propagate timeout to AWSPrometheusConnect.signed_request

The AMP branch calls client.signed_request without timeout — pass the timeout to avoid hangs.

holmes/plugins/toolsets/prometheus/prometheus.py (do_request, AMP branch)

-        return client.signed_request(  # type: ignore
+        return client.signed_request(  # type: ignore
             method=method,
             url=url,
             data=data,
             params=params,
             verify=verify,
             headers=headers,
+            timeout=timeout,
         )

If AWSPrometheusConnect.signed_request doesn't accept timeout, enforce a per-call timeout in the SigV4 client implementation or wrap the call with a session that applies the timeout.
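One generic way to do that wrapping (sketched with a stand-in function; AWSPrometheusConnect's actual signature is not assumed here) is to inject a default timeout kwarg whenever the caller omits one:

```python
import functools
from typing import Any, Callable, Optional

def with_default_timeout(fn: Callable[..., Any], default_timeout: float) -> Callable[..., Any]:
    """Wrap fn so that calls omitting an explicit timeout kwarg get default_timeout."""
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        kwargs.setdefault("timeout", default_timeout)
        return fn(*args, **kwargs)
    return wrapper

# Stand-in for a signed-request method that accepts a timeout kwarg.
def fake_signed_request(method: str, url: str, timeout: Optional[float] = None) -> Optional[float]:
    return timeout

safe_request = with_default_timeout(fake_signed_request, default_timeout=30.0)
print(safe_request("POST", "http://localhost:9090/api/v1/query"))               # 30.0
print(safe_request("POST", "http://localhost:9090/api/v1/query", timeout=5.0))  # 5.0
```

setdefault keeps an explicitly passed timeout intact, so per-call overrides still work.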

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
holmes/plugins/toolsets/prometheus/prometheus.py (1)

285-327: Fix ZeroDivisionError, update docstring default, and round step to whole seconds

Guard against non-positive step (avoids ZeroDivisionError), update the docstring to reflect the actual env var/default, and align rounding with the utils helper.

  • File: holmes/plugins/toolsets/prometheus/prometheus.py (function around lines ~285–327)

    • Change docstring line to: "Max points is controlled by the MAX_GRAPH_POINTS environment variable (default: 200)."
    • Add a guard: elif step <= 0: logging.warning(...); step = max(1, time_range_seconds / 60) (treat non-positive like None).
    • After computing adjusted_step, round up to whole seconds and return as float: adjusted_step = max(1.0, float(math.ceil(adjusted_step))) and return max(1.0, float(math.ceil(step))).
    • Add import math to module imports.
  • Note: holmes/common/env_vars.py defines MAX_GRAPH_POINTS defaulting to 200; holmes/plugins/toolsets/utils.py already uses math.ceil and returns an int — make prometheus behavior consistent (float of the ceiling).

Run tests: tests/plugins/toolsets/test_prometheus_unit.py and tests/plugins/toolsets/grafana/test_grafana_tempo_tools.py to confirm behavior.

♻️ Duplicate comments (4)
holmes/plugins/toolsets/prometheus/prometheus.py (4)

828-853: Avoid hard-coded 80K cutoff; derive from global context window budget and skip full dumps for sizing.

  • Tie the threshold to the general TOOL_MAX_ALLOCATED_CONTEXT_WINDOW_PCT so Prometheus truncation is always below the global cap (prior comment).
  • Computing size via json.dumps duplicates memory for large payloads; estimate via counts or cap-serialize a slice instead.

Please locate the global budget to wire into both instant/range paths:

#!/bin/bash
rg -n 'TOOL_MAX_ALLOCATED_CONTEXT_WINDOW_PCT|CONTEXT_WINDOW|MAX_OUTPUT_CHARS|RESULT_MAX_CHARS' -C2

1015-1037: Same 80K cutoff concern for range queries.
Mirror the instant-query advice: derive limit from the global context window budget and avoid full json.dumps for size estimation.


932-936: Make “step” truly optional and accept duration strings in schema.
You already parse durations; schema still declares type=number. Switch to string and clarify text to reflect durations and seconds.

-                "step": ToolParameter(
-                    description="Query resolution step width in duration format or float number of seconds",
-                    type="number",
-                    required=False,
-                ),
+                "step": ToolParameter(
+                    description="Query resolution step width. Accepts Prometheus duration (e.g., '30s','5m','1h') or seconds (e.g., '30'). Omit to auto-calculate.",
+                    type="string",
+                    required=False,
+                ),

965-972: Sanitize parsed step before adjustment.
If a user passes 0/negative, we still hit division by zero inside adjust. Either keep the defensive fix in adjust (preferred) or also guard here.

-            step = parse_duration_to_seconds(params.get("step"))
+            step = parse_duration_to_seconds(params.get("step"))
+            if step is not None and step <= 0:
+                step = None  # trigger auto-calc
@@
-                step=step,
+                step=step,
🧹 Nitpick comments (2)
holmes/plugins/toolsets/prometheus/prometheus.py (2)

336-407: Large-result summarization: helpful; consider adaptive label and safer suggestion text.
Current suggestion hardcodes label names ("pod"/"instance"). Prefer selecting a present label from sampled metrics (e.g., pod > instance > container > node > job) and adapt the label_replace accordingly to reduce user confusion.


977-982: Consider sending whole-second step to the API.
Some Prometheus-compatible backends are strict about step granularity. Rounding up to an integer number of seconds avoids ambiguity and guarantees points <= MAX_GRAPH_POINTS after adjustment.

-                "step": step,
+                "step": int(math.ceil(step)),

Add at top if not present:

import math
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8a9aa62 and 4504acf.

📒 Files selected for processing (3)
  • holmes/common/env_vars.py (1 hunks)
  • holmes/plugins/toolsets/prometheus/prometheus.py (10 hunks)
  • tests/plugins/toolsets/grafana/test_grafana_tempo_tools.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/plugins/toolsets/grafana/test_grafana_tempo_tools.py
  • holmes/common/env_vars.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Ruff for formatting and linting
Type hints are required (checked by mypy)
Always place Python imports at the top of the file, not inside functions or methods

Files:

  • holmes/plugins/toolsets/prometheus/prometheus.py
🧬 Code graph analysis (1)
holmes/plugins/toolsets/prometheus/prometheus.py (2)
holmes/plugins/toolsets/prometheus/utils.py (1)
  • parse_duration_to_seconds (7-30)
holmes/plugins/toolsets/utils.py (1)
  • adjust_step_for_max_points (199-222)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: llm_evals
  • GitHub Check: Pre-commit checks
🔇 Additional comments (2)
holmes/plugins/toolsets/prometheus/prometheus.py (2)

25-25: Good import placement; enables flexible step parsing.
Import is at top-level and aligns with mypy/ruff rules.


1191-1195: Expanded exception context in health check: LGTM.
Using logging.exception with exc_info improves traceability without overexposing details to users.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (2)
holmes/plugins/toolsets/prometheus/utils.py (1)

1-2: Fixed: correct typing import.

The bad Union import from a previous revision is resolved. Good catch.

holmes/plugins/toolsets/prometheus/prometheus.py (1)

939-943: Make “step” a string to match description and allow durations.

Schema says “duration format or float” but type="number". Switch to "string"; your parser already supports both.

Apply:

-                "step": ToolParameter(
-                    description="Query resolution step width in duration format or float number of seconds",
-                    type="number",
-                    required=False,
-                ),
+                "step": ToolParameter(
+                    description="Query resolution step width (e.g., 30s, 5m, 1h, or seconds as a number). Omit to auto-calculate.",
+                    type="string",
+                    required=False,
+                ),
🧹 Nitpick comments (7)
holmes/plugins/toolsets/prometheus/utils.py (3)

5-13: Add a short docstring to define accepted formats and rounding.

Clarify inputs (e.g., "5m", "1h30m", "123.45") and that unit-bearing values are floored to whole seconds.

Apply:

 def parse_duration_to_seconds(v: Optional[Union[str, float, int]]) -> Optional[float]:
+    """
+    Parse durations to seconds.
+    Accepts: None, numbers (int/float), numeric strings, or Prometheus-like
+    unit strings/compositions (e.g., "30s", "5m", "1h30m", "2.5s").
+    Note: unit-bearing values are floored to whole seconds (e.g., "2.9s" -> 2.0).
+    """

17-25: Avoid silently accepting partial matches (e.g., "1h30mX").

Current findall will parse valid tokens and ignore trailing garbage. Prefer full-string validation.

Apply:

-    pattern = r"(\d+(?:\.\d+)?)(d|h|m|s)"
-    matches = re.findall(pattern, s)
-
-    if matches:
-        total_seconds = 0.0
-        for value_str, unit in matches:
-            value = float(value_str)
-            total_seconds += value * units[unit]
-        return float(int(total_seconds))
+    token_pattern = r"(\d+(?:\.\d+)?)(d|h|m|s)"
+    full_pattern = rf"(?:\s*{token_pattern}\s*)+"
+    if re.fullmatch(full_pattern, s):
+        total_seconds = 0.0
+        for m in re.finditer(token_pattern, s):
+            value = float(m.group(1))
+            unit = m.group(2)
+            total_seconds += value * units[unit]
+        return float(int(total_seconds))

27-28: Guard against NaN/Infinity in fallback.

float(s) will accept "nan"/"inf", which later breaks step math. Reject non-finite values.

Apply:

+import math
@@
-    return float(s)
+    val = float(s)
+    if not math.isfinite(val):
+        raise ValueError("Duration must be a finite number")
+    return val
tests/plugins/toolsets/prometheus/test_prometheus_utils.py (1)

5-59: Solid coverage. Consider adding strictness/edge cases.

Add cases to define behavior for mixed-valid-with-garbage and non-finite inputs.

Apply:

@@ class TestParseDurationToSeconds:
     def test_parse_duration_to_seconds(self, input_value, expected):
         result = parse_duration_to_seconds(input_value)
         assert result == expected
+
+    @pytest.mark.parametrize(
+        "invalid_input",
+        [
+            "1h30mX",      # garbage suffix
+            "2.5 m",       # space between number and unit not supported
+            "NaN",
+            "Infinity",
+        ],
+    )
+    def test_parse_duration_to_seconds_strictness(self, invalid_input):
+        with pytest.raises(ValueError):
+            parse_duration_to_seconds(invalid_input)
holmes/plugins/toolsets/prometheus/prometheus.py (3)

59-61: Cap response size: align with overall context budget.

Ensure query_response_size_limit is always strictly below the general truncation threshold (TOOL_MAX_ALLOCATED_CONTEXT_WINDOW_PCT-based). Otherwise this path may still truncate downstream.

Would you like a helper that reads the global limit and clamps this value at startup?
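A possible shape for such a helper (the global-cap name and 0.8 safety factor below are assumptions for illustration, not values confirmed in the repo):

```python
def clamped_response_size_limit(configured_limit: int,
                                global_cap_chars: int,
                                safety_factor: float = 0.8) -> int:
    """Clamp a toolset-level size limit to stay strictly below the global truncation threshold."""
    ceiling = int(global_cap_chars * safety_factor)
    return max(1, min(configured_limit, ceiling))

# e.g. the context-window budget allows 100_000 chars; an over-large configured value is pulled down.
print(clamped_response_size_limit(200_000, 100_000))  # 80000
print(clamped_response_size_limit(50_000, 100_000))   # 50000
```

Calling this once at toolset startup would keep query_response_size_limit from ever exceeding the downstream truncation point, regardless of how either knob is tuned.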


288-296: Docstring mentions 300 but default changed to 200.

Update text to avoid drift or reference the env var instead of a number.

Apply:

-    Max points is controlled by the PROMETHEUS_MAX_GRAPH_POINTS environment variable (default: 300).
+    Max points is controlled by the PROMETHEUS_MAX_GRAPH_POINTS environment variable.

1202-1206: Minor: logging.exception already includes exc_info.

Drop exc_info=True to avoid redundancy.

Apply:

-            logging.exception("Failed to initialize Prometheus", exc_info=True)
+            logging.exception("Failed to initialize Prometheus")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4504acf and ec9ee71.

📒 Files selected for processing (4)
  • holmes/plugins/toolsets/prometheus/prometheus.py (11 hunks)
  • holmes/plugins/toolsets/prometheus/prometheus_instructions.jinja2 (1 hunks)
  • holmes/plugins/toolsets/prometheus/utils.py (1 hunks)
  • tests/plugins/toolsets/prometheus/test_prometheus_utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • holmes/plugins/toolsets/prometheus/prometheus_instructions.jinja2
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Ruff for formatting and linting
Type hints are required (checked by mypy)
Always place Python imports at the top of the file, not inside functions or methods

Files:

  • holmes/plugins/toolsets/prometheus/prometheus.py
  • tests/plugins/toolsets/prometheus/test_prometheus_utils.py
  • holmes/plugins/toolsets/prometheus/utils.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Do not use pytest markers/tags that are not declared; only use markers listed in pyproject.toml

Files:

  • tests/plugins/toolsets/prometheus/test_prometheus_utils.py
tests/**

📄 CodeRabbit inference engine (CLAUDE.md)

Tests should mirror the source structure under tests/

Files:

  • tests/plugins/toolsets/prometheus/test_prometheus_utils.py
🧬 Code graph analysis (2)
holmes/plugins/toolsets/prometheus/prometheus.py (2)
holmes/plugins/toolsets/prometheus/utils.py (1)
  • parse_duration_to_seconds (5-28)
holmes/plugins/toolsets/utils.py (1)
  • adjust_step_for_max_points (199-222)
tests/plugins/toolsets/prometheus/test_prometheus_utils.py (1)
holmes/plugins/toolsets/prometheus/utils.py (1)
  • parse_duration_to_seconds (5-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Pre-commit checks
  • GitHub Check: llm_evals
🔇 Additional comments (4)
holmes/plugins/toolsets/prometheus/prometheus.py (4)

25-25: Good reuse of shared parser.

Importing parse_duration_to_seconds centralizes duration handling.


339-409: Large-result summarization: pragmatic and safe.

Sampling labels only and estimating total points is a good trade-off for huge responses.


831-860: Large-result guard on instant queries looks good.

The size check avoids flooding outputs; logging is informative.


1022-1048: Range query summarization path: LGTM.

Consistent with instant path; helpful logs and compact summary.

@github-actions
Contributor

Results of HolmesGPT evals

  • ask_holmes: 30/37 test cases were successful, 3 regressions, 2 skipped, 2 setup failures
Test suite Test case Status
ask 01_how_many_pods
ask 02_what_is_wrong_with_pod
ask 04_related_k8s_events ↪️
ask 05_image_version
ask 09_crashpod
ask 10_image_pull_backoff
ask 110_k8s_events_image_pull
ask 11_init_containers
ask 13a_pending_node_selector_basic
ask 14_pending_resources
ask 15_failed_readiness_probe
ask 17_oom_kill
ask 18_crash_looping_v2
ask 19_detect_missing_app_details
ask 20_long_log_file_search
ask 24_misconfigured_pvc
ask 24a_misconfigured_pvc_basic
ask 28_permissions_error 🚧
ask 29_events_from_alert_manager ↪️
ask 39_failed_toolset
ask 41_setup_argo
ask 42_dns_issues_steps_new_tools
ask 43_current_datetime_from_prompt
ask 45_fetch_deployment_logs_simple
ask 51_logs_summarize_errors
ask 53_logs_find_term
ask 54_not_truncated_when_getting_pods
ask 59_label_based_counting
ask 60_count_less_than 🚧
ask 61_exact_match_counting
ask 63_fetch_error_logs_no_errors
ask 79_configmap_mount_issue
ask 83_secret_not_found
ask 86_configmap_like_but_secret
ask 93_calling_datadog[0]
ask 93_calling_datadog[1]
ask 93_calling_datadog[2]

Legend

  • ✅ the test was successful
  • ↪️ the test was skipped
  • ⚠️ the test failed but is known to be flaky or known to fail
  • 🚧 the test had a setup failure (not a code regression)
  • 🔧 the test failed due to mock data issues (not a code regression)
  • ❌ the test failed and should be fixed before merging the PR

@nherment nherment merged commit fa945fc into master Sep 12, 2025
7 checks passed
@nherment nherment deleted the prometheus-context-window-fixes branch September 12, 2025 07:09
This was referenced Sep 14, 2025