Skip to content

Conversation

@Sheeproid
Copy link
Contributor

No description provided.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 4, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Adds Grafana Tempo integration upgrades: new config models, a retry-enabled Tempo API client, a refactored toolset replacing legacy functions, CLI for Tempo operations, utilities for durations/step sizing, documentation updates, and extensive tests. Removes the old tempo_api module. Minor bugfix in trace parsing. Test fixtures updated for newer Tempo and enriched OTEL resources.

Changes

Cohort / File(s) Summary
Config models
holmes/plugins/toolsets/grafana/common.py
Adds GrafanaTempoLabelsConfig and GrafanaTempoConfig (with labels).
Tempo API client (new) & legacy removal
holmes/plugins/toolsets/grafana/grafana_tempo_api.py, holmes/plugins/toolsets/grafana/tempo_api.py
Introduces GrafanaTempoAPI with retries, GET/POST toggle, and TempoAPIError; removes legacy tempo_api with helper functions.
Toolset refactor & docs
holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py, holmes/plugins/toolsets/grafana/toolset_grafana_tempo.jinja2
Replaces old tools with SearchTracesByQuery/Tags, QueryTraceById, TagNames/Values, Metrics (instant/range); adds prerequisites echo check; updates guidance to API/TraceQL-focused content.
Trace formatting fix
holmes/plugins/toolsets/grafana/trace_parser.py
Corrects key from 'trootServiceName' to 'rootServiceName' in format_traces_list.
Utility functions
holmes/plugins/toolsets/utils.py
Adds seconds_to_duration_string, duration_string_to_seconds, adjust_step_for_max_points.
CLI & README
tempo_cli.py, tempo_cli_README.md
Adds Typer-based CLI covering echo, trace, searches, tags, and metrics; includes pretty printing and error handling; README with installation/config/examples.
Tempo/K8s fixtures
tests/llm/fixtures/shared/tempo.yaml, tests/llm/fixtures/test_ask_holmes/114_checkout_latency_tracing_rebuild/checkout-service.yaml, tests/llm/fixtures/test_ask_holmes/115_checkout_errors_tracing/checkout-service.yaml
Updates Tempo image/config (2.8.2, metrics generator, overrides) and OTEL resources with K8s metadata via downward API.
Tempo tests overhaul
tests/plugins/toolsets/grafana/test_grafana_tempo_api.py, tests/plugins/toolsets/grafana/test_grafana_tempo_tools.py, tests/plugins/toolsets/grafana/test_grafana_tempo_unit.py, tests/plugins/toolsets/grafana/test_grafana_tempo.py
Adds comprehensive tests for GrafanaTempoAPI and new tools; adapts unit tests to new API/config; comments out obsolete toolset tests; maintains health/trace formatting tests.
Utils tests
tests/plugins/toolsets/test_utils_duration.py
New tests for duration parsing/formatting and step adjustment.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant User
  participant CLI as Tempo CLI
  participant API as GrafanaTempoAPI
  participant Tempo as Grafana Tempo

  User->>CLI: Command (e.g., search-query)
  CLI->>API: Initialize (url, api_key, datasource_uid, use_post)
  CLI->>API: search_traces_by_query(q, start, end, limit, spss)
  API->>Tempo: HTTP GET/POST /api/search?q=...&start=...&end=...
  alt 2xx
    Tempo-->>API: JSON
    API-->>CLI: Result dict
    CLI-->>User: Print JSON (pretty/compact)
  else HTTP error
    Tempo-->>API: 4xx/5xx
    API-->>CLI: TempoAPIError(status_code, url, response_text)
    CLI-->>User: Render error and exit 1
  end
Loading
sequenceDiagram
  autonumber
  participant Tool as GrafanaTempoToolset Tool
  participant Pre as prerequisites_callable
  participant API as GrafanaTempoAPI
  participant Tempo as Grafana Tempo

  Tool->>Pre: prerequisites_callable(config)
  Pre->>API: query_echo_endpoint()
  API->>Tempo: /api/echo (GET/POST)
  alt 200 OK
    Tempo-->>API: OK
    API-->>Pre: True
    Pre-->>Tool: OK to run
  else Error/timeout
    Tempo-->>API: Error
    API-->>Pre: False
    Pre-->>Tool: Fail with message
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested reviewers

  • aantn
  • moshemorad

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ea8e34 and 513f221.

📒 Files selected for processing (17)
  • holmes/plugins/toolsets/grafana/common.py (1 hunks)
  • holmes/plugins/toolsets/grafana/grafana_tempo_api.py (1 hunks)
  • holmes/plugins/toolsets/grafana/tempo_api.py (0 hunks)
  • holmes/plugins/toolsets/grafana/toolset_grafana_tempo.jinja2 (1 hunks)
  • holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py (10 hunks)
  • holmes/plugins/toolsets/grafana/trace_parser.py (1 hunks)
  • holmes/plugins/toolsets/utils.py (2 hunks)
  • tempo_cli.py (1 hunks)
  • tempo_cli_README.md (1 hunks)
  • tests/llm/fixtures/shared/tempo.yaml (2 hunks)
  • tests/llm/fixtures/test_ask_holmes/114_checkout_latency_tracing_rebuild/checkout-service.yaml (2 hunks)
  • tests/llm/fixtures/test_ask_holmes/115_checkout_errors_tracing/checkout-service.yaml (2 hunks)
  • tests/plugins/toolsets/grafana/test_grafana_tempo.py (2 hunks)
  • tests/plugins/toolsets/grafana/test_grafana_tempo_api.py (1 hunks)
  • tests/plugins/toolsets/grafana/test_grafana_tempo_tools.py (1 hunks)
  • tests/plugins/toolsets/grafana/test_grafana_tempo_unit.py (12 hunks)
  • tests/plugins/toolsets/test_utils_duration.py (1 hunks)
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch tempo-freefrom-toolset

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

aantn
aantn previously approved these changes Sep 7, 2025
try:
import json

error_data = json.loads(response_text)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any specific reason to parse this and not just always return response_text? Is there something big we are trying to filter out of the error message in the other fields?

"""Python wrapper for Grafana Tempo REST API.
This class provides a clean interface to all Tempo API endpoints,
supporting both GET and POST methods based on configuration.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we remove the POST support? It seems like a mistake. (There might be specific endpoints that require POST but having this as a global flag doesn't make sense.)

endpoint: str,
params: Optional[Dict[str, Any]] = None,
path_params: Optional[Dict[str, str]] = None,
timeout: int = 30,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make this dramatically shorter or remove it altogether. (Why do we need retries at all?) With 30s per timeout * 3 retries, a single call to tempo could take 90s seconds.

return make_request()
except requests.exceptions.HTTPError as e:
# Extract detailed error message from response
response = e.response
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments in the exception class. Could all this code could be much simpler. Something like raise Exception(f"{e.response}")?

and getattr(e, "response", None) is not None
and e.response.status_code < 500,
)
def make_request():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use of the inner function here seems uncessary. We could just post the code directly in try/except - no need to define a function.

@@ -0,0 +1,419 @@
#!/usr/bin/env python3
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move to scripts/

@@ -0,0 +1,162 @@
# Tempo CLI - Grafana Tempo Command Line Interface
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move to scripts

process_timestamps_to_int,
)

TEMPO_LABELS_ADD_PREFIX = load_bool("TEMPO_LABELS_ADD_PREFIX", True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would this not be set?

except Exception as e:
return False, f"Failed to connect to Tempo: {str(e)}"

def build_k8s_filters(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider removing this altogether! Its currently only used by the comparative tool. I wonder if we really need it there or could get comparative to work without it.

@Sheeproid Sheeproid marked this pull request as ready for review September 7, 2025 12:58
@Sheeproid Sheeproid requested a review from aantn September 7, 2025 12:58
@Sheeproid Sheeproid closed this Sep 7, 2025
Sheeproid added a commit that referenced this pull request Sep 7, 2025
old pr and comment we can tackle later -
#941

---------

Co-authored-by: Nicolas Herment <[email protected]>
@Sheeproid Sheeproid deleted the tempo-freefrom-toolset branch September 7, 2025 13:09
@github-actions
Copy link
Contributor

github-actions bot commented Sep 7, 2025

Results of HolmesGPT evals

  • ask_holmes: 31/37 test cases were successful, 0 regressions, 2 skipped, 3 setup failures
Test suite Test case Status
ask 01_how_many_pods
ask 02_what_is_wrong_with_pod
ask 04_related_k8s_events ↪️
ask 05_image_version
ask 09_crashpod
ask 10_image_pull_backoff
ask 110_k8s_events_image_pull
ask 11_init_containers
ask 13a_pending_node_selector_basic
ask 14_pending_resources
ask 15_failed_readiness_probe
ask 17_oom_kill
ask 18_crash_looping_v2 🚧
ask 19_detect_missing_app_details
ask 20_long_log_file_search
ask 24_misconfigured_pvc
ask 24a_misconfigured_pvc_basic
ask 28_permissions_error 🚧
ask 29_events_from_alert_manager ↪️
ask 39_failed_toolset
ask 41_setup_argo
ask 42_dns_issues_steps_new_tools ⚠️
ask 43_current_datetime_from_prompt
ask 45_fetch_deployment_logs_simple
ask 51_logs_summarize_errors
ask 53_logs_find_term
ask 54_not_truncated_when_getting_pods
ask 59_label_based_counting
ask 60_count_less_than 🚧
ask 61_exact_match_counting
ask 63_fetch_error_logs_no_errors
ask 79_configmap_mount_issue
ask 83_secret_not_found
ask 86_configmap_like_but_secret
ask 93_calling_datadog[0]
ask 93_calling_datadog[1]
ask 93_calling_datadog[2]

Legend

  • ✅ the test was successful
  • ↪️ the test was skipped
  • ⚠️ the test failed but is known to be flaky or known to fail
  • 🚧 the test had a setup failure (not a code regression)
  • 🔧 the test failed due to mock data issues (not a code regression)
  • ❌ the test failed and should be fixed before merging the PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants