Incorrect "span_metrics_calls_total" Metric Value for SpanMetrics when Otel-Collector is Restarted #38262

@meSATYA

Component(s)

connector/spanmetrics

What happened?

Description

We generate span metrics by running the OpenTelemetry Collector as a StatefulSet behind a load-balancing exporter configured with routing_key set to service. The span_metrics_calls_total metric reports correct values until the collector is restarted. After a restart, the metric shows either a dip or a spike on the graph. This gives the misleading impression that something is wrong with the service, i.e. that calls to it have suddenly dropped or increased.

Steps to Reproduce

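Send traces to a collector running as a Deployment with the load-balancing exporter, using routing_key set to service. That collector forwards the traces to a second collector running as a StatefulSet, which generates the span metrics. Then restart the StatefulSet collector and watch span_metrics_calls_total.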
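For reference, the front (load-balancing) tier described in the steps above might look roughly like the following. This is a minimal sketch, not the reporter's actual configuration: it assumes the same Helm-values layout as the StatefulSet config in this issue and a DNS resolver pointing at a headless Service for the StatefulSet. The hostname otel-spanmetrics-headless.observability.svc.cluster.local is a hypothetical placeholder.

```yaml
# Hypothetical front-tier values (Helm chart layout, like the StatefulSet config).
mode: "deployment"

config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: ${env:MY_POD_IP}:4317
        http:
          endpoint: ${env:MY_POD_IP}:4318

  exporters:
    loadbalancing:
      # Route every span of a given service to the same backend collector.
      routing_key: "service"
      protocol:
        otlp:
          tls:
            insecure: true
      resolver:
        dns:
          # Placeholder headless Service in front of the StatefulSet pods.
          hostname: otel-spanmetrics-headless.observability.svc.cluster.local
          port: 4317

  service:
    pipelines:
      traces:
        receivers: [otlp]
        exporters: [loadbalancing]
```

With routing_key set to "service", all spans of one service go to a single StatefulSet pod, so that pod's spanmetrics connector holds the complete counters for every service it receives.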

Expected Result

The span_metrics_calls_total metric should not show a dip or a spike when the collector restarts.

Actual Result

[Three screenshots were attached to the original issue; they are not reproduced here.]

Collector version

0.120.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

mode: "statefulset"

config:
  exporters:
    debug/spanmetrics:
      verbosity: basic 

    prometheusremotewrite/spanmetrics:
      endpoint: http://victoria-metrics-cluster-vminsert.metrics.svc.cluster.local:8480/insert/10/prometheus
      resource_to_telemetry_conversion:
        enabled: true
      timeout: 60s
      compression: gzip
      tls:
        insecure_skip_verify: true

  extensions:
    health_check:
      endpoint: ${env:MY_POD_IP}:13133

  connectors:
    spanmetrics:
      histogram:
        explicit:
          buckets: [1ms, 10ms, 20ms, 50ms, 100ms, 250ms, 500ms, 800ms, 1s, 2s, 5s, 10s, 15s]
      aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"
      dimensions:
        - name: http.method
        - name: http.status_code
      dimensions_cache_size: 1000
      events:
        enabled: true
        dimensions:
          - name: exception.type
      exclude_dimensions:
        - 'k8s.pod.uid'
        - 'k8s.pod.name'
        - 'k8s.container.name'
        - 'k8s.deployment.name'
        - 'k8s.deployment.uid'
        - 'k8s.job.name'
        - 'k8s.job.uid'
        - 'k8s.namespace.name'
        - 'k8s.node.name'
        - 'k8s.pod.ip'
        - 'k8s.pod.start_time'
        - 'k8s.replicaset.name'
        - 'k8s.replicaset.uid'
        - 'azure.vm.scaleset.name'
        - 'cloud.resource_id'
        - 'host.id'
        - 'host.type'
        - 'instance'
        - 'service.instance.id'
        - 'host.name'
        - 'job'
        - 'dt.entity.host'
        - 'dt.entity.process_group'
        - 'dt.entity.process_group_instance'
        - 'container.id'
      exemplars:
        enabled: true
        max_per_data_point: 5
      metrics_flush_interval: 1m
      metrics_expiration: 5m
      namespace: span.metrics
      resource_metrics_key_attributes:
        - service.name
        - telemetry.sdk.language
        - telemetry.sdk.name

  processors:
    batch: {}

    batch/spanmetrics:
      send_batch_max_size: 5000
      send_batch_size: 4500
      timeout: 10s

    memory_limiter:
      check_interval: 5s
      limit_percentage: 80
      spike_limit_percentage: 25
    
  receivers:
    otlp/traces:
      protocols:
        http:
          endpoint: ${env:MY_POD_IP}:4318
        grpc:
          endpoint: ${env:MY_POD_IP}:4317
          max_recv_msg_size_mib: 12
          
  service:
    extensions:
      - health_check
    pipelines:
      metrics/spanmetrics:
        exporters:
        - prometheusremotewrite/spanmetrics
        processors:
        - batch/spanmetrics
        receivers:
        - spanmetrics

      traces/connector-pipeline:
        exporters:
        - spanmetrics
        processors:
        - batch
        receivers:
        - otlp/traces  
        
    telemetry:
      metrics:
        address: ${env:MY_POD_IP}:8888

Log output

Additional context

No response
