High Latency Metrics Collection on oDAO node #726

@mendelskiv93

Description

On an oDAO node, the metrics endpoint takes an excessive amount of time to respond (roughly 19 to 44 seconds per request). This suggests that metrics are collected on demand during each query rather than being maintained continuously.

Evidence:

  • Metric endpoint response times:

    • from localhost:
      time curl -s 0:9102/metrics  0.00s user 0.01s system 0% cpu 19.347 total
      
    • from the Prometheus slave:
      time curl http://10.13.0.58:9102/metrics  0.00s user 0.01s system 0% cpu 44.452 total
      
  • Impact visible in monitoring:

    • Significant increase in TCP sockets in the TIME_WAIT state
    • Elevated number of open file descriptors for the rocketpool process
    • No corresponding increase in system load


Suggested improvement:
Consider implementing continuous metric collection instead of on-demand gathering during scrape requests to reduce response latency.
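
To illustrate the suggestion, here is a minimal sketch of the pattern in Python: a background thread refreshes a cached snapshot on a fixed interval, and the scrape handler only returns the cached text. This is not the node's actual code. `CachedMetrics`, the `collect` callable, and the interval are hypothetical names chosen for the example.

```python
import threading
import time
from typing import Callable


class CachedMetrics:
    """Serve the last collected snapshot; refresh it in the background.

    Hypothetical helper for illustration only, not part of any existing codebase.
    """

    def __init__(self, collect: Callable[[], str], interval_s: float) -> None:
        self._collect = collect          # the slow on-demand collection function
        self._interval_s = interval_s    # how often to refresh the snapshot
        self._lock = threading.Lock()
        self._snapshot = ""
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def refresh(self) -> None:
        text = self._collect()           # slow work happens outside the lock
        with self._lock:
            self._snapshot = text

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval_s):
            try:
                self.refresh()
            except Exception:
                pass                     # keep serving the previous snapshot

    def start(self) -> None:
        self.refresh()                   # populate once so the first scrape has data
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def render(self) -> str:
        # Called by the scrape handler: returns immediately, never collects.
        with self._lock:
            return self._snapshot
```

With this shape, the handler behind `/metrics` would call `render()` and respond in roughly constant time, at the cost of values being stale by up to one refresh interval.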
