High Latency Metrics Collection on oDAO node #726

A performance issue was observed on an **oDAO node**: the metrics endpoint takes tens of seconds to respond, which suggests that metrics are collected on demand for each query rather than maintained continuously.

Evidence:
- Metric endpoint response times:
   - from localhost:
       ```
       time curl -s 0:9102/metrics  0.00s user 0.01s system 0% cpu 19.347 total
       ```
   - from the Prometheus slave:
       ```
       time curl http://10.13.0.58:9102/metrics  0.00s user 0.01s system 0% cpu 44.452 total
       ```

- Impact visible in monitoring:
  - Significant increase in TCP sockets in the TIME_WAIT state
  - Elevated file descriptor count for the rocketpool process
  - No corresponding increase in system load
  
![image](https://github.com/user-attachments/assets/2d9cd053-45a2-4a41-947a-76548380be66)
![image](https://github.com/user-attachments/assets/51e72eb3-920a-431b-91e4-b9051a43d282)



Suggested improvement:
Consider collecting metrics continuously in the background and serving the most recent values on each scrape, instead of gathering them on demand during the scrape request, to reduce response latency.
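
To illustrate the idea, here is a minimal sketch in Python (standard library only) of a background refresher that keeps a cached snapshot, so the scrape handler never waits on collection. It is not the node's actual code: `CachedMetrics`, `make_handler`, `slow_collect`, the 15 second interval, and the sample metric name are all hypothetical and stand in for whatever the real collectors do.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CachedMetrics:
    """Holds the latest metrics payload, refreshed by a background thread."""

    def __init__(self, collect, interval_seconds):
        self._collect = collect            # hypothetical slow collection function
        self._interval = interval_seconds
        self._lock = threading.Lock()
        self._payload = b""                # empty until the first refresh completes
        self._stop = threading.Event()

    def refresh_once(self):
        payload = self._collect()          # slow work happens outside the lock
        with self._lock:
            self._payload = payload

    def snapshot(self):
        # Returns immediately with whatever was collected last.
        with self._lock:
            return self._payload

    def start(self):
        def loop():
            while not self._stop.is_set():
                try:
                    self.refresh_once()
                except Exception:
                    pass                   # keep serving the last good snapshot
                self._stop.wait(self._interval)

        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread

    def stop(self):
        self._stop.set()


def make_handler(cache):
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = cache.snapshot()        # no collection on the request path
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass                           # silence per-request logging

    return Handler


def slow_collect():
    # Stand-in for an expensive collection step (assumption, not real node logic).
    time.sleep(0.2)
    return b"example_metric 1\n"


if __name__ == "__main__":
    cache = CachedMetrics(slow_collect, interval_seconds=15)
    cache.start()
    ThreadingHTTPServer(("127.0.0.1", 9102), make_handler(cache)).serve_forever()
```

The trade-off is that a scrape may return values up to one refresh interval old, in exchange for a response time that no longer depends on how long collection takes.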
