Skip to content

Conversation

cody-littley
Copy link
Contributor

Why are these changes needed?

Add metrics for the encoding manager.

Checks

  • I've made sure the tests are passing. Note that there might be a few flaky tests, in that case, please comment that they are not relevant.
  • I've checked the new test coverage and the coverage percentage didn't drop.
  • Testing Strategy
    • Unit tests
    • Integration tests
    • This PR is not tested :(

@cody-littley cody-littley requested a review from ian-shim December 6, 2024 20:44
@cody-littley cody-littley self-assigned this Dec 6, 2024
Copy link
Contributor

@ian-shim ian-shim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@@ -154,5 +176,12 @@ func RunController(ctx *cli.Context) error {
return fmt.Errorf("failed to start dispatcher: %v", err)
}

go func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we have this routine start in encodingManager.Start instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this before you commented. The core issue is that RunController() starts both the encoding manager and the dispatcher, and we only want to use a single metrics server that they both share.

}

func (m *encodingManagerMetrics) reportBatchSubmissionLatency(duration time.Duration) {
m.batchSubmissionLatency.WithLabelValues().Observe(float64(duration.Nanoseconds()) / float64(time.Millisecond))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this division? can we just report duration.Millisecond?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My dislike for Milliseconds() is that it reports an integer value, but we are measuring things where sub-millisecond accuracy may end up being meaningful. Converting from nanoseconds is more cumbersome, but utilizes the full accuracy of the clock.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we are measuring things where sub-millisecond accuracy may end up being meaningful

Not sure if this is true, but I'm fine with this

batchSize *prometheus.GaugeVec
batchDataSize *prometheus.GaugeVec
batchRetryCount *prometheus.GaugeVec
batchSleepTime *prometheus.GaugeVec
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if this metric would be useful given that it's already tracking retry count?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

Signed-off-by: Cody Littley <[email protected]>
@cody-littley cody-littley merged commit 9170987 into Layr-Labs:master Dec 9, 2024
6 checks passed
@cody-littley cody-littley deleted the encoding-manager-metrics branch December 9, 2024 17:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants