Skip to content

Conversation

@PapaCharlie
Copy link
Contributor

@PapaCharlie PapaCharlie commented Mar 8, 2023

Because the metrics were updated after the listener is invoked, the listener does not always see the fresh metric value. This fixes it so that the test waits for the value to become what we expect.

Because the metrics were updated _after_ the listener is invoked, the listener does not always see
the fresh metric value. This fixes it so that the test waits for the value to become what we expect.
Copy link
Member

@kezhuw kezhuw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I reports the failure to ZOOKEEPER-4216. You can reuse that jira or create a new one.

@PapaCharlie PapaCharlie changed the title Fix a race condition in WatcherCleanerTest.testDeadWatcherMetrics ZOOKEEPER-4216: Fix a race condition in WatcherCleanerTest.testDeadWatcherMetrics Mar 9, 2023
Since there was an existing waitFor method in ZKTestCase, along with an existing implementation of a
waitForMetric LearnerMetricsTest, this commit moves waitForMetric to ZKTestCase and refactors the
metric-related usages of waitFor.
@tisonkun tisonkun requested a review from eolivelli March 22, 2023 15:20
@PapaCharlie
Copy link
Contributor Author

@kezhuw Can you take another look at this one as well please? Thank you!

Copy link
Member

@kezhuw kezhuw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % useless values. All other comments are neutral and acceptable for me.

waitForMetric("dead_watchers_cleared", is(3L));
waitForMetric("cnt_dead_watchers_cleaner_latency", is(3L));

//Each latency should be a little over 20 ms, allow 5 ms deviation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that we rollback ZOOKEEPER-4200(#1592) after waitForMetric. I am not sure whether this is a regression. But I think ZOOKEEPER-4200 could also caused by concurrency in avg assertion and not atomic DEAD_WATCHERS_CLEANER_LATENCY.add(latency). I am neutral to this change. But max is more likely to fail in loaded environment. Maybe we can treat max a little special ? What do you think ? @ztzg @eolivelli

public void addDataPoint(long value) {
total.addAndGet(value);
count.incrementAndGet();
setMin(value);
setMax(value);
}

public double getAvg() {
// There is possible race-condition but we don't need the stats to be
// extremely accurate.
long currentCount = count.get();
long currentTotal = total.get();
if (currentCount > 0) {
double avgLatency = currentTotal / (double) currentCount;
BigDecimal bg = new BigDecimal(avgLatency);
return bg.setScale(4, RoundingMode.HALF_UP).doubleValue();
}
return 0;
}

if (!matcher.matches(actual)) {
Description description = new StringDescription();
matcher.describeMismatch(actual, description);
LOG.info("match failed for metric {}: {}", metricKey, description);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This message might be distractive in success case. Given that this method is moved from LearnerMetricsTest, I think we could refactor it in separated jira to log matcher description only in failure case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the message. I think it makes sense to keep it as it is very useful when debugging, and in the success case the metric should resolve relatively quickly to the desired value so the log message may not even appear.

I have changed it to be a bit more descriptive, let me know what you think

@tisonkun tisonkun requested a review from symat March 28, 2023 06:31
This message will log every 100ms as long as the metric does not match the
expected value, and should be clear that it does not indicate a test failure (at
least not yet)
@kezhuw
Copy link
Member

kezhuw commented Aug 7, 2024

Already merged as part of #1950.

@kezhuw kezhuw closed this Aug 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants