ZOOKEEPER-4216: Fix a race condition in WatcherCleanerTest.testDeadWatcherMetrics #1989
Conversation
Because the metrics were updated _after_ the listener was invoked, the listener did not always see the fresh metric value. This fix makes the test wait for the value to become what we expect.
kezhuw
left a comment
Good catch. I reported the failure in ZOOKEEPER-4216. You can reuse that JIRA or create a new one.
zookeeper-server/src/test/java/org/apache/zookeeper/server/watch/WatcherCleanerTest.java
(outdated, resolved)
Since there was an existing waitFor method in ZKTestCase, along with an existing waitForMetric implementation in LearnerMetricsTest, this commit moves waitForMetric to ZKTestCase and refactors the metric-related usages of waitFor.
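The shape of such a helper can be sketched as a generic polling loop: check the metric, sleep briefly, retry until the value matches or a timeout elapses. This is a minimal illustration with hypothetical names, not the actual ZKTestCase API (the real helper takes a metric key and a Hamcrest matcher):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongPredicate;

// Hypothetical sketch of a waitForMetric-style helper: poll a metric
// until it satisfies a predicate or a timeout elapses.
public class WaitForMetricSketch {
    static boolean waitForMetric(AtomicLong metric, LongPredicate expected, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (expected.test(metric.get())) {
                return true;
            }
            Thread.sleep(10); // poll interval; the real helper also logs the mismatch here
        }
        return expected.test(metric.get());
    }

    public static void main(String[] args) throws Exception {
        AtomicLong deadWatchersCleared = new AtomicLong(0);
        // Simulate the cleaner thread updating the metric after a short delay,
        // as happens when the listener fires before the metric is bumped.
        Thread cleaner = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException ignored) {
            }
            deadWatchersCleared.set(3);
        });
        cleaner.start();
        boolean ok = waitForMetric(deadWatchersCleared, v -> v == 3L, 1000);
        cleaner.join();
        System.out.println("metric matched: " + ok);
    }
}
```

Polling with a deadline avoids the original race entirely: the assertion no longer depends on the writer having finished before the listener callback returns.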
@kezhuw Can you take another look at this one as well please? Thank you!
kezhuw
left a comment
LGTM % useless values. All other comments are neutral and acceptable for me.
zookeeper-server/src/test/java/org/apache/zookeeper/server/watch/WatcherCleanerTest.java
```java
waitForMetric("dead_watchers_cleared", is(3L));
waitForMetric("cnt_dead_watchers_cleaner_latency", is(3L));

//Each latency should be a little over 20 ms, allow 5 ms deviation
```
Seems that we roll back ZOOKEEPER-4200 (#1592) after waitForMetric. I am not sure whether this is a regression, but I think ZOOKEEPER-4200 could also be caused by concurrency in the avg assertion and the non-atomic DEAD_WATCHERS_CLEANER_LATENCY.add(latency). I am neutral on this change, but max is more likely to fail in a loaded environment. Maybe we can treat max a little specially? What do you think? @ztzg @eolivelli
zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/server/metric/AvgMinMaxCounter.java
Lines 44 to 49 in e50a0bb

```java
public void addDataPoint(long value) {
    total.addAndGet(value);
    count.incrementAndGet();
    setMin(value);
    setMax(value);
}
```
zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/server/metric/AvgMinMaxCounter.java
Lines 65 to 76 in e50a0bb

```java
public double getAvg() {
    // There is possible race-condition but we don't need the stats to be
    // extremely accurate.
    long currentCount = count.get();
    long currentTotal = total.get();
    if (currentCount > 0) {
        double avgLatency = currentTotal / (double) currentCount;
        BigDecimal bg = new BigDecimal(avgLatency);
        return bg.setScale(4, RoundingMode.HALF_UP).doubleValue();
    }
    return 0;
}
```
```java
if (!matcher.matches(actual)) {
    Description description = new StringDescription();
    matcher.describeMismatch(actual, description);
    LOG.info("match failed for metric {}: {}", metricKey, description);
```
This message might be distracting in the success case. Given that this method is moved from LearnerMetricsTest, I think we could refactor it in a separate JIRA to log the matcher description only in the failure case.
I updated the message. I think it makes sense to keep it, as it is very useful when debugging, and in the success case the metric should resolve relatively quickly to the desired value, so the log message may not even appear.
I have changed it to be a bit more descriptive; let me know what you think.
This message will log every 100 ms as long as the metric does not match the expected value, and it should be clear that it does not indicate a test failure (at least not yet).
Already merged as part of #1950.