ZOOKEEPER-4216: Fix a race condition in WatcherCleanerTest.testDeadWatcherMetrics #1989
Conversation
Because the metrics were updated _after_ the listener was invoked, the listener did not always see the fresh metric value. This fix makes the test wait for the value to become what we expect.
kezhuw
left a comment
Good catch. I reported the failure in ZOOKEEPER-4216. You can reuse that JIRA or create a new one.
zookeeper-server/src/test/java/org/apache/zookeeper/server/watch/WatcherCleanerTest.java
(outdated, resolved)
Since there was an existing waitFor method in ZKTestCase, along with an existing waitForMetric implementation in LearnerMetricsTest, this commit moves waitForMetric to ZKTestCase and refactors the metric-related usages of waitFor.
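The shape of such a helper can be sketched as a generic polling loop: check the metric, sleep briefly, retry until the value matches or a timeout elapses. This is a minimal illustration with hypothetical names, not the actual ZKTestCase API (the real helper takes a metric key and a Hamcrest matcher):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongPredicate;

// Hypothetical sketch of a waitForMetric-style helper: poll a metric
// until it satisfies a predicate or a timeout elapses.
public class WaitForMetricSketch {
    static boolean waitForMetric(AtomicLong metric, LongPredicate expected, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (expected.test(metric.get())) {
                return true;
            }
            Thread.sleep(10); // poll interval; the real helper also logs the mismatch here
        }
        return expected.test(metric.get());
    }

    public static void main(String[] args) throws Exception {
        AtomicLong deadWatchersCleared = new AtomicLong(0);
        // Simulate the cleaner thread updating the metric after a short delay,
        // as happens when the listener fires before the metric is bumped.
        Thread cleaner = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException ignored) {
            }
            deadWatchersCleared.set(3);
        });
        cleaner.start();
        boolean ok = waitForMetric(deadWatchersCleared, v -> v == 3L, 1000);
        cleaner.join();
        System.out.println("metric matched: " + ok);
    }
}
```

Polling with a deadline avoids the original race entirely: the assertion no longer depends on the writer having finished before the listener callback returns.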
@kezhuw Can you take another look at this one as well please? Thank you!
kezhuw
left a comment
LGTM % useless values. All other comments are neutral and acceptable for me.
zookeeper-server/src/test/java/org/apache/zookeeper/server/watch/WatcherCleanerTest.java
```java
waitForMetric("dead_watchers_cleared", is(3L));
waitForMetric("cnt_dead_watchers_cleaner_latency", is(3L));

//Each latency should be a little over 20 ms, allow 5 ms deviation
```
Seems that we roll back ZOOKEEPER-4200 (#1592) after waitForMetric. I am not sure whether this is a regression, but I think ZOOKEEPER-4200 could also be caused by concurrency in the avg assertion and the non-atomic DEAD_WATCHERS_CLEANER_LATENCY.add(latency). I am neutral on this change, but max is more likely to fail in a loaded environment. Maybe we can treat max a little specially? What do you think? @ztzg @eolivelli
zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/server/metric/AvgMinMaxCounter.java
Lines 44 to 49 in e50a0bb

```java
public void addDataPoint(long value) {
    total.addAndGet(value);
    count.incrementAndGet();
    setMin(value);
    setMax(value);
}
```
zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/server/metric/AvgMinMaxCounter.java
Lines 65 to 76 in e50a0bb

```java
public double getAvg() {
    // There is possible race-condition but we don't need the stats to be
    // extremely accurate.
    long currentCount = count.get();
    long currentTotal = total.get();
    if (currentCount > 0) {
        double avgLatency = currentTotal / (double) currentCount;
        BigDecimal bg = new BigDecimal(avgLatency);
        return bg.setScale(4, RoundingMode.HALF_UP).doubleValue();
    }
    return 0;
}
```
```java
if (!matcher.matches(actual)) {
    Description description = new StringDescription();
    matcher.describeMismatch(actual, description);
    LOG.info("match failed for metric {}: {}", metricKey, description);
```
This message might be distracting in the success case. Given that this method is moved from LearnerMetricsTest, I think we could refactor it in a separate JIRA to log the matcher description only in the failure case.
I updated the message. I think it makes sense to keep it, as it is very useful when debugging, and in the success case the metric should resolve relatively quickly to the desired value, so the log message may not even appear.
I have changed it to be a bit more descriptive; let me know what you think.
This message will log every 100 ms as long as the metric does not match the expected value, and it should be clear that it does not indicate a test failure (at least not yet).
Already merged as part of #1950.