Skip to content

Conversation

@Randgalt
Copy link
Member

@Randgalt Randgalt commented Oct 2, 2019

Background

Note: this is a port of #136

Implementation for a persistent, recursive watch addition for ZK. These watches are set via a new method, addPersistentWatch() and are removed via the existing watcher removal methods. Persistent, recursive watches have these characteristics: a) Once set, they do not auto-remove when triggered; b) they trigger for all event types (child, data, etc.) on the node they are registered for and any child znode recursively; c) they are efficiently implemented by using the existing watch internals. A new class PathIterator walks up the path parent-by-parent when checking if a watcher applies.

Implementation Details

  • A new enum manages the different "modes" for watchers: WatcherMode.
  • For traditional, "standard" watchers, the code path is almost exactly the same. There is very little overhead other than a bit of extra checks in WatchManager
  • Given how this is implemented it was difficult to add support when WatchManagerOptimized is used. I'm open to adding it for that version but it will take work. We should consider not supporting persistent/recursive watchers when WatchManagerOptimized is used. I notice that WatchManagerOptimized is not even mentioned in the docs.
  • The mode for a given watcher/path pair is held in a map inside of WatcherModeManager. The absence of an entry means Standard. This way, there's no overhead for old, standard watchers.
  • PathParentIterator is the "meat" of the implementation. Rather than set watchers on every ZNode implied by a recursive watcher. WatchManager passes any paths it processes through PathParentIterator which iterates up each parent znode looking for watchers.
  • The remainder of the changes are scaffolding to match how other watchers are used as well as Jute/API changes to set persistent/recursive watchers

Testing

The tests were written years ago. I think they're comprehensive but reviewers should pay attention to anything that was missed. There is much ZooKeeper knowledge that's only in the heads of ZK committers.

  • PersistentWatcherTest - tests persistent, non-recursive watchers
  • PersistentRecursiveWatcherTest - tests persistent, recursive watchers
  • PathParentIteratorTest- exercises edges of PathParentIterator

@Randgalt Randgalt changed the title ZOOKEEPER-1416 ZOOKEEPER-1416 - Persistent, recursive watchers Oct 2, 2019
* @throws KeeperException If the server signals an error with a non-zero
* error code.
*/
public void addPersistentWatch(String basePath, Watcher watcher, boolean recursive)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using an enum instead of a boolean for 'recursive'?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you anticipate more types? It would be easy to do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.
We could also call the method 'addWatch' and have enum values like PERSISTENT_RECURSIVE and PERSISTENT_RECURSIVE

Or even better we can use EnumSet and two flags PERSISTENT,RECURSIVE

addWatch(...., EnumSet.of(PERSISTENT,RECURSIVE))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with an EnumSet (or a var args) is that it would allow empty sets/arrays which is illegal. I'd rather have a single enum argument.

Copy link
Member Author

@Randgalt Randgalt Oct 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I propose adding:

public enum PersistentWatchMode {
    NON_RECURSIVE,
    RECURSIVE
}

And changing the APIs to take this instead of the boolean. OK?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this is actually a pretty big change. I'm going to flow the watch mode through the server and not just make this a client change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should narrow down the patch to the minimum needed, otherwise it will be hard to merge it soon.
if we have more complex plan for the future and change deeply protocols and server side flows we should discuss on the ML more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, you're suggesting a client only change? i.e. this enum is merely to avoid using a boolean in the public API?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes
You already did a great work and the client with your changes is still compatibile at 100% with 3.5 servers in case of not using persistent watches.
If we change the protocol regarding watches (adding some watchtype to the wireprotocol) you will really need a "compatibility flag" on the client configuration or some more complicated handshake with version/feature discovery.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look at the commit I just pushed. This is merely a refinement. I think this is much better actually and we'll thank ourselves in the future. Now, things are more generic "add watch with a mode". The mode is currently only persistent or recursive but we now have room to change in the future. The number/scope of the change is exactly the same as before.

@Randgalt Randgalt force-pushed the ZOOKEEPER-1416 branch 2 times, most recently from 5e268f4 to 717cb66 Compare October 2, 2019 22:59
@Randgalt Randgalt marked this pull request as ready for review October 2, 2019 23:12
@Randgalt Randgalt requested a review from eolivelli October 2, 2019 23:12
@Randgalt Randgalt force-pushed the ZOOKEEPER-1416 branch 2 times, most recently from 16d7460 to c48a3c3 Compare October 3, 2019 15:17
@Randgalt Randgalt requested a review from eolivelli October 3, 2019 15:50
@Randgalt
Copy link
Member Author

Randgalt commented Oct 7, 2019

FYI - I wrote the Curator portions of this new feature as a test. One of them is a new path cache that replaces Curator's TreeCache, PathChildrenCache and NodeCache. First, the code is so much simpler and easier to reason about. This alone will reduce errors and complexity. Also the number of ZK calls needed (and overhead for all those watchers) is vastly reduced.

I wrote a test to exercise the new cache, include with forced server crashes, and it's working perfectly. So, this suggest this new Watcher code is good. So, FYI, another data point.

If anyone's interested - here's the code:

  • PersistentWatcher.java - keeps a persistent watcher registered past any connection losses
  • CuratorCacheImpl.java - a full tree cache - note how simple it is compared to the corresponding Curator recipes that have to use old style watchers.

@Randgalt Randgalt force-pushed the ZOOKEEPER-1416 branch 6 times, most recently from 41c7f90 to 7e4f9a3 Compare October 9, 2019 08:47
@Randgalt
Copy link
Member Author

Randgalt commented Oct 18, 2019

How can we get this completed?

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me.

Just waiting for approval from other committers.

Thank you so much for contributing this great feature

@Randgalt
Copy link
Member Author

An important question to all committers. In DataTree.setWatches persistent watchers are not applied. This means that after a network partition, no persistent watchers will trigger. I don't have a feeling about this one way or another - the current implementation works fine for Curator's use cases.

@eolivelli
Copy link
Contributor

@Randgalt can you elaborate a bit more?
What's the relationship between DataTree and network partition?

@tisonkun
Copy link
Member

@eolivelli it is about SetWatches message which on connection loss recovered resets watchers on server and trigger if it detected event. In this pull request it reset persistent watches but no logic detect if there has been an event.

@Randgalt
Copy link
Member Author

Randgalt commented Oct 19, 2019

Yes. It’s the triggering of missed events that I’m concerned about. It would be easy to add but I’m unsure if it’s needed.

Note: it could trigger the entire tree of nodes. I’m very uncertain if it’s a good idea. I was able to write Curator’s new caching recipe without this.

Update: 1 possibility is to leave the implementation as it is and document the behavior.

@anmolnar
Copy link
Contributor

anmolnar commented Oct 21, 2019

@Randgalt I don't have a strong opinion either, but in case of a lot of missed events triggering them would potentially put very high load on the client which might not be desirable.

Copy link
Contributor

@anmolnar anmolnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a very quick glance it looks good to me. Here're a few points to think about:

  • More detailed documentation would be nice to have. I understand the motivation behind having a completely separate API for these new watchers, but it seems to me it has a very important difference in its characteristics: standard watchers are set via getData() API in order to avoid losing any data change event. Retrieving the data and setting the watcher is atomic. Persistent watchers work differently.

  • As you outlined we don't have docs for WatchManagerOptimized, but we do have a nice javadoc. Would be nice to add it here or create new Jira to fill the gap.

  • CLI support - new Jira?

  • C client support - new Jira?

  • Add watcher type to Admin server: currently I can see persistent watcher, but there's no way to distinguish them from standard watchers. 4LW is deprecated, so we probably don't need support for it.

  • WatchManagerOptimized: I haven't tried, but I think you can add the loop of PathIterator as for WatchManager. This guy here:

Watcher w = watcherBitIdMap.get(wBit);

retrieves the watcher and you can get the WatcherMode in the usual way. Maybe the hard part is that WatchManagerOptimized immediately removes the watchers here:

BitHashSet watchers = remove(path);

which could be not straightforward to refactor.
...or I completely misunderstand the problem.

private final Map<String, Set<Watcher>> dataWatches = new HashMap<String, Set<Watcher>>();
private final Map<String, Set<Watcher>> existWatches = new HashMap<String, Set<Watcher>>();
private final Map<String, Set<Watcher>> childWatches = new HashMap<String, Set<Watcher>>();
private final Map<String, Set<Watcher>> persistentWatches = new HashMap<String, Set<Watcher>>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not using ConcurrentHashMap instead of synchronizing everywhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree - but it should probably be a separate tech-debt ticket.

@anmolnar
Copy link
Contributor

One more thing on the testing side.

Currently we run the same unit tests for all impl of WatchManagers in WatchManagerTest. If we end up not implementing persistent watcher for the optimized version, I think we will need to break these tests and separately test persistent/recursive watchers in the tests of standard WatchManager.

@Randgalt
Copy link
Member Author

@Randgalt I don't have a strong opinion either, but in case of a lot of missed events triggering them would potentially put very high load on the client which might not be desirable.

The Curator cache recipes have to re-query the entire tree anyway so it doesn't benefit algorithms like that. So, my vote would be to document the behavior and let clients choose how to handle re-connections. But, I'm happy to change if others prefer something else.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not take into account the OptimizedWatchManager case.
I left a new comment, please take a look

if (watcherMode == WatcherMode.DEFAULT_WATCHER_MODE) {
return addWatch(path, watcher);
}
throw new UnsupportedOperationException(); // custom implementations must defeat this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This path will be followed with OptimizedWatchManager.
How will it bubble up to the client?

I see two ways:

  • implement the method everywhere (it will have a cost!)
  • add a test case about the behaviour of a client that it is trying to use this feature with a server that does not support it because it has not the default WatchManager impl

Thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Randgalt
Copy link
Member Author

Randgalt commented Oct 22, 2019

@anmolnar

  • Re More detailed documentation would be nice to have - I'll push more docs soon
  • Re the CLI - the CLI does not support watchers so it shouldn't be a concern
  • Re C client support - A separate ticket. I don't know much about the C client.
  • Re Add watcher type to Admin server - I'll see what I can do

@anmolnar and @eolivelli

  • Re WatchManagerOptimized - I'll make sure tests pass, etc. without adding support to it. As @anmolnar noticed it will be very tricky to do.

@anmolnar I looked at the AdminServer and it currently doesn't return any details about watchers - i.e. whether it's a data watcher, exists watcher or a child watcher. In my view, nothing needs to be done to stay consistent.

maintain compatibility with older servers - if no persistent/recursive watchers are used, use the old version of SetWatches
Generalized this feature in the hopes of making it easier to change in the future. The APIs are no longer specific to persistent/recursive. Now, they are generically "add watch" and there is an additional mode that details the behavior of the watch. For now, the only choices are persistent or recursive but that can change in the future.
Added CLI command that corresponds to new addWatch() method in ZooKeeper
Added versions of addWatch that use the default watcher. With this, the "Helpers" in the CLI is no longer needed.
@maoling
Copy link
Member

maoling commented Nov 5, 2019

  • One of my concerns is:When the children of one path is very frequently updated flying with persistent-recursive watcher, the fired watches can be a issue?(e.g overhead network bandwidth. I test a path node with 1000000 children paths, then update all these 100000 child-paths with 10 threads, the watch works well.
  • I saw this PR written two year ago. Thanks @Randgalt for unremitting efforts and awesome work.

@eolivelli
Copy link
Contributor

good, we need a final +1 from at least @anmolnar then we can go and release this feature in 3.6.0

@Randgalt Randgalt requested a review from anmolnar November 5, 2019 12:05
@Randgalt
Copy link
Member Author

Randgalt commented Nov 5, 2019

  • I saw this PR written two year ago.

Actually, 3 years ago 😃

@eribeiro
Copy link
Contributor

eribeiro commented Nov 6, 2019

@Randgalt Congratulations for the awesome work!!! 🎉 It's very nice to see this much anticipated feature being finally merged into trunk. 👏

@Randgalt
Copy link
Member Author

Randgalt commented Nov 7, 2019

@Randgalt Congratulations for the awesome work!!! 🎉 It's very nice to see this much anticipated feature being finally merged into trunk. 👏

It's been 3 years. I'll believe it when it actually happens 😄

Copy link
Contributor

@nkalmar nkalmar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
builds, test run, tested with zkCli, command works, mode switch works, new watch works as expected.

@eolivelli
Copy link
Contributor

@anmolnar shall I have the honour of merging this patch ?

Copy link
Contributor

@anmolnar anmolnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eolivelli +1 Go ahead please.

@asfgit asfgit closed this in 5536393 Nov 8, 2019
junyoungKimGit added a commit to junyoungKimGit/zookeeper that referenced this pull request Dec 9, 2019
* ZOOKEEPER-3530: add new artifact for compiled c-client code

During the old ZooKeeper 3.4 ant builds (ant package-native), there was an artifact (zookeeper-<version>-lib.tar.gz) created just for the C-client, with the following content:

```
usr
  |--- bin
         |--- cli_mt
         |--- cli_st
         |--- load_gen
  |--- include
         |--- zookeeper
                |--- proto.h
                |--- recordio.h
                |--- zookeeper.h
                |--- zookeeper.jute.h
                |--- zookeeper_log.h
                |--- zookeeper_version.h
  |--- lib
         |--- libzookeeper_mt.a
         |--- libzookeeper_mt.la
         |--- libzookeeper_mt.so
         |--- libzookeeper_mt.so.2
         |--- libzookeeper_mt.so.2.0.0
         |--- libzookeeper_st.a
         |--- libzookeeper_st.la
         |--- libzookeeper_st.so
         |--- libzookeeper_st.so.2
         |--- libzookeeper_st.so.2.0.0
```

Currently with maven, when we are generating a tarball during full-build then the C-client is not getting archived. In [PR-1078](apache#1078) we discussed that we should re-introduce the apache-zookeeper-<version>-lib.tar.gz artifact.

The goals of this PR are:
- re-introduce the 'lib' artifact, with the same structure we had for the older zookeeper 3.4.x ant generated tar file
- we should also add the LICENSE.txt file to the archive (it was missing from the 3.4.x version tar.gz file)
- the new artifact should be generated only when the full-build profile is set for maven
- we should also update the README_packaging.md file

Author: Mate Szalay-Beko <[email protected]>

Reviewers: [email protected], [email protected]

Closes apache#1113 from symat/ZOOKEEPER-3530-PR

* ZOOKEEPER-3593: fix the default value of jute.maxbuffer in client side and an optimization for the documentation

- The default value of `jute.maxbuffer` in client side said it's 4MB, but actually, it was never working, because when the client reads the deserialized znode data, it also calls `checkLength(BinaryInputArchive.java:127)` where  `jute.maxbuffer` default value is 1MB. It's easy to reproduce, just read a znode more than 1MB with any special configure client. Look at the stack trace I attached in the JIRA
- Users also confused about that the doc said `jute.maxbuffer` must be set on all servers and clients, but their default value is not same in the [ZOOKEEPER-1295](https://issues.apache.org/jira/browse/ZOOKEEPER-1295)
- more details in the [ZOOKEEPER-3593](https://issues.apache.org/jira/browse/ZOOKEEPER-3593)

Author: maoling <[email protected]>

Reviewers: [email protected], [email protected]

Closes apache#1129 from maoling/ZOOKEEPER-3593

* ZOOKEEPER-3605: connThrottle needs to be assigned in alternate consructor

`connThrottle` needs to be assigned in alternate consructor to avoid NPEs

Author: randgalt <[email protected]>

Reviewers: Enrico Olivelli, Andor Molnár

Closes apache#1132 from Randgalt/ZOOKEEPER-3605

* ZOOKEEPER-1416 - Persistent, recursive watchers

### Background

Note: this is a port of apache#136

Implementation for a persistent, recursive watch addition for ZK. These watches are set via a new method, addPersistentWatch() and are removed via the existing watcher removal methods. Persistent, recursive watches have these characteristics: a) Once set, they do not auto-remove when triggered; b) they trigger for all event types (child, data, etc.) on the node they are registered for and any child znode recursively; c) they are efficiently implemented by using the existing watch internals. A new class PathIterator walks up the path parent-by-parent when checking if a watcher applies.

### Implementation Details

- A new enum manages the different "modes" for watchers: `WatcherMode`.
- For traditional, "standard" watchers, the code path is almost exactly the same. There is very little overhead other than a bit of extra checks in `WatchManager`
- Given how this is implemented it was difficult to add support when `WatchManagerOptimized` is used. I'm open to adding it for that version but it will take work. We should consider not supporting persistent/recursive watchers when WatchManagerOptimized is used. I notice that `WatchManagerOptimized` is not even mentioned in the docs.
- The mode for a given watcher/path pair is held in a map inside of `WatcherModeManager`. The absence of an entry means Standard. This way, there's no overhead for old, standard watchers.
- `PathParentIterator` is the "meat" of the implementation. Rather than set watchers on every ZNode implied by a recursive watcher. WatchManager passes any paths it processes through PathParentIterator which iterates up each parent znode looking for watchers.
- The remainder of the changes are scaffolding to match how other watchers are used as well as Jute/API changes to set persistent/recursive watchers

### Testing

The tests were written years ago. I think they're comprehensive but reviewers should pay attention to anything that was missed. There is much ZooKeeper knowledge that's only in the heads of ZK committers.

- `PersistentWatcherTest` - tests persistent, non-recursive watchers
- `PersistentRecursiveWatcherTest` - tests persistent, recursive watchers
- `PathParentIteratorTest`- exercises edges of PathParentIterator

Author: randgalt <[email protected]>

Reviewers: Enrico Olivelli <eolivelli@apache,org>, Norbert Kalmar <[email protected]>, Andor Molnár <[email protected]>, Justin Mao Ling <[email protected]>

Closes apache#1106 from Randgalt/ZOOKEEPER-1416

* ZOOKEEPER-3598: Fix potential data inconsistency issue due to CommitProcessor not gracefully shutdown

Note: use exit code 16 for SHUTDOWN_UNGRACEFULLY, since internally we've already using 15 for other exit code, which will be upstreamed later.

Author: Fangmin Lyu <[email protected]>

Reviewers: Enrico Olivelli <[email protected]>, Michael Han <[email protected]>

Closes apache#1130 from lvfangmin/ZOOKEEPER-3598

* ZOOKEEPER-1260: Audit logging in ZooKeeper servers.

Author: Mohammad Arshad <[email protected]>

Reviewers: Enrico Olivelli <[email protected]>, Andor Molnar <[email protected]>

Closes apache#1133 from arshadmohammad/ZOOKEEPER-1260-AuditLog-master

* ZOOKEEPER-3340: Introduce CircularBlockingQueue in QuorumCnxManager.java

There are many caveats and comments regarding Exception thrown 'which is safe to ignore'.  Added a new data structure that removes all of these comments and will perform without generating exceptions while more clearly implementing the desired behavior without the caveats.

Author: Beluga Behr <[email protected]>
Author: David Mollitor <[email protected]>
Author: David Mollitor <[email protected]>

Reviewers: [email protected], [email protected]

Closes apache#880 from belugabehr/ZOOKEEPER-3340 and squashes the following commits:

7b74dac [David Mollitor] Changed comment from conrete ArrayBlockingQueue to BlockingQueue
38d7e3f [David Mollitor] Shutdown ExecutorService in test
f5ea2b6 [David Mollitor] Updated to fix checkstyle ImportOrder: Extra separation
b3ac3cc [David Mollitor] Rebased to master
c63ab85 [Beluga Behr] Added updates from code review
d887780 [Beluga Behr] Introduce new class CircularBlockingQueue
2793db2 [Beluga Behr] Added additional logging
ce96b70 [Beluga Behr] ZOOKEEPER-3340: Improve Queue Usage in QuorumCnxManager.java

* ZOOKEEPER-2238: Support limiting the maximum number of connections/clients to a zookeeper server

Support limiting the maximum number of connections/clients to a zookeeper server

Author: sujithsimon22 <[email protected]>

Reviewers: Justin Mao Ling <[email protected]>, Brian Nixon <[email protected]>, Edward Ribeiro <[email protected]>, Enrico Olivelli <[email protected]>, Mohammad Arshad <[email protected]>

Closes apache#1108 from sujithsimon22/ZOOKEEPER-2238

* ZOOKEEPER-3571: Ensure test base directory before tests

Ensure `build.test.dir` is present as a directory. Otherwise many times it suffers from

```
java.io.IOException: No such file or directory

	at java.io.UnixFileSystem.createFileExclusively(Native Method)
	at java.io.File.createTempFile(File.java:2024)
	at org.apache.zookeeper.test.ClientBase.createTmpDir(ClientBase.java:371)
	at org.apache.zookeeper.test.ClientBase.setUpWithServerId(ClientBase.java:514)
	at org.apache.zookeeper.test.ClientBase.setUp(ClientBase.java:491)
```

Author: tison <[email protected]>

Reviewers: [email protected], [email protected]

Closes apache#1112 from TisonKun/ZOOKEEPER-3571

* Revert "ZOOKEEPER-3598: Fix potential data inconsistency issue due to CommitProcessor not gracefully shutdown"

This reverts commit 79f99af.

* ZOOKEEPER-3363: Drop ANT build.xml

- drop Ant and Ivy files
- drop old Cobertura README file
- drop old jdiff file

Author: Enrico Olivelli <[email protected]>
Author: Enrico Olivelli <[email protected]>

Reviewers: Norbert Kalmar <[email protected]>

Closes apache#1139 from eolivelli/fix/drop-ant

* ZOOKEEPER-3560: Add response cache to serve get children (2) requests.

ZOOKEEPER-3180 introduces response cache but it only covers getData requests. This commit is to extend the response cache based on the infrastructure set up by ZOOKEEPER-3180 to so the response of get children requests can also be served out of cache. Some design decisions:

* Only OpCode.getChildren2 is supported, as OpCode.getChildren does not have associated stats and current cache infra relies on stats to invalidate cache.

* The children list is stored in a separate response cache object so it does not pollute the existing data cache that's serving getData requests, and this separation also allows potential separate tuning of each cache based on workload characteristics.

* As a result of cache object separation, new server metrics is added to measure cache hit / miss for get children requests, that's separated from get data requests.

Similar as ZOOKEEPER-3180, the get children response cache is enabled by default, with a default cache size of 400, and can be disabled (together with get data response cache.).

Author: Michael Han <[email protected]>
Author: Michael Han <[email protected]>

Reviewers: [email protected], [email protected], [email protected]

Closes apache#1098 from hanm/cache

* ZOOKEEPER-3502: improve the server command: zabstate to have a better observation on the process of leader election

- Just as we knew, the rule of LE is that who has the biggest last zxid who will win.when the zxid is the same, chose the myid number bigger one.
- Do the following cmd will have a bird's-eye view on LE
```
 ➜  bin curl http://localhost:8081/commands/zabstate  http://localhost:8082/commands/zabstate http://localhost:8083/commands/zabstate
{
  "myid" : 1,
  "is_leader" : true,
  "endpoint" : "/127.0.0.1:2181",
  "voting" : true,
  "last_zxid" : "0xfa2c00000000",
  "zab_epoch" : 64044,
  "zab_counter" : 0,
  "zabstate" : "broadcast",
  "command" : "zabstate",
  "error" : null
}{
  "myid" : 2,
  "is_leader" : false,
  "endpoint" : "/127.0.0.1:2182",
  "voting" : true,
  "last_zxid" : "0xfa2b00000004",
  "zab_epoch" : 64043,
  "zab_counter" : 4,
  "zabstate" : "broadcast",
  "command" : "zabstate",
  "error" : null
}{
  "myid" : 3,
  "is_leader" : false,
  "endpoint" : "/127.0.0.1:2183",
  "voting" : true,
  "last_zxid" : "0xfa2b00000004",
  "zab_epoch" : 64043,
  "zab_counter" : 4,
  "zabstate" : "broadcast",
  "command" : "zabstate",
  "error" : null
}%
```
- more details in the [ZOOKEEPER-3502](https://issues.apache.org/jira/browse/ZOOKEEPER-3502)

Author: maoling <[email protected]>

Reviewers: Brian Nixon <[email protected]>, Andor Molnar <[email protected]>, Enrico Olivelli <[email protected]>

Closes apache#1052 from maoling/ZOOKEEPER-3502

* ZOOKEEPER-3570: make the special client xid constant

- more details in the [ZOOKEEPER-3570](https://issues.apache.org/jira/browse/ZOOKEEPER-3570)

Author: maoling <[email protected]>

Reviewers: Fangmin Lyu <[email protected]>, Brian Nixon <[email protected]>

Closes apache#1136 from maoling/ZOOKEEPER-3570

* ZOOKEEPER-2122: add SSL support for C-client

This PR is based on the works of Asnish Amarnath and Suhas Dantkale. Most of the kudos should go to them and those who were reviewing all the previous PRs.

**The PR includes the following changes from PR#639:**
- OPENSSL 1.1.1 version support in C-client

**The PR includes the following changes from PR#990:**
- also supporting OPENSSL 1.0.2
- SSL connection on non-blocking socket is handled correctly
- Support of Certificate Chains
- Fix Memory leaks
- Dynamically generated test certificates

**The following new changes were added into the PR:**
- fix CMake + VisualStudio2019 build with windows
- fix C CLI to compile / work both with windows and linux (I tested them manually)
- fix (and simplify) the way how the server is started with C unit tests, so it is compatible with maven build now
- the test case `testReadOnly` was failing with the previous PR because there was a bug in the C-client code, I fixed that
- I also added new test case: `testReadOnlyWithSSL`

**Testing this PR on linux:**
```
git clean -xdf

# compile ZooKeeper server plus the C-client code
mvn clean install -DskipTests -Pfull-build

# compile and execute C-client unit tests
cd zookeeper-client/
mvn clean install -Pfull-build
```

**Compile the code on windows (only cmake is supported):**
- download C-Make:  https://cmake.org/download/
- Install community edition of Visual Studio: https://visualstudio.microsoft.com/downloads/
- Download OpenSSL (e.g. 1.0.2): https://slproweb.com/products/Win32OpenSSL.html (e.g. install it to `c:\OpenSSL-Win64`)
- compile the java code using: `mvn clean install -DskipTests`
- go to the Client folder: `cd zookeeper-client\zookeeper-client-c`
- configure the project:  `cmake . -D WITH_OPENSSL=c:\OpenSSL-Win64`
- build the project: `cmake --build .`

**Testing the C-client with SSL manually:**
- run the `zookeeper-client/zookeeper-client-c/ssl/gencerts.sh` to generate certificate files (e.g. copy it to an empty folder like `/tmp/ssl/` and start is)
- start a ZooKeeper server, using some config file like this one:
```
tickTime=3000
initLimit=10
syncLimit=5
dataDir=/tmp/zkdata

secureClientPort=22281
serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
ssl.keyStore.location=/tmp/ssl/server.jks
ssl.keyStore.password=password
ssl.trustStore.location=/tmp/ssl/servertrust.jks
ssl.trustStore.password=password
```
- start the command line client (cli.exe on windows, cli_mt or cli_st on linux): `./cli_mt --host localhost:22281 --ssl /tmp/ssl/server.crt,/tmp/ssl/client.crt,/tmp/ssl/clientkey.pem,password`

Author: Mate Szalay-Beko <[email protected]>
Author: Mate Szalay-Beko <[email protected]>

Reviewers: [email protected]

Closes apache#1107 from symat/ZOOKEEPER-2122 and squashes the following commits:

08294ce [Mate Szalay-Beko] ZOOKEEPER-2122: update readme + use FQDN in SSL certs during testing
17e504a [Mate Szalay-Beko] Merge remote-tracking branch 'apache/master' into ZOOKEEPER-2122
317241d [Mate Szalay-Beko] ZOOKEEPER-2122: minor fix in SSL certificates used for testing
6f37b66 [Mate Szalay-Beko] Merge remote-tracking branch 'apache/master' into HEAD
9809143 [Mate Szalay-Beko] ZOOKEEPER-2122: add SSL support for C-client

* ZOOKEEPER-3590: Zookeeper is unable to set the zookeeper.sasl.client.canonicalize.hostname using system property

See https://issues.apache.org/jira/browse/ZOOKEEPER-3590

This is a very small patch that gives sets the option after reading it from the system.getProperty().

There should not be any side effects since this is mostly copy paste.

Author: aristotelhs <[email protected]>

Reviewers: [email protected], [email protected]

Closes apache#1124 from aristotelhs/ZOOKEEPER-3590

* ZOOKEEPER-3627: Update Jackson to 2.9.10.1

Author: Colm O hEigeartaigh <[email protected]>

Reviewers: [email protected], [email protected]

Closes apache#1150 from coheigea/ZOOKEEPER-3627

* ZOOKEEPER-3473: Improving successful TLS handshake throughput with concurrent control

When there are lots of clients trying to re-establish sessions, there might be lots of half finished handshake timed out, and those failed ones keep reconnecting to another server and restarting the handshake from beginning again, which caused herd effect.

And the number of total ZK sessions could be supported within session timeout are impacted a lot after enabling TLS.

To improve the throughput, we added the TLS concurrent control to reduce the herd effect, and from out benchmark this doubled the sessions we could support within session timeout.

E2E test result:

Tested performance and correctness from E2E. For correctness, tested both secure and insecure
connections, the outstandingHandshakeNum will go to 0 eventually.

For performance, tested with 110k sessions with 10s session timeout, there is no session expire when leader election triggered, while before it can only support 50k sessions.

Author: Fangmin Lyu <[email protected]>

Reviewers: Enrico Olivelli <[email protected]>, Andor Molnar <[email protected]>

Closes apache#1027 from lvfangmin/ZOOKEEPER-3473

* ZOOKEEPER-3546 - Allow optional deletion of never used Container Nodes

Edge cases can cause Container Nodes to never be deleted. i.e. if the container node is created and then the client that create the node crashes the container will not get deleted unless another client creates a node inside of it. This is because the initial implementation does not delete container nodes with a cversion of 0. This PR adds a new system property, "znode.container.maxNeverUsedIntervalMs", that can be set to delete containers with a cversion of 0 that have been retained for a period of time. This is a backward compatible change as the default value for this is Long.MAX_VALUE - i.e. never.

Author: randgalt <[email protected]>

Reviewers: Enrico Olivelli <[email protected]>, Fangmin Lyu <[email protected]>

Closes apache#1138 from Randgalt/ZOOKEEPER-3546-allow-delete-of-never-used-containers

* ZOOKEEPER-3620: Allow to override calls to System.exit in server side code

- Introduce a way to override calls to System.exit
- enable DM_EXIT spotbugs rule

see https://issues.apache.org/jira/browse/ZOOKEEPER-3620 for more context.

Author: Enrico Olivelli <[email protected]>
Author: Enrico Olivelli <[email protected]>

Reviewers: [email protected]

Closes apache#1147 from eolivelli/fix/ZOOKEEPER-3620-no-systemexit and squashes the following commits:

a234f85 [Enrico Olivelli] Fix checkstyle
4c4fec4 [Enrico Olivelli] Fix spotbugs warning
ae339b7 [Enrico Olivelli] Revert changes to VerGen.java
0e5ee07 [Enrico Olivelli] Enable DM_EXIT spotbugs rule for the full code base
b05a4bf [Enrico Olivelli] ZOOKEEPER-3620 Allow to override calls to System.exit in server side code: - Use a common utility to call System.exit - Override calls to System.exit to a NO-OP function in tests

* ZOOKEEPER-3188: Improve resilience to network

This PR is the rebase of the [previous pull request](apache#730), so all the kudos should go to the original authors...

In [ZOOKEEPER-3188](https://issues.apache.org/jira/browse/ZOOKEEPER-3188) we add ability to specify several addresses for quorum operations. Also added reconnection attempts if connection to leader lost.

In this PR I rebased the changes on the current master, resolving some minor conflicts with:
- [ZOOKEEPER-3296](https://issues.apache.org/jira/browse/ZOOKEEPER-3296): Explicitly closing the sslsocket when it failed handshake to prevent issue where peers cannot join quorum
- [ZOOKEEPER-3320](https://issues.apache.org/jira/browse/ZOOKEEPER-3320): Leader election port stop listen when hostname unresolvable for some time
- [ZOOKEEPER-3385](https://issues.apache.org/jira/browse/ZOOKEEPER-3385): Add admin command to display leader
- [ZOOKEEPER-3386](https://issues.apache.org/jira/browse/ZOOKEEPER-3386): Add admin command to display voting view
- [ZOOKEEPER-3398](https://issues.apache.org/jira/browse/ZOOKEEPER-3398): Learner.connectToLeader() may take too long to time-out

I still want to test the feature manually (e.g. using docker containers with multiple virtual networks / interfaces). The steps to the manual test could be recorded in the [google docs](https://docs.google.com/document/d/1iGVwxeHp57qogwfdodCh9b32P2_kOQaJZ2GDo7j36fI/edit?usp=sharing) as well.

Also I think we could add a few more unit tests where we are using multiple addresses. The current tests are using a single address only.

Also the Zookeeper documentation needs to be changed (e.g. by a follow-up Jira?) to promote the new feature and the new config format (possibly including also the admin command documentation in relation with [ZOOKEEPER-3386](https://issues.apache.org/jira/browse/ZOOKEEPER-3386) and [ZOOKEEPER-3461](https://issues.apache.org/jira/browse/ZOOKEEPER-3461))

Author: Mate Szalay-Beko <[email protected]>
Author: Mate Szalay-Beko <[email protected]>

Reviewers: [email protected], [email protected]

Closes apache#1048 from symat/ZOOKEEPER-3188 and squashes the following commits:

3c6fc52 [Mate Szalay-Beko] Merge remote-tracking branch 'apache/master' into ZOOKEEPER-3188
356882d [Mate Szalay-Beko] ZOOKEEPER-3188: document new configuration format for using multiple addresses
45b6c0f [Mate Szalay-Beko] Merge remote-tracking branch 'apache/master' into ZOOKEEPER-3188
4b6bcea [Mate Szalay-Beko] ZOOKEEPER-3188: MultiAddress unit tests for Quorum TLS and Kerberos/Digest authentication
40bc44c [Mate Szalay-Beko] Merge remote-tracking branch 'apache/master' into ZOOKEEPER-3188
f875f5c [Mate Szalay-Beko] Merge remote-tracking branch 'apache/master' into ZOOKEEPER-3188
31805e7 [Mate Szalay-Beko] Merge remote-tracking branch 'apache/master' into ZOOKEEPER-3188
0f95678 [Mate Szalay-Beko] ZOOKEEPER-3188: skip unreachable addresses when Learner connects to Leader
e232c55 [Mate Szalay-Beko] ZOOKEEPER-3188: fix flaky unit MultiAddress unit test
e892d8d [Mate Szalay-Beko] Merge remote-tracking branch 'apache/master' into ZOOKEEPER-3188
6f2ab75 [Mate Szalay-Beko] Merge remote-tracking branch 'apache/master' into ZOOKEEPER-3188
2eedf26 [Mate Szalay-Beko] ZOOKEEPER-3188: fix PR commits; handle case when Leader can not bind to port on startup
483d2fc [Mate Szalay-Beko] Merge remote-tracking branch 'apache/master' into ZOOKEEPER-3188
a5d6bcb [Mate Szalay-Beko] ZOOKEEPER-3188: support for dynamic reconfig + add more unit tests
ed31d2c [Mate Szalay-Beko] ZOOKEEPER-3188: better shutdown for executors (following PR comments)
8713a5b [Mate Szalay-Beko] ZOOKEEPER-3188: add fixes for PR comments
05eae83 [Mate Szalay-Beko] Merge remote-tracking branch 'apache/master' into ZOOKEEPER-3188
e823af4 [Mate Szalay-Beko] Merge remote-tracking branch 'origin/master' into ZOOKEEPER-3188
de7bad2 [Mate Szalay-Beko] Merge remote-tracking branch 'origin/master' into ZOOKEEPER-3188
da98a8d [Mate Szalay-Beko] ZOOKEEPER-3188: fix JDK-13 warning
5bd1f4e [Mate Szalay-Beko] ZOOKEEPER-3188: supress spotbugs warning
42a52a6 [Mate Szalay-Beko] ZOOKEEPER-3188: improve based on code review comments
6c4220a [Mate Szalay-Beko] ZOOKEEPER-3188: fix SendWorker.asyncValidateIfSocketIsStillReachable
5b22432 [Mate Szalay-Beko] ZOOKEEPER-3188: fix LeaderElection to work with multiple election addresses
7bfbe7e [Mate Szalay-Beko] ZOOKEEPER-3188: Improve resilience to network

* ZOOKEEPER-3633: AdminServer commands throw NPE when only secure client port is used

When only secureClientPort is defined in the config and there is no regular clientPort,
then both the stat and the conf commands on the AdminServer result in 500 Server Error caused by
NullPointerExceptions. The problem is that no serverCnxFactory is defined in the
ZooKeeperServer in this case, we have only secureServerCnxnFactory.

In the fix we return info about both the secure and unsecure connections.
Example of the stat command output for secure-only configuration:
```
{
  "version" : "3.6.0-SNAPSHOT-8e8905069f4bff670c0492fe9e28ced0f86bca00, built on 11/29/2019 08:04 GMT",
  "read_only" : false,
  "server_stats" : {
    "packets_sent" : 1,
    "packets_received" : 1,
    "fsync_threshold_exceed_count" : 0,
    "client_response_stats" : {
      "last_buffer_size" : -1,
      "min_buffer_size" : -1,
      "max_buffer_size" : -1
    },
    "data_dir_size" : 671094270,
    "log_dir_size" : 671094270,
    "last_processed_zxid" : 20,
    "outstanding_requests" : 0,
    "server_state" : "standalone",
    "avg_latency" : 5.0,
    "max_latency" : 5,
    "min_latency" : 5,
    "num_alive_client_connections" : 1,
    "provider_null" : false,
    "uptime" : 15020
  },
  "client_response" : {
    "last_buffer_size" : -1,
    "min_buffer_size" : -1,
    "max_buffer_size" : -1
  },
  "node_count" : 6,
  "connections" : [ ],
  "secure_connections" : [ {
    "remote_socket_address" : "127.0.0.1:57276",
    "interest_ops" : 1,
    "outstanding_requests" : 0,
    "packets_received" : 1,
    "packets_sent" : 1
  } ],
  "command" : "stats",
  "error" : null
}
```

Author: Mate Szalay-Beko <[email protected]>

Reviewers: Andor Molnar <[email protected]>, Enrico Olivelli <[email protected]>, Norbert Kalmar <[email protected]>

Closes apache#1161 from symat/ZOOKEEPER-3633

* ZOOKEEPER-3595: restore the handling of the fsync parameter

Author: Jingguo Yao <[email protected]>

Reviewers: [email protected], [email protected], [email protected], [email protected]

Closes apache#1146 from yaojingguo/ZOOKEEPER-3595

* ZOOKEEPER-3546: fix missed change, default should be 0 not Long.MAX_VALUE

fix missed change, default should be 0 not Long.MAX_VALUE

Author: randgalt <[email protected]>

Reviewers: [email protected], [email protected]

Closes apache#1158 from Randgalt/ZOOKEEPER-3546-allow-delete-of-never-used-containers-fixit

* ZOOKEEPER-3635: Use Docker and Maven Release Plugin to prepare ZooKeeper releases

- configure maven-release-plugin
- configure maven-scm-plugin
- use maven-antrun-plugin in order to alter C client files during release procedure
- update Apache Parent pom to 21
- use openjdk8 for building releases

Author: Enrico Olivelli <[email protected]>
Author: Enrico Olivelli <[email protected]>
Author: Enrico Olivelli <[email protected]>

Reviewers: Norbert Kalmar <[email protected]>

Closes apache#1162 from eolivelli/fix/ZOOKEEPER-3635-new-release-proc
junyoungKimGit pushed a commit to junyoungKimGit/zookeeper that referenced this pull request Feb 7, 2020
### Background

Note: this is a port of apache#136

Implementation for a persistent, recursive watch addition for ZK. These watches are set via a new method, addPersistentWatch() and are removed via the existing watcher removal methods. Persistent, recursive watches have these characteristics: a) Once set, they do not auto-remove when triggered; b) they trigger for all event types (child, data, etc.) on the node they are registered for and any child znode recursively; c) they are efficiently implemented by using the existing watch internals. A new class PathIterator walks up the path parent-by-parent when checking if a watcher applies.

### Implementation Details

- A new enum manages the different "modes" for watchers: `WatcherMode`.
- For traditional, "standard" watchers, the code path is almost exactly the same. There is very little overhead other than a bit of extra checks in `WatchManager`
- Given how this is implemented it was difficult to add support when `WatchManagerOptimized` is used. I'm open to adding it for that version but it will take work. We should consider not supporting persistent/recursive watchers when WatchManagerOptimized is used. I notice that `WatchManagerOptimized` is not even mentioned in the docs.
- The mode for a given watcher/path pair is held in a map inside of `WatcherModeManager`. The absence of an entry means Standard. This way, there's no overhead for old, standard watchers.
- `PathParentIterator` is the "meat" of the implementation. Rather than set watchers on every ZNode implied by a recursive watcher. WatchManager passes any paths it processes through PathParentIterator which iterates up each parent znode looking for watchers.
- The remainder of the changes are scaffolding to match how other watchers are used as well as Jute/API changes to set persistent/recursive watchers

### Testing

The tests were written years ago. I think they're comprehensive but reviewers should pay attention to anything that was missed. There is much ZooKeeper knowledge that's only in the heads of ZK committers.

- `PersistentWatcherTest` - tests persistent, non-recursive watchers
- `PersistentRecursiveWatcherTest` - tests persistent, recursive watchers
- `PathParentIteratorTest`- exercises edges of PathParentIterator

Author: randgalt <[email protected]>

Reviewers: Enrico Olivelli <eolivelli@apache,org>, Norbert Kalmar <[email protected]>, Andor Molnár <[email protected]>, Justin Mao Ling <[email protected]>

Closes apache#1106 from Randgalt/ZOOKEEPER-1416
stickyhipp pushed a commit to stickyhipp/zookeeper that referenced this pull request Aug 19, 2020
### Background

Note: this is a port of apache#136

Implementation for a persistent, recursive watch addition for ZK. These watches are set via a new method, addPersistentWatch() and are removed via the existing watcher removal methods. Persistent, recursive watches have these characteristics: a) Once set, they do not auto-remove when triggered; b) they trigger for all event types (child, data, etc.) on the node they are registered for and any child znode recursively; c) they are efficiently implemented by using the existing watch internals. A new class PathIterator walks up the path parent-by-parent when checking if a watcher applies.

### Implementation Details

- A new enum manages the different "modes" for watchers: `WatcherMode`.
- For traditional, "standard" watchers, the code path is almost exactly the same. There is very little overhead other than a bit of extra checks in `WatchManager`
- Given how this is implemented it was difficult to add support when `WatchManagerOptimized` is used. I'm open to adding it for that version but it will take work. We should consider not supporting persistent/recursive watchers when WatchManagerOptimized is used. I notice that `WatchManagerOptimized` is not even mentioned in the docs.
- The mode for a given watcher/path pair is held in a map inside of `WatcherModeManager`. The absence of an entry means Standard. This way, there's no overhead for old, standard watchers.
- `PathParentIterator` is the "meat" of the implementation. Rather than set watchers on every ZNode implied by a recursive watcher. WatchManager passes any paths it processes through PathParentIterator which iterates up each parent znode looking for watchers.
- The remainder of the changes are scaffolding to match how other watchers are used as well as Jute/API changes to set persistent/recursive watchers

### Testing

The tests were written years ago. I think they're comprehensive but reviewers should pay attention to anything that was missed. There is much ZooKeeper knowledge that's only in the heads of ZK committers.

- `PersistentWatcherTest` - tests persistent, non-recursive watchers
- `PersistentRecursiveWatcherTest` - tests persistent, recursive watchers
- `PathParentIteratorTest`- exercises edges of PathParentIterator

Author: randgalt <[email protected]>

Reviewers: Enrico Olivelli <eolivelli@apache,org>, Norbert Kalmar <[email protected]>, Andor Molnár <[email protected]>, Justin Mao Ling <[email protected]>

Closes apache#1106 from Randgalt/ZOOKEEPER-1416
RokLenarcic pushed a commit to RokLenarcic/zookeeper that referenced this pull request Aug 31, 2022
### Background

Note: this is a port of apache#136

Implementation for a persistent, recursive watch addition for ZK. These watches are set via a new method, addPersistentWatch() and are removed via the existing watcher removal methods. Persistent, recursive watches have these characteristics: a) Once set, they do not auto-remove when triggered; b) they trigger for all event types (child, data, etc.) on the node they are registered for and any child znode recursively; c) they are efficiently implemented by using the existing watch internals. A new class PathIterator walks up the path parent-by-parent when checking if a watcher applies.

### Implementation Details

- A new enum manages the different "modes" for watchers: `WatcherMode`.
- For traditional, "standard" watchers, the code path is almost exactly the same. There is very little overhead other than a bit of extra checks in `WatchManager`
- Given how this is implemented it was difficult to add support when `WatchManagerOptimized` is used. I'm open to adding it for that version but it will take work. We should consider not supporting persistent/recursive watchers when WatchManagerOptimized is used. I notice that `WatchManagerOptimized` is not even mentioned in the docs.
- The mode for a given watcher/path pair is held in a map inside of `WatcherModeManager`. The absence of an entry means Standard. This way, there's no overhead for old, standard watchers.
- `PathParentIterator` is the "meat" of the implementation. Rather than set watchers on every ZNode implied by a recursive watcher. WatchManager passes any paths it processes through PathParentIterator which iterates up each parent znode looking for watchers.
- The remainder of the changes are scaffolding to match how other watchers are used as well as Jute/API changes to set persistent/recursive watchers

### Testing

The tests were written years ago. I think they're comprehensive but reviewers should pay attention to anything that was missed. There is much ZooKeeper knowledge that's only in the heads of ZK committers.

- `PersistentWatcherTest` - tests persistent, non-recursive watchers
- `PersistentRecursiveWatcherTest` - tests persistent, recursive watchers
- `PathParentIteratorTest`- exercises edges of PathParentIterator

Author: randgalt <[email protected]>

Reviewers: Enrico Olivelli <eolivelli@apache,org>, Norbert Kalmar <[email protected]>, Andor Molnár <[email protected]>, Justin Mao Ling <[email protected]>

Closes apache#1106 from Randgalt/ZOOKEEPER-1416
RokLenarcic pushed a commit to RokLenarcic/zookeeper that referenced this pull request Sep 3, 2022
### Background

Note: this is a port of apache#136

Implementation for a persistent, recursive watch addition for ZK. These watches are set via a new method, addPersistentWatch() and are removed via the existing watcher removal methods. Persistent, recursive watches have these characteristics: a) Once set, they do not auto-remove when triggered; b) they trigger for all event types (child, data, etc.) on the node they are registered for and any child znode recursively; c) they are efficiently implemented by using the existing watch internals. A new class PathIterator walks up the path parent-by-parent when checking if a watcher applies.

### Implementation Details

- A new enum manages the different "modes" for watchers: `WatcherMode`.
- For traditional, "standard" watchers, the code path is almost exactly the same. There is very little overhead other than a bit of extra checks in `WatchManager`
- Given how this is implemented it was difficult to add support when `WatchManagerOptimized` is used. I'm open to adding it for that version but it will take work. We should consider not supporting persistent/recursive watchers when WatchManagerOptimized is used. I notice that `WatchManagerOptimized` is not even mentioned in the docs.
- The mode for a given watcher/path pair is held in a map inside of `WatcherModeManager`. The absence of an entry means Standard. This way, there's no overhead for old, standard watchers.
- `PathParentIterator` is the "meat" of the implementation. Rather than set watchers on every ZNode implied by a recursive watcher. WatchManager passes any paths it processes through PathParentIterator which iterates up each parent znode looking for watchers.
- The remainder of the changes are scaffolding to match how other watchers are used as well as Jute/API changes to set persistent/recursive watchers

### Testing

The tests were written years ago. I think they're comprehensive but reviewers should pay attention to anything that was missed. There is much ZooKeeper knowledge that's only in the heads of ZK committers.

- `PersistentWatcherTest` - tests persistent, non-recursive watchers
- `PersistentRecursiveWatcherTest` - tests persistent, recursive watchers
- `PathParentIteratorTest`- exercises edges of PathParentIterator

Author: randgalt <[email protected]>

Reviewers: Enrico Olivelli <eolivelli@apache,org>, Norbert Kalmar <[email protected]>, Andor Molnár <[email protected]>, Justin Mao Ling <[email protected]>

Closes apache#1106 from Randgalt/ZOOKEEPER-1416
@tisonkun
Copy link
Member

Yes. It’s the triggering of missed events that I’m concerned about. It would be easy to add but I’m unsure if it’s needed.

@Randgalt I run into this case today. How can it be easy to add? At least we need to associate zxid with the request and the data tree node, as well as bookkeep the changes. Then we can filter the changes between the client setwatches request and the server state. Or we send some duplicate but the client can filter with some revisions.

@Randgalt
Copy link
Member Author

Randgalt commented Dec 28, 2022

Yes. It’s the triggering of missed events that I’m concerned about. It would be easy to add but I’m unsure if it’s needed.

@Randgalt I run into this case today. How can it be easy to add? At least we need to associate zxid with the request and the data tree node, as well as bookkeep the changes. Then we can filter the changes between the client setwatches request and the server state. Or we send some duplicate but the client can filter with some revisions.

The persistentWatches should be easy. Something like:

        for (String path : persistentWatches) {
            DataNode node = getNode(path);
            // determine if there's a change similar to code above this and process the watch if needed
            this.childWatches.addWatch(path, watcher, WatcherMode.PERSISTENT);
            this.dataWatches.addWatch(path, watcher, WatcherMode.PERSISTENT);
        }

persistentRecursiveWatches are harder. For each persistent recursive watch you'd need to walk down child paths of each watch, get the node and determine if the watch needs to be called (you'd probably need to keep a set of already visited nodes - I'm not sure). Depending on the level of the watch it could end up walking the entire database but there's no choice.

@tisonkun
Copy link
Member

tisonkun commented Dec 28, 2022

@Randgalt I get your point.

similar to code above this and process the watch if needed

Although, this can be more trick than it first looks like. Because a persistent watch is always set, only node.stat.getMzxid() > relativeZxid indicates a NodeDataChanged event (It can be actually a NodeCreated, which we cannot distinguish). Either the node exists or not cannot indicate NodeCreated or NodeDeleted. Also, at most it resends the latest event, instead of all the events during the network partition.

But I think I know what we can do and your original thought now :)

kezhuw added a commit to kezhuw/zookeeper that referenced this pull request May 22, 2023
…nt watches

I found this in reply to
apache#1950 (comment).

But it turns out a known issue
apache#1106 (comment).

> An important question to all committers. In DataTree.setWatches
> persistent watchers are not applied. This means that after a
> network partition, no persistent watchers will trigger. I
> don't have a feeling about this one way or another - the current
> implementation works fine for Curator's use cases.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants