-
Notifications
You must be signed in to change notification settings - Fork 7.3k
ZOOKEEPER-3720: Fix rolling upgrade failure (invalid protocol version) #1251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The multi-address feature introduced in ZOOKEEPER-3188 required changes in the Quorum protocol as we had to send all addresses in the connection initiation message to enable the receiving side to choose a reachable address in case of network failure. The new code can handle both the old and the new protocol versions to avoid 'invalid protocol' error e.g. during rolling restarts. However, the new protocol version still can not be used during rolling upgrade if the old servers are not supporting this protocol. In this case the old and the new servers would form two distinct partitions until all the servers get upgraded. To support rolling upgrades too, we want to disable the MultiAddress feature by default and use the old protocol. If the user would like enable the MultiAddress feature on a 3.6.0 cluster, she/he can do it either by 1) starting the cluster from scratch (without rolling upgrade), or 2) performing a rolling upgrade without the MultiAddress feature enabled then doing a rolling restart with a new configuration where the MultiAddress feature is enabled. During the rolling restart there will be no partitions, as all the servers in the cluster now will run ZooKeeper version 3.6.0 which understands now both protocols. The changes in this patch: - introducing new config property: multiAddress.enabled, disabled by default - updating QuorumCnxManager to be able to use both protocol versions and to use the old one if MultiAddress is disabled - failing with ConfigException if the user provides multiple addresses in the config while having MultiAddress disabled - updating the existing MultiAddress related tests to enable the feature first - add some new tests - update the documentation Testing: - I ran all the unit tests - Using https://github.com/symat/zk-rolling-upgrade-test - I tested rolling upgrade from 3.5.6 - I tested rolling restart to enable the MultiAddress feature - Using https://github.com/symat/zookeeper-docker-test - I tested the MultiAddress feature by disabling some virtual interfaces and waiting for the cluster to recover
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| (e.g. the zk/[email protected] client principal will be authenticated in ZooKeeper as zk/myhost) | ||
| Default: false | ||
|
|
||
| * *multiAddress.enable* : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
multiAddress.enabled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch, thanks!
| checkIfZooKeeperQuorumWorks(newQuorumConfig); | ||
| } | ||
|
|
||
| @Test(expected = KeeperException.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this way of testing for an expected exception is too broad, you don't know where KeeperException has been thrown.
I suggest to push the assertion around the ReconfigTest.reconfig call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, it's nicer that way indeed, I will fix it
| Default: false | ||
|
|
||
| * *multiAddress.enable* : | ||
| (Java system property: **zookeeper.multiAddress.enable**) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, missing the d (internet has ruined me, no pun intended)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true, thanks. :)
anmolnar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm.
I will do some real world testing soon.
| if (senderWorkerMap.get(sid) != null) { | ||
| LOG.debug("There is a connection already for server {}", sid); | ||
| if (electionAddr.size() > 1 && self.isMultiAddressReachabilityCheckEnabled()) { | ||
| if (electionAddr.size() > 1 && self.isMultiAddressEnabled() && self.isMultiAddressReachabilityCheckEnabled()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need the "enabled" check here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does not do any harm though... move it to the beginning of if please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right, actually if multi-address is disabled, then we can't have more than one address anyway. Still, I think it is good to have the extra condition there (if for nothing else, maybe just to make the code easier to understand)
I will move it to the beginning.
this is true... still (for the overall quality) I think it is good that both issues (ICMP and the expected behavior in rolling restarts) were raised. I am just sad that we haven't deal with these during the development of ZOOKEEPER-3188. |
anmolnar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 lgtm.
Finished the rolling upgrade test from 3.5.6, it works fine with this patch.
| */ | ||
| public static final long PROTOCOL_VERSION = -65535L; | ||
| // the following protocol version was sent in every connection initiation message since ZOOKEEPER-2186 released in 3.4.7 | ||
| public static final long PROTOCOL_VERSION_3_4_7 = -65536L; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call this constant PROTOCOL_VERSION_V1
and the other one PROTOCOL_VERSION_V2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why I named this way was that I am not sure if we won't need to add other previous versions later (e.g. to support rolling restart from pre-3.4.7 versions). But giving a second thought, I like your proposal better. These are just constants, we can rename them later in any ways. I will change this in the next commit.
| // feature is enabled. During rolling upgrade, we must make sure that all the servers can | ||
| // understand the protocol version we use to avoid multiple partitions. see ZOOKEEPER-3720 | ||
| long protocolVersion = self.isMultiAddressEnabled() ? PROTOCOL_VERSION_3_6_0_MULTI_ADDRESS : PROTOCOL_VERSION_3_4_7; | ||
| dout.writeLong(protocolVersion); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of new protocol (v2) I would like to spend here a 64bit long of 'flags' (filled with zeros in 3.6.0) in order to introduce a more futureproof 'feature detection' that is better than using protocol version numbers.
We can then add 'multiaddress' as an enabled feature.
We should do more reasoning and design steps but this way of coding the initial handshake is quite common and it will open up the way for future improvements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I really like this idea with the 'feature flags', I will change accordingly.
The only constraint is that the version number must be negative. I think I saw in the git history, that some years ago, when the protocol version was not introduced yet, this was the way how someone solved the introduction of versioning. The old code sent some positive sid or message size, and the new code sent a negative version number. This way enforcing the old code to break and throw away the message, while the new code was able to verify if it got the message in the new format based on the negative number. But I need to double-check this tomorrow, maybe I remember wrong.
And also there can be maybe 1-2 negative numbers already used as protocol versions, so I will check which bits are still available.
Although I think it is enough to support the rolling upgrade from the first stable 3.4 release, which can simplify things, and we don't need to go back in time in the code too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I did some deeper digging in the git history, this is what I found:
The different version numbers / behaviors we had before:
- protocol version originally introduced by ZOOKEEPER-107 in 3.5.0
- before 3.5.0, always a positive SID was sent in the beginning of the connection initiation message and there were no address information there. Even 3.4.14 still only sends a positive SID.
- since ZOOKEEPER-107 (in 3.5.0+): version -65536L (0xffff0000) was sent, and on the receiver side:
- if any positive (incl. 0) Long number received, then accept the number as SID and get the address from the own config
- if -65536L was received, then parsing the message with the new format and use the address provided there
- for any other negative numbers, we fail
- in ZOOKEEPER-1633 (3.4.7), the receiver logic was making more clever to expect the SID as the second Long in the message, if the first Long was negative. So from 3.4.7 all the 3.4 servers can at least not die due to the new message formats, and rolling upgrades become possible without multiple partitions. However, all 3.4.X still sends only the positive SID and will not use any address information from the initial message (just reads the SID).
This also means, that we can choose any negative number as new protocol version for the MultiAddress feature, and then the rolling upgrade from 3.4.7-3.4.14 to 3.6.0 could work theoretically even with MultiAddress enabled, as long as we always send the SID right after the protocol version.
But the value of the protocol version is actually checked in 3.5.0, so we definitely need to disable MultiAddress to be able to get the rolling upgrade to work from 3.5.
Long story short: we only have a single protocol version, -65536 (0xffff0000) as of now, plus the case when positive numbers are sent. The -65535 (0xffff0001) seems to me a good protocol version candidate for the multiaddress compatible protocol, since it is also only a single bit difference between the two. So if later we will introduce the 'feature flags' we can already say that the last bit represented the multiaddress feature.
What do you think? (maybe I misunderstood your proposal here)
But I just don't know how this whole thing should look in the later releases. Also this whole protocol version is now only about the initial message. We don't know if there is any other incompatibility between the other messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eolivelli what do you think?
should we add the 'feature flags' implementation here in 3.6.0? Or is it enough if we choose a new optional version number that differs only in a single bit to the current version number, so that we are keeping this possibility of 'feature flags' open for the future? (currently we have only a single 'feature' anyway)
I am not sure if it worths to push for a more generic solution, given that we don't really know yet how to design this in the future.
|
LGTM! (I reviewed the code, but did not do any testing.) |
|
@ztzg thanks for checking! |
eolivelli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would go with current patch.
We have to think more about such feature flag or whatever will be the best option
nkalmar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, thanks Máté!
|
retest maven please |
|
retest this please |
The multi-address feature introduced in ZOOKEEPER-3188 required changes in the Quorum protocol as we had to send all addresses in the connection initiation message to enable the receiving side to choose a reachable address in case of network failure. The new code can handle both the old and the new protocol versions to avoid 'invalid protocol' error e.g. during rolling restarts. However, the new protocol version still can not be used during rolling upgrade if the old servers are not supporting this protocol. In this case the old and the new servers would form two distinct partitions until all the servers get upgraded. To support rolling upgrades too, we want to disable the MultiAddress feature by default and use the old protocol. If the user would like enable the MultiAddress feature on a 3.6.0 cluster, she/he can do it either by 1) starting the cluster from scratch (without rolling upgrade), or 2) performing a rolling upgrade without the MultiAddress feature enabled then doing a rolling restart with a new configuration where the MultiAddress feature is enabled. During the rolling restart there will be no partitions, as all the servers in the cluster now will run ZooKeeper version 3.6.0 which understands now both protocols. The changes in this patch: - introducing new config property: multiAddress.enabled, disabled by default - updating QuorumCnxManager to be able to use both protocol versions and to use the old one if MultiAddress is disabled - failing with ConfigException if the user provides multiple addresses in the config while having MultiAddress disabled - updating the existing MultiAddress related tests to enable the feature first - add some new tests - update the documentation Testing: - I ran all the unit tests - Using https://github.com/symat/zk-rolling-upgrade-test - I tested rolling upgrade from 3.5.6 - I tested rolling restart to enable the MultiAddress feature - Using https://github.com/symat/zookeeper-docker-test - I tested the MultiAddress feature by disabling some virtual interfaces and waiting for the cluster to recover Author: Mate Szalay-Beko <[email protected]> Reviewers: Enrico Olivelli <[email protected]>, Norbert Kalmar <[email protected]>, Andor Molnar <[email protected]> Closes #1251 from symat/ZOOKEEPER-3720 (cherry picked from commit 3aa922c) Signed-off-by: Enrico Olivelli <[email protected]>
The multi-address feature introduced in ZOOKEEPER-3188 required changes in the Quorum protocol as we had to send all addresses in the connection initiation message to enable the receiving side to choose a reachable address in case of network failure. The new code can handle both the old and the new protocol versions to avoid 'invalid protocol' error e.g. during rolling restarts. However, the new protocol version still can not be used during rolling upgrade if the old servers are not supporting this protocol. In this case the old and the new servers would form two distinct partitions until all the servers get upgraded. To support rolling upgrades too, we want to disable the MultiAddress feature by default and use the old protocol. If the user would like enable the MultiAddress feature on a 3.6.0 cluster, she/he can do it either by 1) starting the cluster from scratch (without rolling upgrade), or 2) performing a rolling upgrade without the MultiAddress feature enabled then doing a rolling restart with a new configuration where the MultiAddress feature is enabled. During the rolling restart there will be no partitions, as all the servers in the cluster now will run ZooKeeper version 3.6.0 which understands now both protocols. The changes in this patch: - introducing new config property: multiAddress.enabled, disabled by default - updating QuorumCnxManager to be able to use both protocol versions and to use the old one if MultiAddress is disabled - failing with ConfigException if the user provides multiple addresses in the config while having MultiAddress disabled - updating the existing MultiAddress related tests to enable the feature first - add some new tests - update the documentation Testing: - I ran all the unit tests - Using https://github.com/symat/zk-rolling-upgrade-test - I tested rolling upgrade from 3.5.6 - I tested rolling restart to enable the MultiAddress feature - Using https://github.com/symat/zookeeper-docker-test - I tested the MultiAddress feature by disabling some virtual interfaces and waiting for the cluster to recover Author: Mate Szalay-Beko <[email protected]> Reviewers: Enrico Olivelli <[email protected]>, Norbert Kalmar <[email protected]>, Andor Molnar <[email protected]> Closes #1251 from symat/ZOOKEEPER-3720 (cherry picked from commit 3aa922c) Signed-off-by: Enrico Olivelli <[email protected]>
The multi-address feature introduced in ZOOKEEPER-3188 required changes in the Quorum protocol as we had to send all addresses in the connection initiation message to enable the receiving side to choose a reachable address in case of network failure. The new code can handle both the old and the new protocol versions to avoid 'invalid protocol' error e.g. during rolling restarts. However, the new protocol version still can not be used during rolling upgrade if the old servers are not supporting this protocol. In this case the old and the new servers would form two distinct partitions until all the servers get upgraded. To support rolling upgrades too, we want to disable the MultiAddress feature by default and use the old protocol. If the user would like enable the MultiAddress feature on a 3.6.0 cluster, she/he can do it either by 1) starting the cluster from scratch (without rolling upgrade), or 2) performing a rolling upgrade without the MultiAddress feature enabled then doing a rolling restart with a new configuration where the MultiAddress feature is enabled. During the rolling restart there will be no partitions, as all the servers in the cluster now will run ZooKeeper version 3.6.0 which understands now both protocols. The changes in this patch: - introducing new config property: multiAddress.enabled, disabled by default - updating QuorumCnxManager to be able to use both protocol versions and to use the old one if MultiAddress is disabled - failing with ConfigException if the user provides multiple addresses in the config while having MultiAddress disabled - updating the existing MultiAddress related tests to enable the feature first - add some new tests - update the documentation Testing: - I ran all the unit tests - Using https://github.com/symat/zk-rolling-upgrade-test - I tested rolling upgrade from 3.5.6 - I tested rolling restart to enable the MultiAddress feature - Using https://github.com/symat/zookeeper-docker-test - I tested the MultiAddress feature by disabling some virtual interfaces and waiting for the cluster to recover Author: Mate Szalay-Beko <[email protected]> Reviewers: Enrico Olivelli <[email protected]>, Norbert Kalmar <[email protected]>, Andor Molnar <[email protected]> Closes apache#1251 from symat/ZOOKEEPER-3720
The multi-address feature introduced in ZOOKEEPER-3188 required changes in the Quorum protocol as we had to send all addresses in the connection initiation message to enable the receiving side to choose a reachable address in case of network failure. The new code can handle both the old and the new protocol versions to avoid 'invalid protocol' error e.g. during rolling restarts. However, the new protocol version still can not be used during rolling upgrade if the old servers are not supporting this protocol. In this case the old and the new servers would form two distinct partitions until all the servers get upgraded. To support rolling upgrades too, we want to disable the MultiAddress feature by default and use the old protocol. If the user would like enable the MultiAddress feature on a 3.6.0 cluster, she/he can do it either by 1) starting the cluster from scratch (without rolling upgrade), or 2) performing a rolling upgrade without the MultiAddress feature enabled then doing a rolling restart with a new configuration where the MultiAddress feature is enabled. During the rolling restart there will be no partitions, as all the servers in the cluster now will run ZooKeeper version 3.6.0 which understands now both protocols. The changes in this patch: - introducing new config property: multiAddress.enabled, disabled by default - updating QuorumCnxManager to be able to use both protocol versions and to use the old one if MultiAddress is disabled - failing with ConfigException if the user provides multiple addresses in the config while having MultiAddress disabled - updating the existing MultiAddress related tests to enable the feature first - add some new tests - update the documentation Testing: - I ran all the unit tests - Using https://github.com/symat/zk-rolling-upgrade-test - I tested rolling upgrade from 3.5.6 - I tested rolling restart to enable the MultiAddress feature - Using https://github.com/symat/zookeeper-docker-test - I tested the MultiAddress feature by disabling some virtual interfaces and waiting for the cluster to recover Author: Mate Szalay-Beko <[email protected]> Reviewers: Enrico Olivelli <[email protected]>, Norbert Kalmar <[email protected]>, Andor Molnar <[email protected]> Closes apache#1251 from symat/ZOOKEEPER-3720
The multi-address feature introduced in ZOOKEEPER-3188 required changes in the Quorum protocol as we had to send all addresses in the connection initiation message to enable the receiving side to choose a reachable address in case of network failure. The new code can handle both the old and the new protocol versions to avoid 'invalid protocol' error e.g. during rolling restarts. However, the new protocol version still can not be used during rolling upgrade if the old servers are not supporting this protocol. In this case the old and the new servers would form two distinct partitions until all the servers get upgraded. To support rolling upgrades too, we want to disable the MultiAddress feature by default and use the old protocol. If the user would like enable the MultiAddress feature on a 3.6.0 cluster, she/he can do it either by 1) starting the cluster from scratch (without rolling upgrade), or 2) performing a rolling upgrade without the MultiAddress feature enabled then doing a rolling restart with a new configuration where the MultiAddress feature is enabled. During the rolling restart there will be no partitions, as all the servers in the cluster now will run ZooKeeper version 3.6.0 which understands now both protocols. The changes in this patch: - introducing new config property: multiAddress.enabled, disabled by default - updating QuorumCnxManager to be able to use both protocol versions and to use the old one if MultiAddress is disabled - failing with ConfigException if the user provides multiple addresses in the config while having MultiAddress disabled - updating the existing MultiAddress related tests to enable the feature first - add some new tests - update the documentation Testing: - I ran all the unit tests - Using https://github.com/symat/zk-rolling-upgrade-test - I tested rolling upgrade from 3.5.6 - I tested rolling restart to enable the MultiAddress feature - Using https://github.com/symat/zookeeper-docker-test - I tested the MultiAddress feature by disabling some virtual interfaces and waiting for the cluster to recover Author: Mate Szalay-Beko <[email protected]> Reviewers: Enrico Olivelli <[email protected]>, Norbert Kalmar <[email protected]>, Andor Molnar <[email protected]> Closes apache#1251 from symat/ZOOKEEPER-3720
Motivation: Zookeeper 3.6 added a new option that was ability to set mulitple addersss for quorum operations. apache/zookeeper#1048 The multiaddres was not compatible with the old protocol used in ZooKeeper 3.5.x, so the option has been disabled by apache/zookeeper#1251. In apache/zookeeper#1251, the default value of `QuorumPeer.multiAddressEnabled` was changed. https://github.com/apache/zookeeper/blob/e08cc2a782982964a57651f179a468b19e2e6010/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java#L174-L180 Since CentralDogma directly extends `QuorumPeer`, `zookeeper.multiadress` is activated automatically and Central Dogma using Zookeeper 3.5.x can't be upgraded with rolling deployment. Modifications: - Set `multiAddressEnabled` to false when creating `EmbeddedZooKeeper` Result: - Fixed an issue where CentralDogma does not perform rolling updates due to Zookeeper invalid protocol version. - Note that this bug affects 0.61.2, 0.61.3 and 0.61.4
Motivation: Zookeeper 3.6 added a new option that was ability to set multiple adders for quorum operations. apache/zookeeper#1048 The multi address was not compatible with the old protocol used in ZooKeeper 3.5.x, so apache/zookeeper#1251 has disabled the option. During fixing incompatibility in apache/zookeeper#1251, it omitted to change the default value of `QuorumPeer.multiAddressEnabled` to `false`. https://github.com/apache/zookeeper/blob/e08cc2a782982964a57651f179a468b19e2e6010/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java#L174-L180 The default is correctly set to false when a Zookeeper server is used as a standalone mode. However, if a new `QuorumPeer` instance is manually created, it will remain `true.` Since CentralDogma directly extends `QuorumPeer`, `zookeeper.multiadress` is activated automatically and Central Dogma using Zookeeper 3.5.x can't be upgraded with a rolling deployment. Modifications: - Set `multiAddressEnabled` to false when creating `EmbeddedZooKeeper` Result: - Fixed an issue where CentralDogma does not perform rolling updates due to the invalid protocol version of Zookeeper. - Note that this bug affects 0.61.2, 0.61.3 and 0.61.4
The multi-address feature introduced in ZOOKEEPER-3188 required
changes in the Quorum protocol as we had to send all addresses in
the connection initiation message to enable the receiving side to
choose a reachable address in case of network failure.
The new code can handle both the old and the new protocol versions to
avoid 'invalid protocol' error e.g. during rolling restarts. However,
the new protocol version still can not be used during rolling upgrade
if the old servers are not supporting this protocol. In this case the
old and the new servers would form two distinct partitions until all
the servers get upgraded. To support rolling upgrades too, we want to
disable the MultiAddress feature by default and use the old protocol.
If the user would like enable the MultiAddress feature on a 3.6.0
cluster, she/he can do it either by 1) starting the cluster from
scratch (without rolling upgrade), or 2) performing a rolling upgrade
without the MultiAddress feature enabled then doing a rolling restart
with a new configuration where the MultiAddress feature is enabled.
During the rolling restart there will be no partitions, as all the
servers in the cluster now will run ZooKeeper version 3.6.0 which
understands now both protocols.
The changes in this patch:
by default
and to use the old one if MultiAddress is disabled
addresses in the config while having MultiAddress disabled
feature first
Testing:
interfaces and waiting for the cluster to recover