Skip to content

Conversation

@symat
Copy link
Contributor

@symat symat commented Feb 12, 2020

The multi-address feature introduced in ZOOKEEPER-3188 required
changes in the Quorum protocol as we had to send all addresses in
the connection initiation message to enable the receiving side to
choose a reachable address in case of network failure.

The new code can handle both the old and the new protocol versions to
avoid 'invalid protocol' error e.g. during rolling restarts. However,
the new protocol version still can not be used during rolling upgrade
if the old servers are not supporting this protocol. In this case the
old and the new servers would form two distinct partitions until all
the servers get upgraded. To support rolling upgrades too, we want to
disable the MultiAddress feature by default and use the old protocol.

If the user would like enable the MultiAddress feature on a 3.6.0
cluster, she/he can do it either by 1) starting the cluster from
scratch (without rolling upgrade), or 2) performing a rolling upgrade
without the MultiAddress feature enabled then doing a rolling restart
with a new configuration where the MultiAddress feature is enabled.
During the rolling restart there will be no partitions, as all the
servers in the cluster now will run ZooKeeper version 3.6.0 which
understands now both protocols.

The changes in this patch:

  • introducing new config property: multiAddress.enabled, disabled
    by default
  • updating QuorumCnxManager to be able to use both protocol versions
    and to use the old one if MultiAddress is disabled
  • failing with ConfigException if the user provides multiple
    addresses in the config while having MultiAddress disabled
  • updating the existing MultiAddress related tests to enable the
    feature first
  • add some new tests
  • update the documentation

Testing:

The multi-address feature introduced in ZOOKEEPER-3188 required
changes in the Quorum protocol as we had to send all addresses in
the connection initiation message to enable the receiving side to
choose a reachable address in case of network failure.

The new code can handle both the old and the new protocol versions to
avoid 'invalid protocol' error e.g. during rolling restarts. However,
the new protocol version still can not be used during rolling upgrade
if the old servers are not supporting this protocol. In this case the
old and the new servers would form two distinct partitions until all
the servers get upgraded. To support rolling upgrades too, we want to
disable the MultiAddress feature by default and use the old protocol.

If the user would like enable the MultiAddress feature on a 3.6.0
cluster, she/he can do it either by 1) starting the cluster from
scratch (without rolling upgrade), or 2) performing a rolling upgrade
without the MultiAddress feature enabled then doing a rolling restart
with a new configuration where the MultiAddress feature is enabled.
During the rolling restart there will be no partitions, as all the
servers in the cluster now will run ZooKeeper version 3.6.0 which
understands now both protocols.

The changes in this patch:
- introducing new config property: multiAddress.enabled, disabled
by default
- updating QuorumCnxManager to be able to use both protocol versions
and to use the old one if MultiAddress is disabled
- failing with ConfigException if the user provides multiple
addresses in the config while having MultiAddress disabled
- updating the existing MultiAddress related tests to enable the
feature first
- add some new tests
- update the documentation

Testing:
- I ran all the unit tests
- Using https://github.com/symat/zk-rolling-upgrade-test
  - I tested rolling upgrade from 3.5.6
  - I tested rolling restart to enable the MultiAddress feature
- Using https://github.com/symat/zookeeper-docker-test
  - I tested the MultiAddress feature by disabling some virtual
    interfaces and waiting for the cluster to recover
Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation is like what I was expecting.
I left a couple of minor comments

thank you @symat

with this patch the first problem found my @phunt in 3.6.0rc0 wouldn't be discovered.

(e.g. the zk/[email protected] client principal will be authenticated in ZooKeeper as zk/myhost)
Default: false

* *multiAddress.enable* :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multiAddress.enabled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch, thanks!

checkIfZooKeeperQuorumWorks(newQuorumConfig);
}

@Test(expected = KeeperException.class)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this way of testing for an expected exception is too broad, you don't know where KeeperException has been thrown.
I suggest to push the assertion around the ReconfigTest.reconfig call

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, it's nicer that way indeed, I will fix it

Default: false

* *multiAddress.enable* :
(Java system property: **zookeeper.multiAddress.enable**)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, missing the d (internet has ruined me, no pun intended)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, thanks. :)

Copy link
Contributor

@anmolnar anmolnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.
I will do some real world testing soon.

if (senderWorkerMap.get(sid) != null) {
LOG.debug("There is a connection already for server {}", sid);
if (electionAddr.size() > 1 && self.isMultiAddressReachabilityCheckEnabled()) {
if (electionAddr.size() > 1 && self.isMultiAddressEnabled() && self.isMultiAddressReachabilityCheckEnabled()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need the "enabled" check here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not do any harm though... move it to the beginning of if please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, actually if multi-address is disabled, then we can't have more than one address anyway. Still, I think it is good to have the extra condition there (if for nothing else, maybe just to make the code easier to understand)
I will move it to the beginning.

@symat
Copy link
Contributor Author

symat commented Feb 12, 2020

with this patch the first problem found my @phunt in 3.6.0rc0 wouldn't be discovered.

this is true... still (for the overall quality) I think it is good that both issues (ICMP and the expected behavior in rolling restarts) were raised. I am just sad that we haven't deal with these during the development of ZOOKEEPER-3188.

Copy link
Contributor

@anmolnar anmolnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 lgtm.
Finished the rolling upgrade test from 3.5.6, it works fine with this patch.

*/
public static final long PROTOCOL_VERSION = -65535L;
// the following protocol version was sent in every connection initiation message since ZOOKEEPER-2186 released in 3.4.7
public static final long PROTOCOL_VERSION_3_4_7 = -65536L;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would call this constant PROTOCOL_VERSION_V1
and the other one PROTOCOL_VERSION_V2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why I named this way was that I am not sure if we won't need to add other previous versions later (e.g. to support rolling restart from pre-3.4.7 versions). But giving a second thought, I like your proposal better. These are just constants, we can rename them later in any ways. I will change this in the next commit.

// feature is enabled. During rolling upgrade, we must make sure that all the servers can
// understand the protocol version we use to avoid multiple partitions. see ZOOKEEPER-3720
long protocolVersion = self.isMultiAddressEnabled() ? PROTOCOL_VERSION_3_6_0_MULTI_ADDRESS : PROTOCOL_VERSION_3_4_7;
dout.writeLong(protocolVersion);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of new protocol (v2) I would like to spend here a 64bit long of 'flags' (filled with zeros in 3.6.0) in order to introduce a more futureproof 'feature detection' that is better than using protocol version numbers.

We can then add 'multiaddress' as an enabled feature.

We should do more reasoning and design steps but this way of coding the initial handshake is quite common and it will open up the way for future improvements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I really like this idea with the 'feature flags', I will change accordingly.

The only constraint is that the version number must be negative. I think I saw in the git history, that some years ago, when the protocol version was not introduced yet, this was the way how someone solved the introduction of versioning. The old code sent some positive sid or message size, and the new code sent a negative version number. This way enforcing the old code to break and throw away the message, while the new code was able to verify if it got the message in the new format based on the negative number. But I need to double-check this tomorrow, maybe I remember wrong.

And also there can be maybe 1-2 negative numbers already used as protocol versions, so I will check which bits are still available.

Although I think it is enough to support the rolling upgrade from the first stable 3.4 release, which can simplify things, and we don't need to go back in time in the code too much.

Copy link
Contributor Author

@symat symat Feb 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I did some deeper digging in the git history, this is what I found:

The different version numbers / behaviors we had before:

  • protocol version originally introduced by ZOOKEEPER-107 in 3.5.0
  • before 3.5.0, always a positive SID was sent in the beginning of the connection initiation message and there were no address information there. Even 3.4.14 still only sends a positive SID.
  • since ZOOKEEPER-107 (in 3.5.0+): version -65536L (0xffff0000) was sent, and on the receiver side:
    • if any positive (incl. 0) Long number received, then accept the number as SID and get the address from the own config
    • if -65536L was received, then parsing the message with the new format and use the address provided there
    • for any other negative numbers, we fail
  • in ZOOKEEPER-1633 (3.4.7), the receiver logic was making more clever to expect the SID as the second Long in the message, if the first Long was negative. So from 3.4.7 all the 3.4 servers can at least not die due to the new message formats, and rolling upgrades become possible without multiple partitions. However, all 3.4.X still sends only the positive SID and will not use any address information from the initial message (just reads the SID).

This also means, that we can choose any negative number as new protocol version for the MultiAddress feature, and then the rolling upgrade from 3.4.7-3.4.14 to 3.6.0 could work theoretically even with MultiAddress enabled, as long as we always send the SID right after the protocol version.

But the value of the protocol version is actually checked in 3.5.0, so we definitely need to disable MultiAddress to be able to get the rolling upgrade to work from 3.5.

Long story short: we only have a single protocol version, -65536 (0xffff0000) as of now, plus the case when positive numbers are sent. The -65535 (0xffff0001) seems to me a good protocol version candidate for the multiaddress compatible protocol, since it is also only a single bit difference between the two. So if later we will introduce the 'feature flags' we can already say that the last bit represented the multiaddress feature.

What do you think? (maybe I misunderstood your proposal here)

But I just don't know how this whole thing should look in the later releases. Also this whole protocol version is now only about the initial message. We don't know if there is any other incompatibility between the other messages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eolivelli what do you think?

should we add the 'feature flags' implementation here in 3.6.0? Or is it enough if we choose a new optional version number that differs only in a single bit to the current version number, so that we are keeping this possibility of 'feature flags' open for the future? (currently we have only a single 'feature' anyway)

I am not sure if it worths to push for a more generic solution, given that we don't really know yet how to design this in the future.

@ztzg
Copy link
Contributor

ztzg commented Feb 12, 2020

LGTM! (I reviewed the code, but did not do any testing.)

@symat
Copy link
Contributor Author

symat commented Feb 13, 2020

@ztzg thanks for checking!

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would go with current patch.
We have to think more about such feature flag or whatever will be the best option

@eolivelli
Copy link
Contributor

eolivelli commented Feb 14, 2020

@anmolnar @nkalmar are you okay with this final form?
If so please merge to branch-3.6

I will cut a new RC during the weekend
I will take care about 3.6.0 branch, I will port more patches

Copy link
Contributor

@nkalmar nkalmar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thanks Máté!

@nkalmar
Copy link
Contributor

nkalmar commented Feb 15, 2020

retest maven please

@nkalmar
Copy link
Contributor

nkalmar commented Feb 15, 2020

retest this please

@eolivelli eolivelli closed this in 3aa922c Feb 16, 2020
eolivelli pushed a commit that referenced this pull request Feb 16, 2020
The multi-address feature introduced in ZOOKEEPER-3188 required
changes in the Quorum protocol as we had to send all addresses in
the connection initiation message to enable the receiving side to
choose a reachable address in case of network failure.

The new code can handle both the old and the new protocol versions to
avoid 'invalid protocol' error e.g. during rolling restarts. However,
the new protocol version still can not be used during rolling upgrade
if the old servers are not supporting this protocol. In this case the
old and the new servers would form two distinct partitions until all
the servers get upgraded. To support rolling upgrades too, we want to
disable the MultiAddress feature by default and use the old protocol.

If the user would like enable the MultiAddress feature on a 3.6.0
cluster, she/he can do it either by 1) starting the cluster from
scratch (without rolling upgrade), or 2) performing a rolling upgrade
without the MultiAddress feature enabled then doing a rolling restart
with a new configuration where the MultiAddress feature is enabled.
During the rolling restart there will be no partitions, as all the
servers in the cluster now will run ZooKeeper version 3.6.0 which
understands now both protocols.

The changes in this patch:
- introducing new config property: multiAddress.enabled, disabled
by default
- updating QuorumCnxManager to be able to use both protocol versions
and to use the old one if MultiAddress is disabled
- failing with ConfigException if the user provides multiple
addresses in the config while having MultiAddress disabled
- updating the existing MultiAddress related tests to enable the
feature first
- add some new tests
- update the documentation

Testing:
- I ran all the unit tests
- Using https://github.com/symat/zk-rolling-upgrade-test
  - I tested rolling upgrade from 3.5.6
  - I tested rolling restart to enable the MultiAddress feature
- Using https://github.com/symat/zookeeper-docker-test
  - I tested the MultiAddress feature by disabling some virtual
    interfaces and waiting for the cluster to recover

Author: Mate Szalay-Beko <[email protected]>

Reviewers: Enrico Olivelli <[email protected]>, Norbert Kalmar <[email protected]>, Andor Molnar <[email protected]>

Closes #1251 from symat/ZOOKEEPER-3720

(cherry picked from commit 3aa922c)
Signed-off-by: Enrico Olivelli <[email protected]>
asfgit pushed a commit that referenced this pull request Feb 16, 2020
The multi-address feature introduced in ZOOKEEPER-3188 required
changes in the Quorum protocol as we had to send all addresses in
the connection initiation message to enable the receiving side to
choose a reachable address in case of network failure.

The new code can handle both the old and the new protocol versions to
avoid 'invalid protocol' error e.g. during rolling restarts. However,
the new protocol version still can not be used during rolling upgrade
if the old servers are not supporting this protocol. In this case the
old and the new servers would form two distinct partitions until all
the servers get upgraded. To support rolling upgrades too, we want to
disable the MultiAddress feature by default and use the old protocol.

If the user would like enable the MultiAddress feature on a 3.6.0
cluster, she/he can do it either by 1) starting the cluster from
scratch (without rolling upgrade), or 2) performing a rolling upgrade
without the MultiAddress feature enabled then doing a rolling restart
with a new configuration where the MultiAddress feature is enabled.
During the rolling restart there will be no partitions, as all the
servers in the cluster now will run ZooKeeper version 3.6.0 which
understands now both protocols.

The changes in this patch:
- introducing new config property: multiAddress.enabled, disabled
by default
- updating QuorumCnxManager to be able to use both protocol versions
and to use the old one if MultiAddress is disabled
- failing with ConfigException if the user provides multiple
addresses in the config while having MultiAddress disabled
- updating the existing MultiAddress related tests to enable the
feature first
- add some new tests
- update the documentation

Testing:
- I ran all the unit tests
- Using https://github.com/symat/zk-rolling-upgrade-test
  - I tested rolling upgrade from 3.5.6
  - I tested rolling restart to enable the MultiAddress feature
- Using https://github.com/symat/zookeeper-docker-test
  - I tested the MultiAddress feature by disabling some virtual
    interfaces and waiting for the cluster to recover

Author: Mate Szalay-Beko <[email protected]>

Reviewers: Enrico Olivelli <[email protected]>, Norbert Kalmar <[email protected]>, Andor Molnar <[email protected]>

Closes #1251 from symat/ZOOKEEPER-3720

(cherry picked from commit 3aa922c)
Signed-off-by: Enrico Olivelli <[email protected]>
stickyhipp pushed a commit to stickyhipp/zookeeper that referenced this pull request Aug 19, 2020
The multi-address feature introduced in ZOOKEEPER-3188 required
changes in the Quorum protocol as we had to send all addresses in
the connection initiation message to enable the receiving side to
choose a reachable address in case of network failure.

The new code can handle both the old and the new protocol versions to
avoid 'invalid protocol' error e.g. during rolling restarts. However,
the new protocol version still can not be used during rolling upgrade
if the old servers are not supporting this protocol. In this case the
old and the new servers would form two distinct partitions until all
the servers get upgraded. To support rolling upgrades too, we want to
disable the MultiAddress feature by default and use the old protocol.

If the user would like enable the MultiAddress feature on a 3.6.0
cluster, she/he can do it either by 1) starting the cluster from
scratch (without rolling upgrade), or 2) performing a rolling upgrade
without the MultiAddress feature enabled then doing a rolling restart
with a new configuration where the MultiAddress feature is enabled.
During the rolling restart there will be no partitions, as all the
servers in the cluster now will run ZooKeeper version 3.6.0 which
understands now both protocols.

The changes in this patch:
- introducing new config property: multiAddress.enabled, disabled
by default
- updating QuorumCnxManager to be able to use both protocol versions
and to use the old one if MultiAddress is disabled
- failing with ConfigException if the user provides multiple
addresses in the config while having MultiAddress disabled
- updating the existing MultiAddress related tests to enable the
feature first
- add some new tests
- update the documentation

Testing:
- I ran all the unit tests
- Using https://github.com/symat/zk-rolling-upgrade-test
  - I tested rolling upgrade from 3.5.6
  - I tested rolling restart to enable the MultiAddress feature
- Using https://github.com/symat/zookeeper-docker-test
  - I tested the MultiAddress feature by disabling some virtual
    interfaces and waiting for the cluster to recover

Author: Mate Szalay-Beko <[email protected]>

Reviewers: Enrico Olivelli <[email protected]>, Norbert Kalmar <[email protected]>, Andor Molnar <[email protected]>

Closes apache#1251 from symat/ZOOKEEPER-3720
RokLenarcic pushed a commit to RokLenarcic/zookeeper that referenced this pull request Aug 31, 2022
The multi-address feature introduced in ZOOKEEPER-3188 required
changes in the Quorum protocol as we had to send all addresses in
the connection initiation message to enable the receiving side to
choose a reachable address in case of network failure.

The new code can handle both the old and the new protocol versions to
avoid 'invalid protocol' error e.g. during rolling restarts. However,
the new protocol version still can not be used during rolling upgrade
if the old servers are not supporting this protocol. In this case the
old and the new servers would form two distinct partitions until all
the servers get upgraded. To support rolling upgrades too, we want to
disable the MultiAddress feature by default and use the old protocol.

If the user would like enable the MultiAddress feature on a 3.6.0
cluster, she/he can do it either by 1) starting the cluster from
scratch (without rolling upgrade), or 2) performing a rolling upgrade
without the MultiAddress feature enabled then doing a rolling restart
with a new configuration where the MultiAddress feature is enabled.
During the rolling restart there will be no partitions, as all the
servers in the cluster now will run ZooKeeper version 3.6.0 which
understands now both protocols.

The changes in this patch:
- introducing new config property: multiAddress.enabled, disabled
by default
- updating QuorumCnxManager to be able to use both protocol versions
and to use the old one if MultiAddress is disabled
- failing with ConfigException if the user provides multiple
addresses in the config while having MultiAddress disabled
- updating the existing MultiAddress related tests to enable the
feature first
- add some new tests
- update the documentation

Testing:
- I ran all the unit tests
- Using https://github.com/symat/zk-rolling-upgrade-test
  - I tested rolling upgrade from 3.5.6
  - I tested rolling restart to enable the MultiAddress feature
- Using https://github.com/symat/zookeeper-docker-test
  - I tested the MultiAddress feature by disabling some virtual
    interfaces and waiting for the cluster to recover

Author: Mate Szalay-Beko <[email protected]>

Reviewers: Enrico Olivelli <[email protected]>, Norbert Kalmar <[email protected]>, Andor Molnar <[email protected]>

Closes apache#1251 from symat/ZOOKEEPER-3720
RokLenarcic pushed a commit to RokLenarcic/zookeeper that referenced this pull request Sep 3, 2022
The multi-address feature introduced in ZOOKEEPER-3188 required
changes in the Quorum protocol as we had to send all addresses in
the connection initiation message to enable the receiving side to
choose a reachable address in case of network failure.

The new code can handle both the old and the new protocol versions to
avoid 'invalid protocol' error e.g. during rolling restarts. However,
the new protocol version still can not be used during rolling upgrade
if the old servers are not supporting this protocol. In this case the
old and the new servers would form two distinct partitions until all
the servers get upgraded. To support rolling upgrades too, we want to
disable the MultiAddress feature by default and use the old protocol.

If the user would like enable the MultiAddress feature on a 3.6.0
cluster, she/he can do it either by 1) starting the cluster from
scratch (without rolling upgrade), or 2) performing a rolling upgrade
without the MultiAddress feature enabled then doing a rolling restart
with a new configuration where the MultiAddress feature is enabled.
During the rolling restart there will be no partitions, as all the
servers in the cluster now will run ZooKeeper version 3.6.0 which
understands now both protocols.

The changes in this patch:
- introducing new config property: multiAddress.enabled, disabled
by default
- updating QuorumCnxManager to be able to use both protocol versions
and to use the old one if MultiAddress is disabled
- failing with ConfigException if the user provides multiple
addresses in the config while having MultiAddress disabled
- updating the existing MultiAddress related tests to enable the
feature first
- add some new tests
- update the documentation

Testing:
- I ran all the unit tests
- Using https://github.com/symat/zk-rolling-upgrade-test
  - I tested rolling upgrade from 3.5.6
  - I tested rolling restart to enable the MultiAddress feature
- Using https://github.com/symat/zookeeper-docker-test
  - I tested the MultiAddress feature by disabling some virtual
    interfaces and waiting for the cluster to recover

Author: Mate Szalay-Beko <[email protected]>

Reviewers: Enrico Olivelli <[email protected]>, Norbert Kalmar <[email protected]>, Andor Molnar <[email protected]>

Closes apache#1251 from symat/ZOOKEEPER-3720
ikhoon added a commit to ikhoon/centraldogma that referenced this pull request Jul 10, 2023
Motivation:

Zookeeper 3.6 added a new option that was ability to set mulitple
addersss for quorum operations. apache/zookeeper#1048
The multiaddres was not compatible with the old protocol used in
ZooKeeper 3.5.x, so the option has been disabled by apache/zookeeper#1251.

In apache/zookeeper#1251, the default value of
`QuorumPeer.multiAddressEnabled` was changed.
https://github.com/apache/zookeeper/blob/e08cc2a782982964a57651f179a468b19e2e6010/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java#L174-L180
Since CentralDogma directly extends `QuorumPeer`, `zookeeper.multiadress`
is activated automatically and Central Dogma using Zookeeper 3.5.x
can't be upgraded with rolling deployment.

Modifications:

- Set `multiAddressEnabled` to false when creating `EmbeddedZooKeeper`

Result:

- Fixed an issue where CentralDogma does not perform rolling updates due to Zookeeper invalid protocol version.
  - Note that this bug affects 0.61.2, 0.61.3 and 0.61.4
trustin pushed a commit to line/centraldogma that referenced this pull request Jul 17, 2023
Motivation:

Zookeeper 3.6 added a new option that was ability to set multiple adders
for quorum operations. apache/zookeeper#1048 The
multi address was not compatible with the old protocol used in ZooKeeper
3.5.x, so apache/zookeeper#1251 has disabled the
option.

During fixing incompatibility in
apache/zookeeper#1251, it omitted to change the
default value of `QuorumPeer.multiAddressEnabled` to `false`.

https://github.com/apache/zookeeper/blob/e08cc2a782982964a57651f179a468b19e2e6010/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java#L174-L180
The default is correctly set to false when a Zookeeper server is used as
a standalone mode. However, if a new `QuorumPeer` instance is manually
created, it will remain `true.`

Since CentralDogma directly extends `QuorumPeer`,
`zookeeper.multiadress` is activated automatically and Central Dogma
using Zookeeper 3.5.x can't be upgraded with a rolling deployment.

Modifications:

- Set `multiAddressEnabled` to false when creating `EmbeddedZooKeeper`

Result:

- Fixed an issue where CentralDogma does not perform rolling updates due
to the invalid protocol version of Zookeeper.
  - Note that this bug affects 0.61.2, 0.61.3 and 0.61.4
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants