ZOOKEEPER-1675: Make sync a quorum operation #2069

kezhuw · 2023-09-27T08:38:50Z

Previously, sync + read could not guarantee up-to-date data, because sync does not touch the quorum when there are no outstanding proposals.

create/setData can be used as a workaround, but that is clearly ugly and error-prone. sync fits the semantics naturally.

This PR reverts ZOOKEEPER-2137, which used setData to work around sync not being a quorum operation.

Since sync is a public API, this PR bumps the quorum protocol version to stay compatible with rolling upgrades. sync will only behave as a quorum operation once all forwarding followers are upgraded.

Refs: ZOOKEEPER-1675, ZOOKEEPER-2136, ZOOKEEPER-3600(#1137)
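To make the failure mode concrete, here is a minimal toy model, not ZooKeeper code. It assumes a deposed leader that has lost contact with the majority but has not noticed yet; `ToyLeader`, `localSync` and `quorumSync` are hypothetical names used only for illustration.

```java
final class StaleSyncSketch {
    /** Toy leader: knows only what it has applied and whether it can still reach a majority. */
    static final class ToyLeader {
        final long lastAppliedZxid;
        final boolean canReachQuorum;

        ToyLeader(long lastAppliedZxid, boolean canReachQuorum) {
            this.lastAppliedZxid = lastAppliedZxid;
            this.canReachQuorum = canReachQuorum;
        }

        /** Old behavior with no outstanding proposals: reply without contacting anyone. */
        long localSync() {
            return lastAppliedZxid;
        }

        /** Quorum behavior: the reply is only sent once a majority has acknowledged. */
        long quorumSync() {
            if (!canReachQuorum) {
                throw new IllegalStateException("sync failed: leader cannot reach a quorum");
            }
            return lastAppliedZxid;
        }
    }

    public static void main(String[] args) {
        long committedByNewLeader = 5L;               // state of the real cluster
        ToyLeader deposed = new ToyLeader(4L, false); // stale leader, partitioned away

        // A local sync "succeeds" and a following read would be served at a stale zxid.
        System.out.println("local sync at zxid " + deposed.localSync()
                + ", cluster is at zxid " + committedByNewLeader);

        // A quorum sync fails instead of silently allowing a stale read.
        try {
            deposed.quorumSync();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the local sync completes even though the cluster has moved on, while the quorum variant surfaces an error the client can retry against another server.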

ctubbsii

Wouldn't it be better to have an explicit public API method for this? It seems unreliable if programmers don't know how the ZK servers are configured, and trying to dynamically set a system property prior to calling an API method doesn't seem like a great option.

kezhuw · 2023-09-29T02:23:26Z

> trying to dynamically set a system property prior to calling an API method doesn't seem like a great option.

quorumSync is a server-side property; clients are unaware of it.

> It seems unreliable if programmers don't know how the ZK servers are configured

ZooKeeper has both a client API and a server daemon, so this is somewhat inevitable if we change the server-side behavior of an API. quorumSync is provided mainly as a feature gate and for rolling upgrades. I was thinking of bumping the protocol version (i.e. sync is a quorum operation only if all servers are upgraded to 3.10.0) so that we don't need it for rolling upgrades. I am OK with dropping it if that is solved and we don't want a feature gate for this.

> Wouldn't it be better to have an explicit public API method for this?

Not sure. I think sync fits this purpose naturally; otherwise we wouldn't have to explain sync + read.

I think the question for us should be "Is it a bug for sync + read to read dated data?"

If the answer is yes, then none of the above questions should be an issue. Otherwise, we should resort to new APIs as you said, for example ZOOKEEPER-3600 (#1137). I am leaning towards it being a bug, so here we are.

ctubbsii · 2023-09-29T02:46:24Z

@kezhuw Sorry, I might be confused about the relationship between the proposed quorumSync parameter and the behavior that users see. I agree sync + read -> dated data should be considered a bug.

This is the situation I'm imagining, so please correct me if I'm misunderstanding something:

Scenario 1: user does sync + read and server side is set to quorumSync false. User sees dated data.
Scenario 2: user does sync + read and server side is set to quorumSync true. User sees current data.

The user has no way to know what the server's configuration is set to. In both scenarios, the user's actions are the same... they call the same APIs to sync and read. The problem I'm seeing is that the user has no way to know whether they are seeing the buggy behavior or not. So, it's an unreliable experience.

On the other hand, if there was a separate API, the user could explicitly call it:

Scenario 3: user does sync + read and no special server-side configuration. User sees dated data. (expected, documented in javadoc)
Scenario 4: user does quorumSync + read using new quorumSync API and no special server-side configuration. User sees current data. (also expected, and documented in javadoc)

In scenarios 3 and 4, the user can reliably count on the documented behavior, based on the method they call. In scenarios 1 and 2, they cannot... they have to have some insight into the server-side configuration, which they cannot know, in order to have any chance at relying on the correct behavior of sync + read.

So, I conclude that it'd be better to:

1. Have separate public APIs so the user can rely on the behavior they expect for the API they used, OR
2. Just fix the current sync behavior, without making it configurable, so user can rely on the behavior they expect once ZK is upgraded. They don't need to have special knowledge of how the server is configured... only that it has been upgraded to fix the bug.

kezhuw · 2023-09-29T03:00:53Z

> Have separate public APIs so the user can rely on the behavior they expect for the API they used

Would this cause a lot of confusion in the world after 3.10.0? Does a client really want to choose "dated data"?

> Just fix the current sync behavior, without making it configurable, so user can rely on the behavior they expect once ZK is upgraded. They don't need to have special knowledge of how the server is configured... only that it has been upgraded to fix the bug.

I am in favor of this approach. I think we are probably aligned on making sync a quorum operation without making it configurable.

ctubbsii · 2023-09-29T03:37:58Z

> Would this cause a lot of confusion in the world after 3.10.0? Does a client really want to choose "dated data"?

No, you are probably right. I can't imagine anybody would want this. I was only thinking for consistency of current behavior. But, I don't think there's a use case for that.

> I am in favor of this approach. I think we are probably aligned on making sync a quorum operation without making it configurable.

💯

kezhuw · 2023-09-30T03:13:21Z

Reopen for failed tests: ZOOKEEPER-4216 and ZOOKEEPER-4512.

eolivelli

I am not sure about this change.

IIRC the sync() operation ensures that the server you are connected to is up-to-date and then you can read(). This works well when you write to ZK to some peer, then from another node of your application you read from a different ZK peer.

If we change what sync() does at the moment and make it more heavyweight we are going to break applications, in production, because the load on ZK will increase.
This is something that you would see only in production under load, because developers working locally won't notice the difference.
If you want a different 'sync' then we must provide a new API:

- add a flag on the request (not sure it is doable with JUTE)
- add a new request type

We can discuss on the ML about why you need this

eolivelli · 2023-10-01T06:15:56Z

zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java

-        // getting a quorum from all necessary configurations.
-        if (!p.hasAllQuorums()) {
-            return false;
+        Proposal previous = outstandingProposals.get(zxid - 1);


This change seems unrelated

kezhuw

It is changed for two reasons:

1. Make sure p has been acked by a majority.
2. Also commit a preceding quorum read, to guard against downgrading.

Previously, we returned directly if there was a preceding proposal, but now we commit it if the preceding proposal is a quorum read. So the order matters.
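The ordering described in this reply can be read as the following toy sketch. It is one interpretation of the comment, not the code in Leader.java: `ToyProposal`, its fields and this `tryToCommit` signature are hypothetical, and the assumption that a quorum read occupies the slot at `zxid - 1` is taken only from the diff line quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class CommitOrderSketch {
    /** Hypothetical stand-in for an outstanding proposal. */
    static final class ToyProposal {
        final boolean quorumRead;    // true if this entry is a quorum read (e.g. sync)
        final boolean majorityAcked; // true once a majority has acked it

        ToyProposal(boolean quorumRead, boolean majorityAcked) {
            this.quorumRead = quorumRead;
            this.majorityAcked = majorityAcked;
        }
    }

    /** Returns the zxids committed by this call, in commit order. */
    static List<Long> tryToCommit(TreeMap<Long, ToyProposal> outstanding, long zxid) {
        List<Long> committed = new ArrayList<>();
        ToyProposal p = outstanding.get(zxid);

        // Step 1: check the majority ack first, before looking at the predecessor.
        if (p == null || !p.majorityAcked) {
            return committed;
        }

        ToyProposal previous = outstanding.get(zxid - 1);
        if (previous != null) {
            if (!previous.quorumRead) {
                return committed; // ordinary predecessor still pending: wait, as before
            }
            // Step 2: a preceding quorum read is committed together with p.
            outstanding.remove(zxid - 1);
            committed.add(zxid - 1);
        }

        outstanding.remove(zxid);
        committed.add(zxid);
        return committed;
    }

    public static void main(String[] args) {
        TreeMap<Long, ToyProposal> outstanding = new TreeMap<>();
        outstanding.put(7L, new ToyProposal(true, false)); // quorum read, not yet acked
        outstanding.put(8L, new ToyProposal(false, true)); // write, acked by a majority
        System.out.println(tryToCommit(outstanding, 8L));
    }
}
```

If the majority check came after the predecessor handling, a quorum read could be committed on behalf of a proposal that is not itself acked, which is why the order of the two steps matters in this reading.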

eolivelli · 2023-10-01T06:25:52Z

In any case the client must explicitly opt in to the new sync().
The ZooKeeper API is very stable and changing the semantics may have bad consequences (especially at scale).

kezhuw · 2023-10-01T08:30:58Z

> IIRC the sync() operation ensures that the server you are connected to is up-to-date and then you can read(). This works well when you write to ZK to some peer, then from another node of your application you read from a different ZK peer.

This is not currently guaranteed in case of a leader change. QuorumSyncTest will fail if you change followersProtocolVersion < ProtocolVersion.V3_10_0 to true.

> If we change what sync() does at the moment and make it more heavyweight we are going to break applications, in production, because the load on ZK will increase.
> This is something that you would see only in production under load, because developers working locally won't notice the difference.

Two cents from my side:

- Compared to getData, setData and other data tree operations, sync is relatively rare.
- This implementation only issues a quorum operation when there is no outstanding proposal, so the cluster is probably not under heavy load. On a leader with outstanding proposals, sync will block on the last proposal as before.

I think the purpose is clear when people resort to sync + read: up-to-date data irrespective of cluster state (e.g. leader change). Compared to this, I think the increase in operation latency and cluster load when the cluster has no outstanding proposals is probably acceptable.
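A minimal sketch of the decision described in this thread, assuming it can be reduced to two booleans. `SyncAction`, `decide` and both parameter names are hypothetical and only mirror the wording here; the PR's actual logic lives in the leader code and may differ.

```java
final class SyncDecisionSketch {
    enum SyncAction {
        WAIT_FOR_LAST_PROPOSAL, // piggyback on the last outstanding proposal, as before
        QUORUM_ROUND,           // no outstanding proposal: ask the quorum explicitly
        LOCAL_REPLY             // old behavior, kept while a rolling upgrade is in progress
    }

    static SyncAction decide(boolean hasOutstandingProposals,
                             boolean allForwardingFollowersUpgraded) {
        if (hasOutstandingProposals) {
            // A busy leader pays no extra cost: sync blocks on the last proposal.
            return SyncAction.WAIT_FOR_LAST_PROPOSAL;
        }
        if (allForwardingFollowersUpgraded) {
            // An idle leader with an upgraded cluster issues one extra quorum operation.
            return SyncAction.QUORUM_ROUND;
        }
        // Not every forwarding follower understands the new protocol version yet.
        return SyncAction.LOCAL_REPLY;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, true));
        System.out.println(decide(false, true));
        System.out.println(decide(false, false));
    }
}
```

Under these assumptions the extra quorum round is only paid on the idle path, which is the basis for the load argument above.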

> In any case the client must explicitly opt in to the new sync().

I am not in favor of this road, but I guess we can resort to a per-client option, say quorumSync, in ZooKeeperBuilder (ZOOKEEPER-4697, #2001) in case we finally go this way. Client-side sync can issue different operations based on that option.

My main doubt is why people would intentionally want sync + read once they know about quorumSync + read and the fault (not up-to-date data) of sync + read. From my perspective, it is a bug for sync + read to read data that is not up to date.

> We can discuss on the ML about why you need this

I will prepare a discussion thread on the dev mailing list later.

kezhuw · 2023-10-01T10:32:54Z

I have started a discussion thread for the direction: https://lists.apache.org/thread/ogbg4sptpz56cwjbcvcpnysryr0c0pjm

Previously, `sync` + `read` could not guarantee up-to-date data, because `sync` does not touch the quorum when there are no outstanding proposals. `create`/`setData` can be used as a workaround, but that is clearly heavy, ugly and error-prone. `sync` fits the semantics naturally.

This PR bumps the quorum protocol version to keep the change compatible with rolling upgrades. This is necessary because `sync` is a public API and the whole cluster must function normally during a rolling upgrade. `sync` will behave like a quorum operation once all forwarding followers are upgraded to the new version.

This PR issues a quorum operation only when there are no outstanding proposals, to avoid overloading a cluster that may already be heavily loaded. Latency will increase in that case, but `sync` + `read` should care more about up-to-date data.

This PR also reverts ZOOKEEPER-2137, which used `setData` to work around the old behavior of `sync`.

Refs: ZOOKEEPER-1675, ZOOKEEPER-2136, ZOOKEEPER-3600

kezhuw mentioned this pull request Sep 27, 2023

ZOOKEEPER-3594: Don't propose error transactions #2070

Open

ctubbsii reviewed Sep 29, 2023

kezhuw closed this Sep 30, 2023

kezhuw reopened this Sep 30, 2023

eolivelli requested changes Oct 1, 2023

ztzg force-pushed the master branch from 1c60545 to e2070be Compare October 3, 2023 12:57

kezhuw force-pushed the ZOOKEEPER-1675-quorum-sync branch from bafbb2e to 50a0cb8 Compare October 9, 2024 13:53

kezhuw force-pushed the ZOOKEEPER-1675-quorum-sync branch from 50a0cb8 to 3741e09 Compare October 9, 2024 15:07
