Skip to content

Conversation

@kezhuw
Copy link
Member

@kezhuw kezhuw commented Sep 27, 2023

Previously, sync + read could not guarantee up-to-date data as sync will not touch quorum in case of no outstanding proposals.

Though, create/setData could be used as an rescue, but it is apparently ugly and error-prone. sync fits the semantics naturally.

This pr reverts ZOOKEEPER-2137 which using setData to circumvent no quorum sync.

Since sync is a public API, so this pr bump quorum protocol version to compatible with rolling upgrade. sync will only function like a quorum operation when all forwarding followers are upgraded.

Refs: ZOOKEEPER-1675, ZOOKEEPER-2136, ZOOKEEPER-3600(#1137)

Copy link
Member

@ctubbsii ctubbsii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to have an explicit public API method for this? It seems unreliable if programmers don't know how the ZK servers are configured, and trying to dynamically set a system property prior to calling an API method doesn't seem like a great option.

@kezhuw
Copy link
Member Author

kezhuw commented Sep 29, 2023

trying to dynamically set a system property prior to calling an API method doesn't seem like a great option.

quorumSync is a server side property, client are innocent to this.

It seems unreliable if programmers don't know how the ZK servers are configured

ZooKeeper have both client API and server daemon, it is somewhat inevitable if we change server side behavior of an api. quorumSync is provided mainly for feature gate and rolling upgrade. I was thinking to bump protocol version(a.k.a. sync is a quorum only if all server are upgraded to 3.10.0) so we don't need it in rolling upgrade. I am ok to drop it if that is solved and we don't want a feature gate for this.

Wouldn't it be better to have an explicit public API method for this?

Not sure. I think sync fits this purpose naturally, otherwise we won't have to explain about sync + read.

I think the question for us should be "Is it a bug for sync + read to read dated data ?"

If it is yes, then all above questions shouldn't be issues. Otherwise, we should resort to new apis as you said, for example ZOOKEEPER-3600(#1137). I am leaning towards it is a bug, so here we are.

@ctubbsii
Copy link
Member

ctubbsii commented Sep 29, 2023

@kezhuw Sorry, I might be confused about the relationship between the proposed quorumSync parameter and the behavior that users see. I agree sync + read -> dated data should be considered a bug.

This is the situation I'm imagining, so please correct me if I'm misunderstanding something:

  • Scenario 1: user does sync + read and server side is set to quorumSync false. User sees dated data.
  • Scenario 2: user does sync + read and server side is set to quorumSync true. User sees current data.

The user has no way to know what the server's configuration is set to. In both scenarios, the user's actions are the same... they call the same APIs to sync and read. The problem I'm seeing is that the user has no way to know whether they are seeing the buggy behavior or not. So, it's an unreliable experience.

On the other hand, if there was a separate API, the user could explicitly call it:

  • Scenario 3: user does sync + read and no special server-side configuration. User sees dated data. (expected, documented in javadoc)
  • Scenario 4: user does quorumSync + read using new quorumSync API and no special server-side configuration. User sees current data. (also expected, and documented in javdoc)

In scenarios 3 and 4, the user can reliably count on the documented behavior, based on the method they call. In scenarios 1 and 2, they cannot... they have to have some insight into the server-side configuration, which they cannot know, in order to have any chance at relying on the correct behavior of sync + read.

So, I conclude that it'd be better to:

  1. Have separate public APIs so the user can rely on the behavior they expect for the API they used, OR
  2. Just fix the current sync behavior, without making it configurable, so user can rely on the behavior they expect once ZK is upgraded. They don't need to have special knowledge of how the server is configured... only that it has been upgraded to fix the bug.

@kezhuw
Copy link
Member Author

kezhuw commented Sep 29, 2023

Have separate public APIs so the user can rely on the behavior they expect for the API they used

Will this cause much confusion in world after 3.10.0 ? Does client really want to choose to "dated data" ?

Just fix the current sync behavior, without making it configurable, so user can rely on the behavior they expect once ZK is upgraded. They don't need to have special knowledge of how the server is configured... only that it has been upgraded to fix the bug.

I am positive to this approach. I think we probably are aligned to make sync a quorum operation without making it configurable.

@ctubbsii
Copy link
Member

Will this cause much confusion in world after 3.10.0 ? Does client really want to choose to "dated data" ?

No, you are probably right. I can't imagine anybody would want this. I was only thinking for consistency of current behavior. But, I don't think there's a use case for that.

I am positive to this approach. I think we probably are aligned to make sync a quorum operation without making it configurable.

💯

@kezhuw
Copy link
Member Author

kezhuw commented Sep 30, 2023

Reopen for failed tests: ZOOKEEPER-4216 and ZOOKEEPER-4512.

@kezhuw kezhuw closed this Sep 30, 2023
@kezhuw kezhuw reopened this Sep 30, 2023
Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this change.

IIRC the sync() operation ensures that the server you are connected to is up-to-date and then you can read(). This works well when you write to ZK to some peer, then from another node of your application you read from a different ZK peer.

If we change what sync() does at the moment and make it more heavyweight we are going to break applications, in production, because the load on ZK will increase.
This is something that you would see only in production underload, because developers working locally won't notice the difference.
If you want a different 'sync' then we must provide a new API:

  • add a flag on the request (not sure it is doable with JUTE)
  • add a new request type

We can discuss on the ML about why you need this

// getting a quorum from all necessary configurations.
if (!p.hasAllQuorums()) {
return false;
Proposal previous = outstandingProposals.get(zxid - 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change seems unrelated

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is changed for:

  1. Make sure p has majority acked.
  2. Commit also preceding quorum read to guard against downgrading.

Previously, we return directly if there is preceding proposal, but now we are committing it if preceding proposal is a quorum read. So the order becomes matter.

@eolivelli
Copy link
Contributor

In any case the client must explicitly opt-im to the new sync().
Zookeeper API is very stable and changing the semantics may have bad consequences (especially at scale).

@kezhuw
Copy link
Member Author

kezhuw commented Oct 1, 2023

IIRC the sync() operation ensures that the server you are connected to is up-to-date and then you can read(). This works well when you write to ZK to some peer, then from another node of your application you read from a different ZK peer.

It is not guaranteed currently in case of leader change. QuorumSyncTest will fail if you change followersProtocolVersion < ProtocolVersion.V3_10_0 to true.

If we change what sync() does at the moment and make it more heavyweight we are going to break applications, in production, because the load on ZK will increase.
This is something that you would see only in production underload, because developers working locally won't notice the difference.

Two cents from my side.

  1. Comparing to getData, setData and other data tree operations, sync is relatively rare.
  2. This implementation will only issue a quorum operation when there is no outstanding proposal. So the cluster is probably not in a heavy load. In leader with outstanding proposals, sync will block on last proposal as before.

I think the purposes are clear when people resort to sync + read, that is up-to-date data irrespective of cluster state(e.g. leader change). Comparing to this, I think the increasing in operation latency and cluster load when cluster has no outstanding proposal are probably acceptable.

In any case the client must explicitly opt-im to the new sync().

Though, I am not positive to this road, but I guess we can resort to a per-client option, say quorumSync, in ZooKeeperBuilder(ZOOKEEPER-4697) (#2001) in case we are going this way finally. Client side sync can issue different operations based on that option.

The main doubt from me is that why people are intentionally want sync + read in case of them know quorumSync + read and the fault(not up-to-date data) of sync + read ? From my perspective, it is a bug for sync + read to read not up-to-date data.

We can discuss on the ML about why you need this

I will prepare a discussion thread in dev mailing list later.

@kezhuw
Copy link
Member Author

kezhuw commented Oct 1, 2023

I have started a discussion thread for the direction: https://lists.apache.org/thread/ogbg4sptpz56cwjbcvcpnysryr0c0pjm

Previously, `sync` + `read` could not guarantee up-to-date data as
`sync` will not touch quorum in case of no outstanding proposals.

Though, `create`/`setData` could be used as an rescue, but it is
apparently heavy, ugly and error-prone. `sync` fits the semantics
naturally.

This pr bumps the quorum protocol version to make changes compatible
with rolling upgrade. This is because `sync` is a public API. The whole
cluster must function normally in rolling upgrade. `sync` will behave
like a quorum operation once all forwarding followers are upgraded to
the new version.

This pr issues a quorum operation only when there is no outstanding
proposals, so to avoid overloading possibly heavy loading cluster. It
will increase latency in this case, but `sync` + `read` should care more
about up-to-date data.

This pr also reverts ZOOKEEPER-2137 which using `setData` to circumvent
old behavior of `sync`.

Refs: ZOOKEEPER-1675, ZOOKEEPER-2136, ZOOKEEPER-3600
@kezhuw kezhuw force-pushed the ZOOKEEPER-1675-quorum-sync branch from 50a0cb8 to 3741e09 Compare October 9, 2024 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants