Skip to content

Conversation

@vladimirivic
Copy link
Contributor

Ensembles that have a high write request rate could be skipping proposing requests that contain errors instead of having the quorum vote on those transactions.

For example, having a sizable write traffic that results in error transactions would be creating additional network and log disk space overhead for each of the requests that would only return errors to the client in the end (e.g. delete non-existing path, version mismatch, trying to create a node that already exists etc).

Currently, there is no such logic in the ProposalRequestProcessor, every request that comes out of PrepRequestProcessor will be proposed to the quorum.

Proposed solution workflow:

A client sends a new write request trying to setData on an non-existing node
The server accepts the request and sends it through the PrepRequestProcessor
PrepRequestProcessor detects the error and assigns the error transaction to the request
Between PrepRequestProcessor and ProposalRequestProcessor there is another processor named SkipRequestProcessor which sole responsibility is to decide will the request be forwarded or returned to the originating quorum peer (Leader or Learner).
The quorum peer waits for all previous requests to complete before the error request proceeds with echoing the error back to the client.
Requirements:

We should be conservative about the use of ZXID. If the request generates an error transaction we don't want to increment the last proposed ZXID and cause any gaps in the log.
Request that are found to be invalid should be sent directly to the origin (either to the corresponding Learner or to the Leader directly) so there is exactly one roundtrip for each request with error transaction.
All the requests must preserve its order, the changes must be backwards compatible with the Zookeeper ordering guarantees.
Challenges:

Skipping request without having them go through the proposal pipeline poses a challenge in preserving Zookeeper transaction order.
Avoiding any additional changes inside CommitProcessor if possible.
Have unified logic for all three different paths in which write requests can come through:
Via Leader placed directly into the PrepRequestProcessor
Via Follower where the request is forwarded to the leader and also passed to CommitProcessor to wait for COMMIT packet
Via Observer, where the request is forwarded to the Leader and the Observer waits for the INFORM packet

https://issues.apache.org/jira/browse/ZOOKEEPER-3594

@asf-ci
Copy link

asf-ci commented Dec 10, 2019

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build-maven/1704/

Build result: FAILURE

[...truncated 498.45 KB...][JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/pom.xml to org.apache.zookeeper/parent/3.6.0-SNAPSHOT/parent-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-recipes/zookeeper-recipes-lock/pom.xml to org.apache.zookeeper/zookeeper-recipes-lock/3.6.0-SNAPSHOT/zookeeper-recipes-lock-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-contrib/pom.xml to org.apache.zookeeper/zookeeper-contrib/3.6.0-SNAPSHOT/zookeeper-contrib-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-client/pom.xml to org.apache.zookeeper/zookeeper-client/3.6.0-SNAPSHOT/zookeeper-client-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-jute/pom.xml to org.apache.zookeeper/zookeeper-jute/3.6.0-SNAPSHOT/zookeeper-jute-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-jute/target/zookeeper-jute-3.6.0-SNAPSHOT.jar to org.apache.zookeeper/zookeeper-jute/3.6.0-SNAPSHOT/zookeeper-jute-3.6.0-SNAPSHOT.jar[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-jute/target/zookeeper-jute-3.6.0-SNAPSHOT-tests.jar to org.apache.zookeeper/zookeeper-jute/3.6.0-SNAPSHOT/zookeeper-jute-3.6.0-SNAPSHOT-tests.jar[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-jute/target/zookeeper-jute-3.6.0-SNAPSHOT-sources.jar to org.apache.zookeeper/zookeeper-jute/3.6.0-SNAPSHOT/zookeeper-jute-3.6.0-SNAPSHOT-sources.jar[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-jute/target/zookeeper-jute-3.6.0-SNAPSHOT-javadoc.jar to org.apache.zookeeper/zookeeper-jute/3.6.0-SNAPSHOT/zookeeper-jute-3.6.0-SNAPSHOT-javadoc.jar[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-recipes/zookeeper-recipes-queue/pom.xml to org.apache.zookeeper/zookeeper-recipes-queue/3.6.0-SNAPSHOT/zookeeper-recipes-queue-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-docs/pom.xml to org.apache.zookeeper/zookeeper-docs/3.6.0-SNAPSHOT/zookeeper-docs-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-docs/target/zookeeper-docs-3.6.0-SNAPSHOT.jar to org.apache.zookeeper/zookeeper-docs/3.6.0-SNAPSHOT/zookeeper-docs-3.6.0-SNAPSHOT.jar[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-docs/target/zookeeper-docs-3.6.0-SNAPSHOT-tests.jar to org.apache.zookeeper/zookeeper-docs/3.6.0-SNAPSHOT/zookeeper-docs-3.6.0-SNAPSHOT-tests.jar[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-docs/target/zookeeper-docs-3.6.0-SNAPSHOT-sources.jar to org.apache.zookeeper/zookeeper-docs/3.6.0-SNAPSHOT/zookeeper-docs-3.6.0-SNAPSHOT-sources.jar[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-recipes/pom.xml to org.apache.zookeeper/zookeeper-recipes/3.6.0-SNAPSHOT/zookeeper-recipes-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-client/zookeeper-client-c/pom.xml to org.apache.zookeeper/zookeeper-client-c/3.6.0-SNAPSHOT/zookeeper-client-c-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-contrib/zookeeper-contrib-rest/pom.xml to org.apache.zookeeper/zookeeper-contrib-rest/3.6.0-SNAPSHOT/zookeeper-contrib-rest-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-assembly/pom.xml to org.apache.zookeeper/zookeeper-assembly/3.6.0-SNAPSHOT/zookeeper-assembly-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-contrib/zookeeper-contrib-zooinspector/pom.xml to org.apache.zookeeper/zookeeper-contrib-zooinspector/3.6.0-SNAPSHOT/zookeeper-contrib-zooinspector-3.6.0-SNAPSHOT.pomchannel stopped[SpotBugs] Skipping execution of recorder since overall result is 'FAILURE'Setting status of 6616250 to FAILURE with url https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build-maven/1704/ and message: 'FAILURE 'Using context: JenkinsMaven

Copy link
Contributor

@hanm hanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some quick comments on errors I find when trying to rebuild this branch locally...

import org.apache.zookeeper.server.RequestProcessor;
import org.apache.zookeeper.server.ServerMetrics;
import org.apache.zookeeper.server.SyncRequestProcessor;
import org.apache.zookeeper.server.TxnLogEntry;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see where this is defined. Maybe in a new file that's forgot to be included as part of this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one should not be here.

peerLastZxid = ss.getLastZxid();


ableToSkipTxn = QuorumZooKeeperServer.isSkipTxnAvailable(learnerProtocolVersion);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ableToSkipTxn and learnerProtocolVersion are not declared.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for this one, I will update this part.

if (lastZxid > request.zxid) {
ServerMetrics.getMetrics().QUORUM_PEER_SKIP_OUT_OF_ORDER.add(1);
LOG.error("Skip request is out of order!!!");
System.exit(ExitCode.QUORUM_PEER_SKIP_OUT_OF_ORDER.getValue());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use ServiceUtils.requestSystemExit instead. For reference: #1147

Copy link
Contributor

@hanm hanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add more review comments.

}

public static boolean isSkipTxnAvailable(int learnerProtocolVersion) {
return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure that we always want to return true here as opposed to have a system property control the return value? Also, learnerProtocolVersion is not used.. is this intentional?

It would be also good to add a brief comment on what this does - it's not very obvious to me in first glance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is intentional when the learner protocol version is used but I wont be adding that change to reduce the footprint of this pull request. I will refactor this in a followup commit. I will use the system variable.

sendPacket(pkt);
}

synchronized void proposalSkipped(QuorumPacket qp) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to reuse the existing sendPacket ? The only difference is sendPacket also updates lastProposedZxid with the packet zxid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That may be possible. I will explore that option. One problem I see is that a skip request can override lastProposedZxid for a valid request that is in flight.


long getNextZxid() {
return hzxid.incrementAndGet();
return hzxid.get() + 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will not be thread safe, right? would this be a concern?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't be a concern as long as we keep processing requests in order inside PrepRequestProcessor. If that changes this can break but that is not likely to happen.

return hzxid.get() + 1;
}

void incrementZxid() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another idea - instead of having this to increase the zxid we can instead have a decrementZxid, which will only be called if we have an error transaction. By doing that, we can keep the existing implementation of getNextZxid . This way we only decrement zxid for error transactions; for usual transactions we only call getNextZxid once (as opposed to the call both getNextZxid and incrementZxid ).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a great alternative and it was my first approach but then I though we need to remember the decrement zxid which I wanted to eliminate. But that was early when I started working on this. I will take another look and see how things look now if I use decrement zxid.

* The request will be immediately sent back to the learner if there is no pending commits
* otherwise the request will be put inside the handler's queue to wait for pending commit
* to complete before is sent so we make sure we send all requests in order.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove these comments, they are duplicate of the same comments on the SkipRequestHandler interface.

return owner == ServerCnxn.me;
}

public boolean isFromFollower() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method is not used.. is this needed?

*
*/
public boolean canSkip() {
return skipRequestHandler != null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this always return true? As far as I can see skipRequestHandler will never be null as it's set by setSkipRequestHandler

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, this is related to learner protocol version, something outside the scope of this. This is not required to be in this pull request.

* The request will be immediately sent back to the learner if there is no pending commits
* otherwise the request will be put inside the handler's queue to wait for pending commit
* to complete before is sent so we make sure we send all requests in order.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments can be removed (same comments already present in interface).

* LeaderHandler commits directly in it's own commit processor.
* LearnerHandler must send SKIP message to the Learners so they can
* commit the request in their own commit processor.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same - duplicate comments with interface that can be removed.

* that occurred right before that commit.
*/
public class LeaderHandler implements SkipRequestHandler {
private static final Logger LOG = LoggerFactory.getLogger(LeaderHandler.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logger is never used.. remove?

Copy link
Contributor Author

@vladimirivic vladimirivic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hanm thanks for the through review! First time porting changes to open source. I will update the pull request with the changes after the holidays.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants