-
Notifications
You must be signed in to change notification settings - Fork 7.3k
ZOOKEEPER-3594: Ability to skip proposing requests with error transactions #1172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Refer to this link for build results (access rights to CI server needed): Build result: FAILURE[...truncated 498.45 KB...][JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/pom.xml to org.apache.zookeeper/parent/3.6.0-SNAPSHOT/parent-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-recipes/zookeeper-recipes-lock/pom.xml to org.apache.zookeeper/zookeeper-recipes-lock/3.6.0-SNAPSHOT/zookeeper-recipes-lock-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-contrib/pom.xml to org.apache.zookeeper/zookeeper-contrib/3.6.0-SNAPSHOT/zookeeper-contrib-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-client/pom.xml to org.apache.zookeeper/zookeeper-client/3.6.0-SNAPSHOT/zookeeper-client-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-jute/pom.xml to org.apache.zookeeper/zookeeper-jute/3.6.0-SNAPSHOT/zookeeper-jute-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-jute/target/zookeeper-jute-3.6.0-SNAPSHOT.jar to org.apache.zookeeper/zookeeper-jute/3.6.0-SNAPSHOT/zookeeper-jute-3.6.0-SNAPSHOT.jar[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-jute/target/zookeeper-jute-3.6.0-SNAPSHOT-tests.jar to org.apache.zookeeper/zookeeper-jute/3.6.0-SNAPSHOT/zookeeper-jute-3.6.0-SNAPSHOT-tests.jar[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-jute/target/zookeeper-jute-3.6.0-SNAPSHOT-sources.jar to org.apache.zookeeper/zookeeper-jute/3.6.0-SNAPSHOT/zookeeper-jute-3.6.0-SNAPSHOT-sources.jar[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-jute/target/zookeeper-jute-3.6.0-SNAPSHOT-javadoc.jar to org.apache.zookeeper/zookeeper-jute/3.6.0-SNAPSHOT/zookeeper-jute-3.6.0-SNAPSHOT-javadoc.jar[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-recipes/zookeeper-recipes-queue/pom.xml to org.apache.zookeeper/zookeeper-recipes-queue/3.6.0-SNAPSHOT/zookeeper-recipes-queue-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-docs/pom.xml to org.apache.zookeeper/zookeeper-docs/3.6.0-SNAPSHOT/zookeeper-docs-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-docs/target/zookeeper-docs-3.6.0-SNAPSHOT.jar to org.apache.zookeeper/zookeeper-docs/3.6.0-SNAPSHOT/zookeeper-docs-3.6.0-SNAPSHOT.jar[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-docs/target/zookeeper-docs-3.6.0-SNAPSHOT-tests.jar to org.apache.zookeeper/zookeeper-docs/3.6.0-SNAPSHOT/zookeeper-docs-3.6.0-SNAPSHOT-tests.jar[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-docs/target/zookeeper-docs-3.6.0-SNAPSHOT-sources.jar to org.apache.zookeeper/zookeeper-docs/3.6.0-SNAPSHOT/zookeeper-docs-3.6.0-SNAPSHOT-sources.jar[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-recipes/pom.xml to org.apache.zookeeper/zookeeper-recipes/3.6.0-SNAPSHOT/zookeeper-recipes-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-client/zookeeper-client-c/pom.xml to org.apache.zookeeper/zookeeper-client-c/3.6.0-SNAPSHOT/zookeeper-client-c-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-contrib/zookeeper-contrib-rest/pom.xml to org.apache.zookeeper/zookeeper-contrib-rest/3.6.0-SNAPSHOT/zookeeper-contrib-rest-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-assembly/pom.xml to org.apache.zookeeper/zookeeper-assembly/3.6.0-SNAPSHOT/zookeeper-assembly-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-contrib/zookeeper-contrib-zooinspector/pom.xml to org.apache.zookeeper/zookeeper-contrib-zooinspector/3.6.0-SNAPSHOT/zookeeper-contrib-zooinspector-3.6.0-SNAPSHOT.pomchannel stopped[SpotBugs] Skipping execution of recorder since overall result is 'FAILURE'Setting status of 6616250 to FAILURE with url https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build-maven/1704/ and message: 'FAILURE 'Using context: JenkinsMaven |
hanm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some quick comments on errors I find when trying to rebuild this branch locally...
| import org.apache.zookeeper.server.RequestProcessor; | ||
| import org.apache.zookeeper.server.ServerMetrics; | ||
| import org.apache.zookeeper.server.SyncRequestProcessor; | ||
| import org.apache.zookeeper.server.TxnLogEntry; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see where this is defined. Maybe in a new file that's forgot to be included as part of this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one should not be here.
| peerLastZxid = ss.getLastZxid(); | ||
|
|
||
|
|
||
| ableToSkipTxn = QuorumZooKeeperServer.isSkipTxnAvailable(learnerProtocolVersion); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ableToSkipTxn and learnerProtocolVersion are not declared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for this one, I will update this part.
| if (lastZxid > request.zxid) { | ||
| ServerMetrics.getMetrics().QUORUM_PEER_SKIP_OUT_OF_ORDER.add(1); | ||
| LOG.error("Skip request is out of order!!!"); | ||
| System.exit(ExitCode.QUORUM_PEER_SKIP_OUT_OF_ORDER.getValue()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use ServiceUtils.requestSystemExit instead. For reference: #1147
hanm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add more review comments.
| } | ||
|
|
||
| public static boolean isSkipTxnAvailable(int learnerProtocolVersion) { | ||
| return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure that we always want to return true here as opposed to have a system property control the return value? Also, learnerProtocolVersion is not used.. is this intentional?
It would be also good to add a brief comment on what this does - it's not very obvious to me in first glance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is intentional when the learner protocol version is used but I wont be adding that change to reduce the footprint of this pull request. I will refactor this in a followup commit. I will use the system variable.
| sendPacket(pkt); | ||
| } | ||
|
|
||
| synchronized void proposalSkipped(QuorumPacket qp) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to reuse the existing sendPacket ? The only difference is sendPacket also updates lastProposedZxid with the packet zxid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That may be possible. I will explore that option. One problem I see is that a skip request can override lastProposedZxid for a valid request that is in flight.
|
|
||
| long getNextZxid() { | ||
| return hzxid.incrementAndGet(); | ||
| return hzxid.get() + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will not be thread safe, right? would this be a concern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't be a concern as long as we keep processing requests in order inside PrepRequestProcessor. If that changes this can break but that is not likely to happen.
| return hzxid.get() + 1; | ||
| } | ||
|
|
||
| void incrementZxid() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another idea - instead of having this to increase the zxid we can instead have a decrementZxid, which will only be called if we have an error transaction. By doing that, we can keep the existing implementation of getNextZxid . This way we only decrement zxid for error transactions; for usual transactions we only call getNextZxid once (as opposed to the call both getNextZxid and incrementZxid ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a great alternative and it was my first approach but then I though we need to remember the decrement zxid which I wanted to eliminate. But that was early when I started working on this. I will take another look and see how things look now if I use decrement zxid.
| * The request will be immediately sent back to the learner if there is no pending commits | ||
| * otherwise the request will be put inside the handler's queue to wait for pending commit | ||
| * to complete before is sent so we make sure we send all requests in order. | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove these comments, they are duplicate of the same comments on the SkipRequestHandler interface.
| return owner == ServerCnxn.me; | ||
| } | ||
|
|
||
| public boolean isFromFollower() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method is not used.. is this needed?
| * | ||
| */ | ||
| public boolean canSkip() { | ||
| return skipRequestHandler != null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this always return true? As far as I can see skipRequestHandler will never be null as it's set by setSkipRequestHandler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, this is related to learner protocol version, something outside the scope of this. This is not required to be in this pull request.
| * The request will be immediately sent back to the learner if there is no pending commits | ||
| * otherwise the request will be put inside the handler's queue to wait for pending commit | ||
| * to complete before is sent so we make sure we send all requests in order. | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comments can be removed (same comments already present in interface).
| * LeaderHandler commits directly in it's own commit processor. | ||
| * LearnerHandler must send SKIP message to the Learners so they can | ||
| * commit the request in their own commit processor. | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same - duplicate comments with interface that can be removed.
| * that occurred right before that commit. | ||
| */ | ||
| public class LeaderHandler implements SkipRequestHandler { | ||
| private static final Logger LOG = LoggerFactory.getLogger(LeaderHandler.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger is never used.. remove?
vladimirivic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hanm thanks for the through review! First time porting changes to open source. I will update the pull request with the changes after the holidays.
Ensembles that have a high write request rate could be skipping proposing requests that contain errors instead of having the quorum vote on those transactions.
For example, having a sizable write traffic that results in error transactions would be creating additional network and log disk space overhead for each of the requests that would only return errors to the client in the end (e.g. delete non-existing path, version mismatch, trying to create a node that already exists etc).
Currently, there is no such logic in the ProposalRequestProcessor, every request that comes out of PrepRequestProcessor will be proposed to the quorum.
Proposed solution workflow:
A client sends a new write request trying to setData on an non-existing node
The server accepts the request and sends it through the PrepRequestProcessor
PrepRequestProcessor detects the error and assigns the error transaction to the request
Between PrepRequestProcessor and ProposalRequestProcessor there is another processor named SkipRequestProcessor which sole responsibility is to decide will the request be forwarded or returned to the originating quorum peer (Leader or Learner).
The quorum peer waits for all previous requests to complete before the error request proceeds with echoing the error back to the client.
Requirements:
We should be conservative about the use of ZXID. If the request generates an error transaction we don't want to increment the last proposed ZXID and cause any gaps in the log.
Request that are found to be invalid should be sent directly to the origin (either to the corresponding Learner or to the Leader directly) so there is exactly one roundtrip for each request with error transaction.
All the requests must preserve its order, the changes must be backwards compatible with the Zookeeper ordering guarantees.
Challenges:
Skipping request without having them go through the proposal pipeline poses a challenge in preserving Zookeeper transaction order.
Avoiding any additional changes inside CommitProcessor if possible.
Have unified logic for all three different paths in which write requests can come through:
Via Leader placed directly into the PrepRequestProcessor
Via Follower where the request is forwarded to the leader and also passed to CommitProcessor to wait for COMMIT packet
Via Observer, where the request is forwarded to the Leader and the Observer waits for the INFORM packet
https://issues.apache.org/jira/browse/ZOOKEEPER-3594