Conversation

johnaohara
Member

There is a race condition accessing waitingForDrain in VertxBlockingOutput.awaitWriteable(): request.response().drainHandler() can be called from a separate thread, but access to the private boolean waitingForDrain is not guarded in request.response().drainHandler().

I have used the same locking pattern used for request.response().exceptionHandler() and request.response().endHandler().

However, I have concerns that this could result in a deadlock: if we are holding the intrinsic lock on request.connection() in VertxBlockingOutput.awaitWriteable(), a call to request.response().drainHandler() from a separate thread would block waiting for the lock.

Would it be better for waitingForDrain to be defined as volatile?
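
For reference, a minimal sketch of the shape being described (simplified and hedged: the class, field, and method names follow this description rather than the exact Quarkus source):

```java
import java.io.IOException;
import java.io.InterruptedIOException;

import io.vertx.core.http.HttpServerRequest;

// Simplified sketch of the shape described above, not the exact Quarkus source.
class BlockingOutputSketch {
    private final HttpServerRequest request;
    // Written by the worker thread under the connection monitor, but also
    // written by the drain handler on the event loop without any guard.
    private boolean waitingForDrain;

    BlockingOutputSketch(HttpServerRequest request) {
        this.request = request;
    }

    void awaitWriteable() throws IOException {
        assert Thread.holdsLock(request.connection());
        while (request.response().writeQueueFull()) {
            waitingForDrain = true;                      // guarded: caller holds the connection monitor
            request.response().drainHandler(v -> {
                waitingForDrain = false;                 // unguarded: runs on the event loop
                synchronized (request.connection()) {
                    request.connection().notifyAll();    // wake the worker blocked in wait()
                }
            });
            try {
                request.connection().wait();             // releases the monitor while waiting
            } catch (InterruptedException e) {
                throw new InterruptedIOException(e.getMessage());
            } finally {
                waitingForDrain = false;
            }
        }
    }
}
```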

@johnaohara
Member Author

I think this race is causing the worker pool threads to lock in #5443

Collaborator

@stuartwdouglas stuartwdouglas left a comment


This is already called under lock.

@johnaohara
Member Author

johnaohara commented Nov 14, 2019

That is what I thought when I first looked, but I am assuming that the thread that calls request.response().drainHandler() will be a different thread from the one that called VertxBlockingOutput.write(). We register the DrainHandler in awaitWriteable(), but don't invoke it in the same thread. The thread that invokes the drain handler will not be holding the lock.

@stuartwdouglas
Collaborator

Both threads will be holding the same lock though; there is an assert Thread.holdsLock(request.connection()); in awaitWritable().

@stuartwdouglas
Collaborator

Actually I think isWriteable might be able to change in the background. I probably need to re-check it after the drain handler is registered.

@johnaohara
Member Author

johnaohara commented Nov 14, 2019

The locking semantics here are those of synchronized: both threads can't hold the same lock and make progress at the same time; one will block at the lock boundary waiting for the other to proceed.

@johnaohara
Member Author

Although there is an assert to verify that the lock is held by the thread instantiating the new DrainHandler, there is no assert or locking within the drain handler's handle() method, which has to be called from a separate thread.
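
For illustration, applying the locking pattern mentioned earlier (the one already used for exceptionHandler()/endHandler()) to the drain handler would look roughly like this; a sketch, not the exact PR diff:

```java
// Sketch: guard the handler body with the same monitor the worker waits on,
// mirroring the exceptionHandler()/endHandler() pattern; not the exact PR diff.
request.response().drainHandler(event -> {
    synchronized (request.connection()) {
        if (waitingForDrain) {
            request.connection().notifyAll();   // legal here because the monitor is held
        }
        waitingForDrain = false;                // now written under the same lock the worker holds
    }
});
```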

@johnaohara
Member Author

Thinking about this, it isn't an atomicity issue, it is a visibility issue, and as I mentioned we are at risk of a deadlock with synchronized locking. I think the correct fix here is to change the field to private volatile boolean waitingForDrain and remove the synchronization in the event handlers. If a thread that is writing has locked request.connection(), then none of the event handlers could proceed.

@johnaohara
Member Author

johnaohara commented Nov 14, 2019

Also, synchronization does not make provisions for timeouts, leaving applications deadlocked if there is an error. I think a locking implementation with timeouts and retries makes sense here: if for some reason the object monitor does not receive the notification, we could time out and re-check the write queue, as we have access to it through the request API via request.response().writeQueueFull().
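
A sketch of that timeout-and-recheck idea (illustrative only, with an arbitrary 1-second timeout; not what the PR ended up doing):

```java
// Illustrative only: wake up periodically and re-test the write queue instead of
// relying solely on the drain notification. The 1-second timeout is arbitrary.
synchronized (request.connection()) {
    while (request.response().writeQueueFull()) {
        waitingForDrain = true;
        try {
            request.connection().wait(1000);     // returns after 1s even if notifyAll() was missed
        } catch (InterruptedException e) {
            throw new InterruptedIOException(e.getMessage());
        } finally {
            waitingForDrain = false;
        }
        // the loop condition re-checks request.response().writeQueueFull() before waiting again
    }
}
```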

@johnaohara
Member Author

johnaohara commented Nov 14, 2019

After further investigation, it looks like what is happening is:

  1. When the client closes a connection without having fully read the response, an io.vertx.core.VertxException: Connection was closed is thrown from the event loop
  2. The exceptionHandler and the endHandler are then both called from the event loop, so context has switched from the worker pool to the event loop
  3. Most worker pool threads catch the exception and throw an IOException from awaitWriteable()

I am debugging further to map out the life cycle of the connection after a VertxException is thrown in the event loop.

I think that protected Throwable throwable also needs guarding, as the calls to exceptionHandler come from the event loop and are not guarded by synchronized.

@johnaohara
Member Author

@stuartwdouglas we were not registering a closeHandler for the response and notifying the worker thread when the connection is closed from the client side while there is data remaining to be written. I have updated my PR accordingly.
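
Roughly, the kind of registration being described (a sketch; the throwable field and the exact wiring are illustrative, the actual PR may differ):

```java
// Sketch of the closeHandler registration described above: when the client closes the
// connection, record the failure and wake the worker blocked in awaitWriteable().
// The throwable field is illustrative; the actual PR may differ.
request.response().closeHandler(v -> {
    synchronized (request.connection()) {
        throwable = new IOException("Connection has been closed");
        if (waitingForDrain) {
            request.connection().notifyAll();
        }
    }
});
```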

@johnaohara johnaohara changed the title Fix race condition on VertxBlockingOutput.waitingForDrain Register response.closeHandler() in VertxBlockingOutput and test for clients closing connections Nov 14, 2019
@patriot1burke
Contributor

@johnaohara I'm not seeing your deadlock scenario. You can only have deadlock if there are two separate things being locked. There are no locks being used, only synchronized wait/notify. There is only synchronization on request.connection(), so it's impossible to have deadlock.

I think the possible error is that there is no synchronized block in the drain handler. Maybe this isn't true, but I thought notify had to be called within a synchronized block. Apologies for missing this.

@johnaohara
Member Author

@johnaohara I'm not seeing your deadlock scenario. You can only have deadlock if there are two separate things being locked. There are no locks being used, only synchronized wait/notify. There is only synchronization on request.connection(), so it's impossible to have deadlock.

When I wrote this I had misunderstood what was happening.

What I was concerned about was a livelock scenario, where a worker thread obtains a lock on request.connection() [1], and while that lock is held [2] another thread (the event loop) tries to acquire a lock on the same object [3]. My concern was that in this case the calls to exceptionHandler() and endHandler() come from the event loop, which is a separate thread, and the event loop wouldn't be able to progress.

I think the possible error is that there is no synchronized block in the drain handler. Maybe this isn't true, but I thought notify had to be called within a synchronized block. Apologies for missing this.

What was missing was registering a closeHandler() that is called from the event loop when the client closes the connection and also interrupts the worker thread waiting to write data.

1 -
2 -
3 -

@stuartwdouglas
Collaborator

I think what we actually need is #5491

@johnaohara
Member Author

#5491 still finishes with the worker pool threads locked.

@johnaohara
Member Author

@stuartwdouglas I have rebased on top of #5491 and added a check for a closed connection; this doesn't hang when connections are closed in the middle of writing.

@johnaohara
Member Author

Neither be218e4 nor #5491 resets the buffer. Can we guarantee that buffers won't be pooled, and that we won't be leaving buffers in an invalid state?

@stuartwdouglas
Collaborator

buffer.clear() does not reset the buffer; it is buffer.release() that does this.

It looks like if you call write with the response closed, the buffer will not be freed; once it is passed into Netty, though, it should be safe. I am adding a try/catch to the write methods to handle this.
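
A sketch of that shape: once the buffer has been handed to the response, Netty owns it; if the hand-off fails, clean it up locally instead of leaking it. The method name and the clean-up details are illustrative (the following comments in this thread discuss whether release() or clear() is the right clean-up), and request is the HttpServerRequest field from the earlier sketch:

```java
import java.io.IOException;

import io.netty.buffer.ByteBuf;
import io.vertx.core.buffer.Buffer;

// Illustrative shape of a guarded write: once write() succeeds the buffer belongs to
// Netty; if the hand-off fails (e.g. the response is already closed), clean it up here.
void writeBlocking(ByteBuf data) throws IOException {
    try {
        request.response().write(Buffer.buffer(data));
    } catch (Exception e) {
        if (data != null && data.refCnt() > 0) {
            data.release();          // only release while we still hold a reference
        }
        throw new IOException("Failed to write response data", e);
    }
}
```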

@johnaohara
Member Author

buffer.release() decrements the reference count and deallocates the buffer when the ref count reaches 0, or, in the case of a pooled buffer, returns it to the pool. buffer.clear() resets the reader and writer indexes, which in effect is an empty buffer; it has the same effect as reading all the data in the buffer. I don't think we want to release the buffer at this point in the code path.

I had another branch that does not try to write the buffer to the response if the connection has been closed.

@stuartwdouglas
Collaborator

I have updated quarkus-http with these changes: quarkusio/quarkus-http@4a8fde5

Do you want to update this PR or do you want me to?

@johnaohara
Member Author

@stuartwdouglas I have updated the PR to throw an IOException if the client has closed the connection, and to release the buffer if an exception occurs. The stack trace is a lot clearer now as well.

@stuartwdouglas
Collaborator

Can you squash the commits?

@johnaohara
Member Author

I can, but data.release() is not correct here: the ref count is already 0, and decrementing it causes Netty to throw an io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1 exception. I think we need to just call data.clear().
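
The distinction is easy to see with a standalone Netty buffer (illustrative, unrelated to the PR code itself):

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;

// Standalone illustration of the release()/clear() distinction discussed above.
public class ByteBufLifecycle {
    public static void main(String[] args) {
        ByteBuf buf = Unpooled.buffer(16);
        buf.writeInt(42);

        buf.clear();                        // resets readerIndex/writerIndex; refCnt stays at 1
        System.out.println(buf.refCnt());   // prints 1

        buf.release();                      // refCnt 1 -> 0, buffer is deallocated
        // buf.release();                   // would throw IllegalReferenceCountException: refCnt: 0, decrement: 1
    }
}
```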

@stuartwdouglas stuartwdouglas added the triage/waiting-for-ci Ready to merge when CI successfully finishes label Nov 15, 2019
@johnaohara johnaohara merged commit b85aa59 into quarkusio:master Nov 15, 2019
@johnaohara johnaohara added this to the 1.1.0 milestone Nov 15, 2019
@gsmet gsmet removed the backport? label Nov 15, 2019
@gsmet gsmet modified the milestones: 1.1.0, 1.0.0.Final Nov 15, 2019