-
Notifications
You must be signed in to change notification settings - Fork 805
Fix race condition that caused inprogress connect to be canceled #2618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (2)
src/Grpc.Net.Client/Balancer/Subchannel.cs:269
- Using Debug.Assert here may not enforce the invariant in production builds. Consider adding a runtime check that handles the unexpected case when connectionRequested is false.
Debug.Assert(connectionRequested, "Ensure that only expected state made it to this point.");
test/Grpc.Net.Client.Tests/Balancer/ConnectionManagerTests.cs:686
- [nitpick] Consider unsubscribing from the MessageLogged event or scoping the handler to the duration of the test to avoid potential interference with other tests if the event is raised multiple times.
testSink.MessageLogged += (w) =>
@@ -327,6 +327,15 @@ private async Task ConnectTransportAsync() | |||
return; | |||
} | |||
|
|||
// There is already a connect in-progress on this transport. | |||
// Don't cancel and start again as that causes queued requests waiting on the connect to fail. | |||
if (_connectContext != null && !_connectContext.Disposed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to check the file to see if this had a race condition (i.e. can _connectContext
become null
; it can't) - it isn't, but may I suggest:
if (_connectContext is { Disposed: false })
so that it is obviously not a race (i.e. single-read)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems fine; very minor style thing - not a hill either of us needs to die on
Fixes #2612
A previous fix to stop deadlocks - #2601 - caused transient request errors. If a subcannel's not connected, and multiple gRPC calls queue at the same time, it's possible for a connection to be rapidly started and canceled. Although the state of the subchannel is quickly resolved, the queued calls could get an error result.
The fix is to check if a connection request is in progress later in the connection pipeline. If there is already a request then interrupt the delay if it is waiting and exit.