Skip to content

Fix race condition that caused inprogress connect to be canceled #2618

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 9, 2025

Conversation

JamesNK
Copy link
Member

@JamesNK JamesNK commented Apr 9, 2025

Fixes #2612

A previous fix to stop deadlocks - #2601 - caused transient request errors. If a subcannel's not connected, and multiple gRPC calls queue at the same time, it's possible for a connection to be rapidly started and canceled. Although the state of the subchannel is quickly resolved, the queued calls could get an error result.

The fix is to check if a connection request is in progress later in the connection pipeline. If there is already a request then interrupt the delay if it is waiting and exit.

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

src/Grpc.Net.Client/Balancer/Subchannel.cs:269

  • Using Debug.Assert here may not enforce the invariant in production builds. Consider adding a runtime check that handles the unexpected case when connectionRequested is false.
Debug.Assert(connectionRequested, "Ensure that only expected state made it to this point.");

test/Grpc.Net.Client.Tests/Balancer/ConnectionManagerTests.cs:686

  • [nitpick] Consider unsubscribing from the MessageLogged event or scoping the handler to the duration of the test to avoid potential interference with other tests if the event is raised multiple times.
testSink.MessageLogged += (w) =>

@@ -327,6 +327,15 @@ private async Task ConnectTransportAsync()
return;
}

// There is already a connect in-progress on this transport.
// Don't cancel and start again as that causes queued requests waiting on the connect to fail.
if (_connectContext != null && !_connectContext.Disposed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to check the file to see if this had a race condition (i.e. can _connectContext become null; it can't) - it isn't, but may I suggest:

if (_connectContext is { Disposed: false })

so that it is obviously not a race (i.e. single-read)?

Copy link
Contributor

@mgravell mgravell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems fine; very minor style thing - not a hill either of us needs to die on

@JamesNK JamesNK merged commit 6f03f0e into grpc:master Apr 9, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TaskCanceledException when reconnecting after disconnect
2 participants