[Bug]: ResiliencePipeline CircuitBreaker doesn't handle half-open state correctly when the probing call throws an unhandled exception - half-open state can never be left again #1979

@DominicUllmann

Describe the bug

We use a resilience pipeline with a circuit breaker and distinguish between handled and unhandled exceptions. When the probe call completes with an unhandled exception, the circuit breaker stays in the half-open state and no further probe call is ever allowed.
The issue is probably in https://github.com/App-vNext/Polly/blame/10730b9f723f80559b0c115c2102db974382fdfd/src/Polly.Core/CircuitBreaker/Controller/CircuitStateController.cs#L131. The state controller only handles the case where the previous state was open; it does not correctly handle the case where the state is already half-open.
According to https://github.com/App-vNext/Polly/wiki/Circuit-Breaker#half-open, an unhandled exception leaves the breaker in the half-open state. The next probe call should then be allowed and should move the breaker to either closed or open, but it is not.

Expected behavior

The circuit breaker should not stay in the half-open state forever after an unhandled exception occurred in the probe call. Instead, a further probe call must be allowed, and it should then end in another unhandled exception, a handled exception, or a success result.

Actual behavior

The circuit breaker stays in the half-open state forever after the probe call results in an unhandled exception.
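
To make the difference between the expected and the actual behavior concrete, here is a small toy model in Python. It is not Polly's code and does not use Polly's API. Every name in it (`ToyBreaker`, `admit_probe_when_half_open`, `replay_reported_steps`, and so on) is hypothetical. It assumes the mechanism suspected above: a probe is only admitted on the transition from open to half-open, never when the state is already half-open. It also ignores timing details and concurrency, and trips on the first handled failure.

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half-open"


class Rejected(Exception):
    """Raised when the toy breaker refuses to run a call."""


class HandledError(Exception):
    """Stands in for an exception the breaker is configured to handle."""


class UnhandledError(Exception):
    """Stands in for an exception the breaker is not configured to handle."""


class ToyBreaker:
    """Simplified, single-threaded model; an assumption, not Polly's implementation."""

    def __init__(self, admit_probe_when_half_open):
        # True models the expected behavior, False the reported behavior.
        self.admit_probe_when_half_open = admit_probe_when_half_open
        self.state = State.CLOSED
        self.break_elapsed = False

    def trip(self):
        self.state = State.OPEN
        self.break_elapsed = False

    def wait_for_break(self):
        # Stands in for waiting longer than the reset time.
        self.break_elapsed = True

    def execute(self, action):
        if self.state is State.OPEN:
            if not self.break_elapsed:
                raise Rejected("circuit is open")
            self.state = State.HALF_OPEN  # first probe is admitted
        elif self.state is State.HALF_OPEN:
            if not self.admit_probe_when_half_open:
                raise Rejected("no further probe admitted")
        try:
            result = action()
        except HandledError:
            self.trip()  # handled failure: back to open
            raise
        # An UnhandledError propagates above without any state change.
        self.state = State.CLOSED
        return result


def raise_unhandled():
    raise UnhandledError()


def replay_reported_steps(breaker):
    """Trip, wait, probe with an unhandled error, then try a successful call."""
    breaker.trip()
    breaker.wait_for_break()
    try:
        breaker.execute(raise_unhandled)
    except UnhandledError:
        pass
    try:
        breaker.execute(lambda: "ok")
    except Rejected:
        pass
    return breaker.state
```

In this model, `replay_reported_steps(ToyBreaker(True))` ends in `State.CLOSED`, which is the expected behavior, while `replay_reported_steps(ToyBreaker(False))` stays in `State.HALF_OPEN`, which matches what we observe.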

Steps to reproduce

Please have a look at:
https://github.com/DominicUllmann/PollyHalfOpenIssue/tree/main

The issue occurs in the unit test TestResiliencePipelineDirectly2, where we throw an unhandled exception from the probe call.
In this test, we first open the circuit breaker and verify that it is really open. We then wait for longer than the reset time and make a call that throws an unhandled exception (MyUnhandledException). Afterwards we wait again (which should not be necessary in the half-open state) and try to make a successful call. In this state, however, the breaker can never leave half-open again, neither to open nor to closed.

The unit test TestResiliencePipelineDirectly shows what happens when we use a handled exception instead. In this case everything works as expected (i.e. the breaker transitions from half-open to open and then, after the reset time, back to closed).

Exception(s) (if any)

No response

Polly version

8.3.0

.NET Version

8.0.102

Anything else?

No response