
Conversation

@andystaples
Contributor

This PR addresses a very specific bug: a race condition caused by improper handling of orchestration flow.
The bug can be reproduced in Durable Functions with the dotnet-isolated programming model by a setup like the following, whenever the tags input contains more than one element:

        [Function(nameof(EntityLockNullReferenceOrchestration))]
        public static async Task<string> RunOrchestrator([OrchestrationTrigger] TaskOrchestrationContext context, string[] tags)
        {
            var tagsData = new ConcurrentQueue<Tuple<string, string>>();
            var categoriesReportsTasks = tags.Select(tag => GetOrBuildTagData(context, tag, tagsData)).ToArray();
            await Task.WhenAll(categoriesReportsTasks);
            return string.Join(", ", tagsData.Select(t => $"{t.Item1}: {t.Item2}"));
        }

        private static async Task GetOrBuildTagData(TaskOrchestrationContext context, string tag, ConcurrentQueue<Tuple<string, string>> tagsData)
        {
            var dataSet1Id = new EntityInstanceId(nameof(DummyTagCacheEntity), tag);
            var cachedData = await context.Entities.CallEntityAsync<string>(dataSet1Id, nameof(DummyTagCacheEntity.Get));
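            // With more than one tag, these per-tag tasks interleave: one task can
            // reach LockEntitiesAsync below while another task's lock is still
            // pending, which is the race condition this PR addresses.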
            await using (await context.Entities.LockEntitiesAsync(dataSet1Id))
            {
                await context.Entities.CallEntityAsync(dataSet1Id, nameof(DummyTagCacheEntity.Set), "Hello");
            }
            tagsData.Enqueue(new(tag, "Hello"));
        }

This is unsupported behavior: awaiting Task.WhenAll on tasks that each comprise multiple Durable operations is likely to cause non-determinism exceptions, and it is especially problematic with entity locks, since an orchestration should hold only one entity lock at a time.
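
For illustration, a supported way to get the same result is to process the tags sequentially, so the orchestration holds at most one entity lock at a time. Below is a minimal sketch reusing the entities from the repro above (the function name EntityLockSequentialOrchestration is invented for this example):

        [Function(nameof(EntityLockSequentialOrchestration))]
        public static async Task<string> RunSequentialOrchestrator([OrchestrationTrigger] TaskOrchestrationContext context, string[] tags)
        {
            var tagsData = new List<Tuple<string, string>>();
            foreach (string tag in tags)
            {
                var entityId = new EntityInstanceId(nameof(DummyTagCacheEntity), tag);
                var cachedData = await context.Entities.CallEntityAsync<string>(entityId, nameof(DummyTagCacheEntity.Get));

                // Only one critical section is ever open at a time here, which is the
                // supported usage of entity locks within a single orchestration.
                await using (await context.Entities.LockEntitiesAsync(entityId))
                {
                    await context.Entities.CallEntityAsync(entityId, nameof(DummyTagCacheEntity.Set), "Hello");
                }

                tagsData.Add(new(tag, "Hello"));
            }

            return string.Join(", ", tagsData.Select(t => $"{t.Item1}: {t.Item2}"));
        }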

However, this pattern today throws a NullReferenceException in OrchestrationEntityContext.RecoverLockAfterCall: because of the race condition, RecoverLockAfterCall is called while the lock is still pending, so availableLocks has not yet been initialized. With the fix in this PR, the orchestrator instead throws System.InvalidOperationException: "Must not enter another critical section from within a critical section.", which is more helpful for identifying the race condition in the tasks.
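
Conceptually, the fix validates state up front when a critical section is entered, instead of letting RecoverLockAfterCall dereference the uninitialized availableLocks later. The following is a hypothetical sketch of that kind of guard; the member names are invented for illustration and do not match the actual DurableTask.Core code:

        // Hypothetical illustration only; names do not match DurableTask.Core.
        private Guid? criticalSectionId; // non-null while a critical section is open or pending

        public void BeginCriticalSection(Guid newCriticalSectionId)
        {
            if (this.criticalSectionId != null)
            {
                // Previously this state went undetected until RecoverLockAfterCall,
                // where it surfaced as an unhelpful NullReferenceException.
                throw new InvalidOperationException(
                    "Must not enter another critical section from within a critical section.");
            }

            this.criticalSectionId = newCriticalSectionId;
        }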

- Fix: surface a specific entity locking race condition with the correct error
@mllab-nl

mllab-nl commented Oct 9, 2025

Am I the only one who feels like a test is missing ? 🙃

@andystaples
Contributor Author

Am I the only one who feels like a test is missing ? 🙃

We will want to add testing for this change, but since DurableTask.Core does not expose entity functionality (and even if it did, I don't think the orchestration APIs would even allow the same kind of race condition here), we will add the tests to the E2E test suite in azure-functions-durable-extension.
For now, I have tested the change locally and verified that it has the intended effect.

Member

@cgillum cgillum left a comment

I’m good with this change based on manual testing. Integration testing will need to be done separately in another repo.

@andystaples andystaples merged commit 00f70b7 into main Oct 9, 2025
44 checks passed
@andystaples andystaples deleted the andystaples/bugfix-critical-section-race-condition branch October 9, 2025 17:07