NRE in WorkflowConsumer after workflow has been completed with Redis deleteCompleted true #1376

@Tolstovku

Describe the bug
Exception:

Error executing item 90251b0c-a1b7-4f33-8678-69ceb3027aa7 - Object reference not set to an instance of an object.
System.NullReferenceException: Object reference not set to an instance of an object.
   at WorkflowCore.Services.BackgroundTasks.WorkflowConsumer.ProcessItem(String itemId, CancellationToken cancellationToken)
   at WorkflowCore.Services.BackgroundTasks.WorkflowConsumer.ProcessItem(String itemId, CancellationToken cancellationToken)
   at WorkflowCore.Services.BackgroundTasks.QueueConsumer.ExecuteItem(String itemId, EventWaitHandle waitHandle, Activity activity)

If Redis is used for both queues and the PersistenceProvider, there is a high chance of an NRE in WorkflowConsumer after the workflow has completed. My best guess (which is easily reproducible) is this: if an item is still being processed when the next WorkflowConsumer iteration picks it up, the item is queued again. By the time that re-queued item is processed, the workflow has already been deleted from Redis, so it throws.

QueueConsumer.cs:

lock (_activeTasks)
{
    hasTask = _activeTasks.ContainsKey(item);
}
if (hasTask)
{
    _secondPasses.Add(item);
    if (!EnableSecondPasses)
        await QueueProvider.QueueWork(item, Queue); // <--- This part
    activity?.Dispose();
    continue;
}

I am not very familiar with all the nuances of WFC processing, so I'm not sure why an item needs to be queued again when it is already being processed; this seems strange to me. I would be grateful for an explanation!

To Reproduce

  • Use RedisPersistenceProvider with deleteCompleted set to true.
  • Have a workflow whose last step runs long enough that WorkflowConsumer runs twice during the step's execution.
  • On the next iteration, WorkflowConsumer still has the completed workflow's ID in the queue and tries to retrieve it from Redis, but it has already been deleted, causing the NRE.
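
The sequence above can be modelled without Redis or Workflow Core at all. The following is only a minimal in-memory sketch of my guess, not the library's actual code: `RequeueRaceSketch`, `Workflow`, and `SecondPassFindsWorkflow` are hypothetical names, a `Dictionary` stands in for the persistence store and a `Queue<string>` for the Redis list. How the second copy of the ID reaches the queue while the step is running is assumed here, not shown.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-ins; none of these are Workflow Core types.
public static class RequeueRaceSketch
{
    public sealed class Workflow
    {
        public string Id = "";
        public bool Complete;
    }

    // Returns true if the final dequeue still finds the workflow in the store,
    // false if the store lookup comes back empty (the point where the NRE would occur).
    public static bool SecondPassFindsWorkflow(bool deleteCompleted)
    {
        var store = new Dictionary<string, Workflow>();   // persistence provider stand-in
        var queue = new Queue<string>();                  // queue provider stand-in
        var activeTasks = new HashSet<string>();          // mirrors _activeTasks

        var wf = new Workflow { Id = "wf-1" };
        store[wf.Id] = wf;
        queue.Enqueue(wf.Id);

        // First iteration: the consumer picks up the item and starts the long last step.
        var id = queue.Dequeue();
        activeTasks.Add(id);

        // Assumption: the same ID arrives in the queue again while the step is running.
        queue.Enqueue(wf.Id);

        // Next iteration: the item is already active, so it is queued again
        // (the QueueProvider.QueueWork(item, Queue) line from the snippet).
        var again = queue.Dequeue();
        if (activeTasks.Contains(again))
            queue.Enqueue(again);

        // The long step finishes; the workflow completes and may be deleted.
        wf.Complete = true;
        activeTasks.Remove(id);
        if (deleteCompleted)
            store.Remove(wf.Id);

        // Following iteration: the stale ID is dequeued and looked up.
        var stale = queue.Dequeue();
        return store.TryGetValue(stale, out _);
    }
}
```

With `deleteCompleted` set to false the last lookup still finds the (completed) workflow; with it set to true the lookup finds nothing, which is where I believe ProcessItem dereferences null.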

Expected behavior
No NRE is thrown, or no workflow ID remains in the queue after the workflow has completed.

Additional context
One way to mitigate the issue is to remove the workflow from the queue on completion by adding
_redis.ListRemove({queueName}, {workflow-id}, 0) to RedisPersistenceProvider when deleteCompleted is set, but this seems to be a sub-optimal solution because it iterates over the whole list. Maybe there is a better way?
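
To make the cost concern concrete, here is a small in-memory model of what a list removal with count 0 does (remove every matching element). It is a sketch only: `ListRemoveSketch` and `RemoveAll` are hypothetical names, a `List<string>` stands in for the Redis list, and the `visited` counter exists purely to show that every element is examined.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of removing all occurrences of a value from a list,
// which is the behaviour requested by passing count = 0.
public static class ListRemoveSketch
{
    // Removes every occurrence of workflowId and reports how many elements
    // were removed and how many had to be examined to do so.
    public static (int Removed, int Visited) RemoveAll(List<string> queue, string workflowId)
    {
        int removed = 0, visited = 0;

        // Walk backwards so RemoveAt does not shift elements not yet examined.
        for (int i = queue.Count - 1; i >= 0; i--)
        {
            visited++;
            if (queue[i] == workflowId)
            {
                queue.RemoveAt(i);
                removed++;
            }
        }
        return (removed, visited);
    }
}
```

Every element is visited regardless of how many match, so the work grows with the queue length on each workflow completion, which is why this looks sub-optimal to me.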
