[FS-280753] - redis lock enabled for decide method #78

akashvenkatesan0 · 2025-06-11T09:18:11Z

Pull Request type

Bugfix
Feature
Refactoring (no functional changes, no api changes)
Build related changes (Please run ./gradlew generateLock saveLock to refresh dependencies)
WHOSUSING.md
Other (please describe):

NOTE: Please remember to run ./gradlew spotlessApply to fix any format violations.

Changes in this PR

Describe the new behavior from this PR, and why it's needed
Issue #

When executing workflows that use Fork/Join constructs with multiple sub-workflows running in parallel, a single task (system or custom) is scheduled multiple times with the same attempt number (attempt 0). The issue is easily reproducible in a local environment.
The root cause lies in Conductor's DeciderService (DeciderService#decide(com.netflix.conductor.model.WorkflowModel)), which is responsible for determining and scheduling the next set of tasks by placing them in a queue for task workers to pick up.
The current implementation schedules the next set of tasks based on their statuses retrieved from the workflow entity in the database, which reflects the most recent execution state. Once a task is selected for scheduling, it's marked as executed to prevent duplicate execution.
This decide method is invoked from multiple places: after the workflow is initiated, upon task completion, and by the sweeper service (which handles rescheduling of timed-out tasks). When two tasks within a fork/join structure complete concurrently on separate threads, both threads invoke decide and can read the same workflow state before either one records the next task as scheduled, so the same task is scheduled multiple times with the same attempt number.
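The interleaving described above can be illustrated with a minimal sketch. This is not Conductor code: `DecideRaceSketch`, `decideUnlocked`, and the task name are hypothetical stand-ins, and the `CyclicBarrier` is an artificial device that forces both threads to read the state before either writes it, so the duplicate shows up on every run instead of intermittently.

```java
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CyclicBarrier;

// Hypothetical model of an unguarded check-then-act in a decide step.
class DecideRaceSketch {
    // Stand-in for the "already scheduled" state read from the workflow entity.
    private volatile boolean nextTaskScheduled = false;
    // Stand-in for the task queue that workers poll.
    private final List<String> queue = new CopyOnWriteArrayList<>();
    // Artificial: holds each thread after its read until both have read.
    private final CyclicBarrier bothHaveRead = new CyclicBarrier(2);

    void decideUnlocked() {
        boolean alreadyScheduled = nextTaskScheduled; // check
        try {
            bothHaveRead.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        if (!alreadyScheduled) {                      // act on a stale read
            nextTaskScheduled = true;
            queue.add("task_after_join#attempt0");
        }
    }

    // Runs two concurrent decide calls and returns how many times the task was queued.
    static int run() {
        DecideRaceSketch sketch = new DecideRaceSketch();
        Thread a = new Thread(sketch::decideUnlocked);
        Thread b = new Thread(sketch::decideUnlocked);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sketch.queue.size();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 2: the same task queued twice at attempt 0
    }
}
```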
The above decide method is called from WorkflowExecutor#decide(java.lang.String), which includes a locking mechanism to prevent such race conditions. We had disabled that lock under the assumption that only sequential workflows would be executed, where this issue doesn't occur.
After enabling the localOnly lock (already implemented in Conductor for single-instance deployments), the issue no longer reproduces locally. In production environments, where multiple instances run, an in-process lock is not sufficient, so we need to rely on the Redis-based lock (also implemented and currently in use for task status updates). The Netflix Conductor community also strongly recommends enabling locking when working with parallel workflows (see link).
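As a rough illustration of why a per-workflow lock removes the duplicate, the sketch below serializes the check-then-act step behind a lock keyed by workflow ID. It is an in-process analogue only: `DecideLockSketch` and its members are hypothetical names, a `ReentrantLock` stands in for the lock, and a multi-instance deployment would need a distributed lock such as the Redis-based one instead.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of a decide step guarded by a per-workflow lock.
class DecideLockSketch {
    // One lock per workflow ID (in-process stand-in for a distributed lock).
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    // Stand-in for persisted "task already scheduled" state.
    private final Set<String> scheduled = ConcurrentHashMap.newKeySet();
    private final AtomicInteger scheduleCount = new AtomicInteger();

    void decide(String workflowId, String nextTaskRef) {
        ReentrantLock lock = locks.computeIfAbsent(workflowId, id -> new ReentrantLock());
        lock.lock();
        try {
            String key = workflowId + ":" + nextTaskRef;
            if (!scheduled.contains(key)) {      // check
                scheduled.add(key);              // act, still under the same lock
                scheduleCount.incrementAndGet(); // "enqueue" the task once
            }
        } finally {
            lock.unlock();
        }
    }

    // Fires `threads` concurrent decide calls for one workflow; returns the schedule count.
    static int runConcurrentDecides(int threads) {
        DecideLockSketch sketch = new DecideLockSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all threads at once
                    sketch.decide("wf-1", "task_after_join");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("decide calls did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sketch.scheduleCount.get();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrentDecides(8)); // prints 1
    }
}
```

Keying the lock by workflow ID keeps unrelated workflows from blocking each other; only concurrent decide calls for the same workflow are serialized.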

Fix:
Enabled the Redis lock for the decide method, as suggested by the community.

Alternatives considered

Describe alternative implementation you have considered

The base branch was changed.

thamodaran-marudhudass and others added 26 commits March 13, 2025 04:41

added changes for central message

dafa740

added retry logic

90f5572

added test coverage

5ea81e2

added custom exception

04b3f9f

added javadoc comments

da2a78a

changed filtering implementation

61e3b8b

code review changes

a1ee98f

retry maxinterval removed

3d649cf

changed to default method in status listener

e2b6d3f

refractoring changes

30a41da

unwanted dependencies removed

a36e8f0

changed unirest to httpclient

07a6fec

review comment changes

5d28fc2

added proper logs and java docs

795bc9a

added reason for task and input for workflow

d93bd65

added success log

fc7d331

updated property

5f29c0a

added null check for accountId

aec880f

added null check for account id

5539feb

filter sub workflow task events

2f2ff5d

added debug logs

cebd921

added debug logs

8c09759

added completed and terminated logs

55d92a1

removed unwanted logs

b4b1906

terminate workflow issue

1897147

redis lock enabled

bdcc836

narasimhanft previously approved these changes Jun 11, 2025

logeshkumar-ramar previously approved these changes Jun 16, 2025

logeshkumar-ramar added 2 commits June 23, 2025 10:08

keeping it as a configurable value

dd381a3

Update application.properties

08ec2c8

ramratanjava previously approved these changes Jun 23, 2025

akashvenkatesan0 added 2 commits June 25, 2025 02:47

redis lock config added

f6a251d

redis lock config added

14565e9

Base automatically changed from FS-239161 to journey_prestaging June 25, 2025 06:32
