Skip to content

Conversation

@sophiatev
Copy link
Contributor

@sophiatev sophiatev commented Oct 15, 2025

This PR adds a new rewind strategy that will be leveraged by the DTS backend. Rather than manually deleting the failed task/suborchestration rows from the history table (as is done by the current rewind implementation), a new history event ExecutionRewoundEvent is introduced that acts in a similar way to other orchestration management actions (for example, termination, suspend, or resume requests). A user's request to rewind the orchestration will send this event to the orchestration. Upon receiving it, the SDK layer (specifically TaskOrchestrationDispatcher) will generate a new history for the orchestration with a new execution ID and the corresponding failed rows removed. It will create ExecutionRewoundEvents for all the failed suborchestrations as well, thereby recursively rewinding to the deepest failed leaves. It is the backend's responsibility to correctly process this new history and handle dispatching the ExecutionRewoundEvents.

Open questions

  1. In the way that the SDK currently alters the history, we could end up with an OrchestratorStartedEvent immediately followed by an OrchestratorCompletedEvent if all the events in between got deleted. Are we okay with this?
  2. Do we want to update the Timestamp of the new ExecutionStartedEvent in the new history to be the time it is actually created, or leave it as the time the original ExecutionStartedEvent was created?

Tracing

There are two options for how tracing will look for a rewound orchestration. One is to just append the rerun Activities/suborchestrations to the existing orchestration span, and the other is to create a new span. (The second option came about because when I enabled distributed tracing, without changing any code yet, I realized that a new span was being created when the orchestration was rerun. I think this might be because the trace activity for the original span is disposed upon orchestration completion, not quite sure). After discussion with team members, we decided to go with option two. This is an example of what it looks like (note that both spans appear under the original "create_orchestration" request):

image

The first span is the original run of the orchestration with the failed suborchestrations/Activities. The second span is the rewind which contains only the rerun failed suborchestrations/Activties (note, for example, that "RunSucceedSubOrchestrator" does not appear in the second span, since this suborchestration was successful and not rerun). This felt like a better approach because

  1. It is very easy to see what exactly is rerun when the orchestration is rewound
  2. The duration of the second span reflects the time it takes the rewound orchestration to complete
  3. In some sense, rewinding the orchestration is sort of like "re-executing" it, just with a subset of the original suborchestrations/Activities

@sophiatev sophiatev marked this pull request as ready for review October 15, 2025 19:12
@sophiatev sophiatev requested a review from cgillum October 15, 2025 19:12
Copy link
Member

@cgillum cgillum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments. Were you planning on adding some tests for this new behavior?

isRewinding = true;
if (rewindEvent.ParentTraceContext != null)
{
startEvent.ParentTraceContext = rewindEvent.ParentTraceContext;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we change the trace context for the rewound orchestration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this deserves some more explanation. I updated the PR summary with an example.

workItem.OrchestrationRuntimeState.AddEvent(message.Event);
// In this case, the ExecutionRewoundEvent has already been added to the history and is just sent as a way to trigger the failed deepest suborchestrations to rerun.
// We do not redundantly add it to the history in this situation.
if (!(message.Event is ExecutionRewoundEvent executionRewoundEvent && workItem.OrchestrationRuntimeState.OrchestrationStatus == OrchestrationStatus.Running))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the OrchestrationStatus.Running check for? I'm having a hard time understanding the reason for this check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there are two situations (+ another edge case one not worth mentioning here) where an orchestration will receive a rewind request. The first is a customer-triggered rewind. In this case this is a genuinely new history event that we want to add to the orchestration history. The second scenario is if an orchestration has no failed suborchestrations, it will resend itself the rewind request to "jumpstart" re-execution. (Previously, "generic events" were being used for this "jumpstarting" purpose). This second request is strictly to force a re-execution so it shouldn't be added to the history. For the first scenario, the orchestration will be in a terminal state, but for the second, it will be "running", which is why that check is there.

I tried to explain this in the comment but let me know if there's a way to make it more clear.

if (runtimeState.ExecutionStartedEvent.TryGetParentTraceContext(out ActivityContext parentTraceContext))
{
// We set a new client span ID here so that the execution of the rewound suborchestration is not tied to the
// old parent.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goes back to my other question: why do we want to remove the old trace associations? In some ways, I think of "rewind" as another form of suspend/resume, in which case we don't change the trace ID. I'm wondering why we'd treat rewind any differently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, check out the new PR description.

@sophiatev
Copy link
Contributor Author

Were you planning on adding some tests for this new behavior?

These are all in the DTS repo since that will be the only backend for now that leverages this new rewind strategy (I think they're pretty extensive: testing failed orchestrations with a fanout activity pattern to make sure all failed activities are rerun, testing a fanout pattern with failed suborchestrations as well, testing rewinding with a purged failed suborchestration, etc.)

@cgillum
Copy link
Member

cgillum commented Oct 30, 2025

Responding to the open questions:

In the way that the SDK currently alters the history, we could end up with an OrchestratorStartedEvent immediately followed by an OrchestratorCompletedEvent if all the events in between got deleted. Are we okay with this?

This should be fine. I don't think it will materially affect anything.

Do we want to update the Timestamp of the new ExecutionStartedEvent in the new history to be the time it is actually created, or leave it as the time the original ExecutionStartedEvent was created?

I think you'll want to leave it as the time the original ExecutionStartedEvent was created. Otherwise, all the visualization will get messed up since the timestamps will be out of order relative to the rest of the history.

@sophiatev sophiatev merged commit 8576bc5 into main Nov 3, 2025
44 checks passed
@sophiatev sophiatev deleted the stevosyan/adding-new-rewind-strategy branch November 3, 2025 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants