@karuppayya (Contributor)

What changes were proposed in this pull request?

This change (design doc) adds support for using remote storage for shuffle data.

The primary goal is to enhance the elasticity and resilience of Spark workloads, leading to substantial cost optimization opportunities.

This is a PoC to elicit feedback from the community.

Why are the changes needed?

This change decouples storage from compute, thereby helping to minimize shuffle failures and enabling better scaling of the cluster.

Does this PR introduce any user-facing change?

This change adds three configs to enable the feature.

Remote storage location for shuffle data:
spark.shuffle.remote.storage.path=<remote storage path>

Config that determines whether the feature is used:
spark.sql.shuffle.consolidation.enabled=true|false

Shuffle plugin to use when the feature is enabled (this needs to be configured manually for now, but we could switch to it automatically when the feature is enabled; TBD):
spark.shuffle.sort.io.plugin.class=org.apache.spark.shuffle.sort.remote.HybridShuffleDataIO
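Taken together, the three configs above might be set in `spark-defaults.conf` as sketched below. The `s3a://` path is a hypothetical example of a remote storage location, not something prescribed by this PR; only the property names and the plugin class come from the description above:

```
# Hypothetical example: point shuffle storage at a remote path (here, an S3 bucket)
spark.shuffle.remote.storage.path      s3a://my-bucket/spark-shuffle

# Turn the feature on
spark.sql.shuffle.consolidation.enabled  true

# Use the remote-capable shuffle plugin from this PR (manual for now, see TBD above)
spark.shuffle.sort.io.plugin.class     org.apache.spark.shuffle.sort.remote.HybridShuffleDataIO
```

The same settings could equally be passed per job via `--conf` flags to `spark-submit`.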

How was this patch tested?

Manual testing. Unit tests to be added.
Trying to get feedback from the community before writing elaborate tests.

Was this patch authored or co-authored using generative AI tooling?

No
