[SHUFFLE][WIP]Prototype using remote storage for shuffle #53028
+948
−46
What changes were proposed in this pull request?
This change (design doc) adds support for storing shuffle data in remote storage.
The primary goal is to enhance the elasticity and resilience of Spark workloads, leading to substantial cost optimization opportunities.
This is a PoC to elicit feedback from the community.
Why are the changes needed?
This change decouples storage from compute, which helps minimize shuffle failures and lets the cluster scale more easily.
Does this PR introduce any user-facing change?
This change adds 3 SQL configs to enable the feature:
Remote storage location for shuffle
spark.shuffle.remote.storage.path=<remote storage path>
Config that determines if the feature needs to be used
spark.sql.shuffle.consolidation.enabled=true|false
Shuffle plugin to use when the feature is enabled (this currently needs to be configured explicitly, but we can switch to it automatically when the feature is enabled; TBD)
spark.shuffle.sort.io.plugin.class=org.apache.spark.shuffle.sort.remote.HybridShuffleDataIO
How was this patch tested?
Manual testing. Unit tests are still to be added.
The goal is to get feedback from the community before writing elaborate tests.
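For anyone who wants to try the PoC by hand, the three configs from the user-facing section could be passed to spark-submit roughly as sketched below. This is only an illustration, not necessarily how the patch was tested: the helper name remote_shuffle_conf_args and the example storage location are hypothetical, while the config keys and the plugin class name are copied from this description. The sketch assumes the plugin class only needs to be set when the feature is enabled.

```python
from typing import List

# Config keys and plugin class name as listed in this PR description.
REMOTE_PATH_KEY = "spark.shuffle.remote.storage.path"
CONSOLIDATION_KEY = "spark.sql.shuffle.consolidation.enabled"
PLUGIN_KEY = "spark.shuffle.sort.io.plugin.class"
PLUGIN_CLASS = "org.apache.spark.shuffle.sort.remote.HybridShuffleDataIO"


def remote_shuffle_conf_args(remote_path: str, enabled: bool = True) -> List[str]:
    """Hypothetical helper: build spark-submit --conf flags for the feature."""
    if not remote_path:
        raise ValueError("remote_path must be a non-empty storage location")
    settings = {
        REMOTE_PATH_KEY: remote_path,
        CONSOLIDATION_KEY: "true" if enabled else "false",
    }
    if enabled:
        # Assumption: the plugin is only needed while the feature is on.
        settings[PLUGIN_KEY] = PLUGIN_CLASS
    args: List[str] = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # "s3a://example-bucket/shuffle" is a made-up location for illustration.
    flags = remote_shuffle_conf_args("s3a://example-bucket/shuffle")
    print(" ".join(["spark-submit"] + flags))
```

If the plugin is later selected automatically when the feature is enabled (marked TBD above), the third flag would no longer be needed.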
Was this patch authored or co-authored using generative AI tooling?
No