-
Notifications
You must be signed in to change notification settings - Fork 245
feat: Reuse existing file instead of reopening during shuffle write #2577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2577 +/- ##
=============================================
- Coverage 56.12% 42.19% -13.94%
- Complexity 976 1093 +117
=============================================
Files 119 146 +27
Lines 11743 13747 +2004
Branches 2251 2353 +102
=============================================
- Hits 6591 5800 -791
- Misses 4012 6978 +2966
+ Partials 1140 969 -171 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
if let Some(spill_data) = self.partition_writers[i].spill_file.as_ref() { | ||
let mut spill_file = | ||
BufReader::new(File::open(spill_data.temp_file.path()).map_err(to_df_err)?); | ||
let mut spill_file = BufReader::new(&spill_data.file); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spill_data.file
is opened in write-only mode.
datafusion-comet/native/core/src/execution/shuffle/shuffle_writer.rs
Lines 1169 to 1180 in 8f89c1c
let spill_data = OpenOptions::new() | |
.write(true) | |
.create(true) | |
.truncate(true) | |
.open(spill_file.path()) | |
.map_err(|e| { | |
DataFusionError::Execution(format!("Error occurred while spilling {e}")) | |
})?; | |
self.spill_file = Some(SpillFile { | |
temp_file: spill_file, | |
file: spill_data, | |
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch
Also, could you provide more information about the |
updated. |
Which issue does this PR close?
Closes #.
Rationale for this change
Reduce the system call cost to reuse the existing file
What changes are included in this PR?
How are these changes tested?