chore: Simplify on-heap memory configuration #2599
base: main
Codecov Report
Additional details and impacted files

Coverage diff, main vs. #2599:

| Metric | main | #2599 | +/- |
|---|---|---|---|
| Coverage | 56.12% | 59.17% | +3.04% |
| Complexity | 976 | 1444 | +468 |
| Files | 119 | 146 | +27 |
| Lines | 11743 | 13719 | +1976 |
| Branches | 2251 | 2353 | +102 |
| Hits | 6591 | 8118 | +1527 |
| Misses | 4012 | 4379 | +367 |
| Partials | 1140 | 1222 | +82 |
"This config is optional. If this is not specified, it will be set to " +
s"`spark.comet.memory.overhead.factor` * `spark.executor.memory`. $TUNING_GUIDE.")
.internal()
"when running Spark in on-heap mode.")
Should we add the default value to the doc?
| `spark.comet.columnar.shuffle.memory.factor` | Fraction of Comet memory to be allocated per executor process for columnar shuffle when running in on-heap mode. For more information, refer to the [Comet Tuning Guide](https://datafusion.apache.org/comet/user-guide/tuning.html). | 1.0 |
| `spark.comet.exec.onHeap.enabled` | Whether to allow Comet to run in on-heap mode. Required for running Spark SQL tests. | false |
| `spark.comet.exec.onHeap.memoryPool` | The type of memory pool to be used for Comet native execution when running Spark in on-heap mode. Available pool types are `greedy`, `fair_spill`, `greedy_task_shared`, `fair_spill_task_shared`, `greedy_global`, `fair_spill_global`, and `unbounded`. | greedy_task_shared |
| `spark.comet.memoryOverhead` | The amount of additional memory to be allocated per executor process for Comet, in MiB, when running Spark in on-heap mode. | 1073741824b |
Should the default value be 1024?
Hmm, we could update GenerateDocs to recognize byte configs and show them in the unit they were defined in. I'll take a look.
.bytesConf(ByteUnit.MiB)
I pushed a commit. The default is now shown as `1024 MiB`.
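The doc-generation idea discussed in this thread (showing a byte-valued default in the unit the config was declared with, so `1073741824b` reads as `1024 MiB`) could look roughly like the sketch below. It is an illustration only, not the code that was pushed: `ByteDefaultFormat`, `SizeUnit`, and `render` are hypothetical names, and `SizeUnit` is a stand-in for whatever unit type `bytesConf` actually records.

```scala
// Hypothetical sketch; not the actual GenerateDocs implementation.
object ByteDefaultFormat {
  // Stand-in for the unit a byte config is declared with (assumption).
  sealed abstract class SizeUnit(val label: String, val bytes: Long)
  case object KiB extends SizeUnit("KiB", 1L << 10)
  case object MiB extends SizeUnit("MiB", 1L << 20)
  case object GiB extends SizeUnit("GiB", 1L << 30)

  /** Render a default stored in bytes using the unit the config was declared with. */
  def render(defaultBytes: Long, declared: SizeUnit): String =
    if (defaultBytes % declared.bytes == 0) {
      s"${defaultBytes / declared.bytes} ${declared.label}"
    } else {
      // Not a whole multiple of the declared unit: fall back to raw bytes.
      s"${defaultBytes}b"
    }

  def main(args: Array[String]): Unit = {
    println(render(1073741824L, MiB)) // prints "1024 MiB"
  }
}
```

The fallback branch keeps the old `...b` rendering for values that do not divide evenly, so no precision is lost in the generated docs.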
Thanks @andygrove, a couple of small nits.
Which issue does this PR close?
N/A
Rationale for this change
Simplify the on-heap memory configuration, which is intended for test use only now.
The number of configs is reduced from 7 to 4. The remaining configs, which are now documented in a new "Development & Testing Settings" section in the configuration guide, are:
- `COMET_ONHEAP_ENABLED`
- `COMET_ONHEAP_MEMORY_POOL_TYPE`
- `COMET_ONHEAP_MEMORY_OVERHEAD`
- `COMET_ONHEAP_SHUFFLE_MEMORY_FACTOR`
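As a rough illustration of how a test setup might use the four remaining settings, the sketch below collects them as key/value pairs and renders them as `spark-submit` `--conf` flags. The keys are the ones shown in the documentation diff in this PR; pairing them with the four constants above is an assumption, the values are illustrative, and `OnHeapTestConf` and its members are hypothetical names. It also assumes that a bare `1024` for `spark.comet.memoryOverhead` is read as MiB and that shuffle memory is simply the factor multiplied by the overhead.

```scala
// Hypothetical sketch; keys come from the docs diff, values are examples only.
object OnHeapTestConf {
  val settings: Seq[(String, String)] = Seq(
    "spark.comet.exec.onHeap.enabled" -> "true",
    "spark.comet.exec.onHeap.memoryPool" -> "greedy_task_shared",
    "spark.comet.memoryOverhead" -> "1024", // assumed to be interpreted as MiB
    "spark.comet.columnar.shuffle.memory.factor" -> "1.0"
  )

  /** Render the settings as spark-submit style `--conf key=value` flags. */
  def asSubmitFlags: String =
    settings.map { case (k, v) => s"--conf $k=$v" }.mkString(" ")

  /** Shuffle memory in MiB, assuming it is factor * overhead (an assumption). */
  def shuffleMemoryMiB(overheadMiB: Long, factor: Double): Long =
    (overheadMiB * factor).toLong

  def main(args: Array[String]): Unit = {
    println(asSubmitFlags)
    println(shuffleMemoryMiB(1024L, 1.0))
  }
}
```

With the documented defaults (overhead of 1024 MiB, factor of 1.0), this reading would give columnar shuffle the full Comet overhead allocation.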
What changes are included in this PR?
- `COMET_MEMORY_OVERHEAD_FACTOR`
- `COMET_MEMORY_OVERHEAD_MIN_MIB`
- `COMET_COLUMNAR_SHUFFLE_MEMORY_SIZE`

`ONHEAP` or `OFFHEAP`
How are these changes tested?
CI