Skip to content

Conversation

andygrove
Copy link
Member

@andygrove andygrove commented Oct 17, 2025

Which issue does this PR close?

N/A

Rationale for this change

Simplify the on-heap memory configuration, which is intended for test use only now.

The number of configs is reduced from 7 to 4. The remaining configs, which are now documented in a new Development & Testing Settings section in the configuration guide are:

  • COMET_ONHEAP_ENABLED
  • COMET_ONHEAP_MEMORY_POOL_TYPE
  • COMET_ONHEAP_MEMORY_OVERHEAD
  • COMET_ONHEAP_SHUFFLE_MEMORY_FACTOR

What changes are included in this PR?

  • Remove some internal configs:
    • COMET_MEMORY_OVERHEAD_FACTOR
    • COMET_MEMORY_OVERHEAD_MIN_MIB
    • COMET_COLUMNAR_SHUFFLE_MEMORY_SIZE
  • Rename all memory config constants to contain ONHEAP or OFFHEAP

How are these changes tested?

CI

@codecov-commenter
Copy link

codecov-commenter commented Oct 17, 2025

Codecov Report

❌ Patch coverage is 70.83333% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.17%. Comparing base (f09f8af) to head (f668822).
⚠️ Report is 623 commits behind head on main.

Files with missing lines Patch % Lines
...src/main/scala/org/apache/comet/GenerateDocs.scala 0.00% 5 Missing ⚠️
...ain/scala/org/apache/comet/CometExecIterator.scala 66.66% 1 Missing ⚠️
...park/src/main/scala/org/apache/spark/Plugins.scala 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2599      +/-   ##
============================================
+ Coverage     56.12%   59.17%   +3.04%     
- Complexity      976     1444     +468     
============================================
  Files           119      146      +27     
  Lines         11743    13719    +1976     
  Branches       2251     2353     +102     
============================================
+ Hits           6591     8118    +1527     
- Misses         4012     4379     +367     
- Partials       1140     1222      +82     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@andygrove andygrove changed the title chore: Simplify on-heap memory configuration [WIP] chore: Simplify on-heap memory configuration Oct 17, 2025
@andygrove andygrove marked this pull request as ready for review October 17, 2025 18:41
"This config is optional. If this is not specified, it will be set to " +
s"`spark.comet.memory.overhead.factor` * `spark.executor.memory`. $TUNING_GUIDE.")
.internal()
"when running Spark in on-heap mode.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add default value in the doc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generated docs show the default value:

def

| `spark.comet.columnar.shuffle.memory.factor` | Fraction of Comet memory to be allocated per executor process for columnar shuffle when running in on-heap mode. For more information, refer to the [Comet Tuning Guide](https://datafusion.apache.org/comet/user-guide/tuning.html). | 1.0 |
| `spark.comet.exec.onHeap.enabled` | Whether to allow Comet to run in on-heap mode. Required for running Spark SQL tests. | false |
| `spark.comet.exec.onHeap.memoryPool` | The type of memory pool to be used for Comet native execution when running Spark in on-heap mode. Available pool types are `greedy`, `fair_spill`, `greedy_task_shared`, `fair_spill_task_shared`, `greedy_global`, `fair_spill_global`, and `unbounded`. | greedy_task_shared |
| `spark.comet.memoryOverhead` | The amount of additional memory to be allocated per executor process for Comet, in MiB, when running Spark in on-heap mode. | 1073741824b |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the default value to be 1024?

Copy link
Member Author

@andygrove andygrove Oct 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. we could update GenerateDocs to recognize byte configs and show in the unit they were defined as. I'll take a look.

.bytesConf(ByteUnit.MiB)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a commit. The default is now shown as 1024 MiB.

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andygrove couple of small nits

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants