doc: job-options memory options #1298

Status: Open · pvbouwel wants to merge 5 commits into master

Conversation

pvbouwel (Contributor) commented Aug 6, 2025:

Clarify the documentation on memory-related job options. If the backend has a default of None, then a limit of 0 bytes is set rather than no enforcement.

Also clarify that it is purely a limit and not a reservation and avoid mentioning UDFs for memory overhead.
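
For context, a hedged sketch of how these job options are typically passed via the openEO Python client (the backend URL, collection id, extents and the concrete values are illustrative only, not recommendations):

```python
import openeo

connection = openeo.connect("openeo.dataspace.copernicus.eu").authenticate_oidc()

cube = connection.load_collection(
    "SENTINEL2_L2A",  # illustrative collection id
    spatial_extent={"west": 5.0, "south": 51.0, "east": 5.1, "north": 51.1},
    temporal_extent=["2025-06-01", "2025-06-30"],
    bands=["B04", "B08"],
)

job = cube.create_job(
    title="memory job-options demo",
    job_options={
        "executor-memory": "2G",          # JVM heap per executor
        "executor-memoryOverhead": "3G",  # upper bound on non-JVM memory
        "python-memory": "2G",            # upper bound (not a reservation) for Python processes
    },
)
job.start_and_wait()
```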

Clarify the documentation on memory-related job options.
If the backend has a default of None, then [a limit of 0 bytes is set](18e512d#diff-f606c26975b555ceec5f9fd97aa22cb9372fdd3d508db2eb8ae248f3b559f2eeL134) rather than no enforcement.
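
A minimal sketch of that behaviour, assuming the limit ends up being applied through Python's `resource` module (the actual code in the linked diff may differ):

```python
import resource

def apply_python_memory_limit(python_memory_bytes):
    # Hypothetical helper mirroring the documented behaviour: a None
    # default does NOT mean "unlimited" -- it is translated into a
    # limit of 0 bytes, so effectively every allocation would fail.
    if python_memory_bytes is None:
        python_memory_bytes = 0
    resource.setrlimit(
        resource.RLIMIT_AS, (python_memory_bytes, python_memory_bytes)
    )
```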

Also clarify that it is purely a limit and not a reservation.
"Typical processes that use python-memory are UDF's, sar_backscatter or Sentinel 3 data loading. "
"Leaving this setting empty will allow Python to use almost all of the executor-memoryOverhead, but may lead to unclear error messages when the memory limit is reached."
"This memory not a reservation so it can act as executor_memory_overhead but it is enforced as a limit."

Member commented:

missing "is" here I think?

Member commented:

I'm also a bit confused by the "so it can act as executor_memory_overhead":
does that bind to "reservation" or "not a reservation"?

Also: what is the (user-relevant) difference between being a reservation and a limit?

pvbouwel (Contributor, Author) commented:

A reservation means something is exclusively set aside and can only be used for that purpose, so it generally acts as a lower bound on memory. A limit constrains a resource, so for memory it acts as an upper bound.

So "so it can act as executor_memory_overhead" binds to "not a reservation" because if you specify Python Memory to be 4GiB any other process can consume it hence it can be used as a "executor_memory_overhead" as well so you should only use "executor_memory_overhead" if you want to have more memory and want to make sure it is not assignable to Python. Otherwise it is better to use Python memory as that can be for Python or anything else.

However, I am struggling to capture this concisely in the documentation. I thought the reservation/limit distinction would help, but it seems to cause confusion by itself.
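
To make the distinction concrete, a hedged numeric illustration (values are made up, and it assumes python-memory is accounted inside executor-memoryOverhead, as the quoted doc text suggests):

```python
# Hypothetical per-executor memory layout, in bytes.
executor_memory          = 2 * 1024**3  # JVM heap, effectively reserved for the JVM
executor_memory_overhead = 3 * 1024**3  # upper bound on all non-JVM memory
python_memory            = 2 * 1024**3  # upper bound for Python within that pool

# python-memory is a limit, not a reservation: Python may use *up to* 2 GiB,
# but if it only needs 0.5 GiB, the remaining 1.5 GiB stays available to
# other non-JVM consumers (GDAL, native libraries, ...).
total_container_memory = executor_memory + executor_memory_overhead
print(f"container request: {total_container_memory / 1024**3:.0f} GiB")  # 5 GiB
```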

Member commented:

Yeah, these option descriptions use terminology like "allocation", "reservation" and "limit", and apparently my Spark memory management knowledge is a bit too rusty (like most users I guess) to understand the nuances.

jdries (Contributor) commented:

Should we also say that we advise working with python-memory rather than executor-memory-overhead?
The total memory is also limited, but that limit is enforced by 'process killers' that simply kill the entire process, leading to hard-to-analyze error messages.
The python-memory limit is enforced internally by Python itself, which has two advantages (see the sketch after this list):

  • garbage collection and caching schemes can take the limit into account, and free memory when the limit is reached
  • error messages clearly state when the error is caused by reaching the limit
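
A hedged, Linux-only sketch of the second advantage, using a made-up 512 MiB cap (not the driver's actual code, just an illustration of a limit enforced by Python itself):

```python
import resource

# Cap the address space of this process at 512 MiB (illustrative value).
soft = hard = 512 * 1024**2
resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

try:
    buf = bytearray(1024**3)  # try to allocate 1 GiB, i.e. beyond the cap
except MemoryError:
    # Python raises a clear MemoryError instead of the whole process
    # being killed by an external OOM killer.
    print("MemoryError: the 512 MiB limit was reached")
```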

pvbouwel (Contributor, Author) commented:

@jdries with "error messages clearly state when the error is caused by reaching the limit" you mean it is clear that it is a memory-related issue, right? Because the error I saw was just `[enforce fail at alloc_cpu.cpp:66] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 75497472 bytes. Error code 12 (Cannot allocate memory)`.

Knowing that it is the pyspark limit still requires the user to realize it is a Python process that tries to perform the allocation.

But I guess you mean that this is much clearer compared to a process just disappearing?

Perhaps it might be better to have a doc section on memory management and refer to that doc (e.g. a link to https://open-eo.github.io/openeo-geopyspark-driver/...)? Because now we need to explain executor_memory, executor_memory_overhead and python_memory separately, but they influence each other.
