doc: job-options memory options #1298

Status: Open · pvbouwel wants to merge 5 commits into master

Conversation

pvbouwel (Contributor) commented Aug 6, 2025:

Clarify the documentation on memory-related job options. If the backend has a default of None, then a limit of 0 bytes is set rather than no enforcement.

Also clarify that it is purely a limit and not a reservation and avoid mentioning UDFs for memory overhead.
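
For context, a hedged sketch of how these job options are typically passed via the openEO Python client (the backend URL, collection id, extents and the concrete values are illustrative only, not recommendations):

```python
import openeo

connection = openeo.connect("openeo.dataspace.copernicus.eu").authenticate_oidc()

cube = connection.load_collection(
    "SENTINEL2_L2A",  # illustrative collection id
    spatial_extent={"west": 5.0, "south": 51.0, "east": 5.1, "north": 51.1},
    temporal_extent=["2025-06-01", "2025-06-30"],
    bands=["B04", "B08"],
)

job = cube.create_job(
    title="memory job-options demo",
    job_options={
        "executor-memory": "2G",          # JVM heap per executor
        "executor-memoryOverhead": "3G",  # upper bound on non-JVM memory
        "python-memory": "2G",            # upper bound (not a reservation) for Python processes
    },
)
job.start_and_wait()
```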

Clarify the documentation on memory-related job options.
If the backend has a default of None, then [a limit of 0 bytes is set](18e512d#diff-f606c26975b555ceec5f9fd97aa22cb9372fdd3d508db2eb8ae248f3b559f2eeL134) rather than no enforcement.
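
A minimal sketch of that behaviour, assuming the limit ends up being applied through Python's `resource` module (the actual code in the linked diff may differ):

```python
import resource

def apply_python_memory_limit(python_memory_bytes):
    # Hypothetical helper mirroring the documented behaviour: a None
    # default does NOT mean "unlimited" -- it is translated into a
    # limit of 0 bytes, so effectively every allocation would fail.
    if python_memory_bytes is None:
        python_memory_bytes = 0
    resource.setrlimit(
        resource.RLIMIT_AS, (python_memory_bytes, python_memory_bytes)
    )
```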

Also clarify that it is purely a limit and not a reservation.
"Typical processes that use python-memory are UDF's, sar_backscatter or Sentinel 3 data loading. "
"Leaving this setting empty will allow Python to use almost all of the executor-memoryOverhead, but may lead to unclear error messages when the memory limit is reached."
"This memory not a reservation so it can act as executor_memory_overhead but it is enforced as a limit."

Member commented:

missing "is" here I think?

Member commented:

I'm also a bit confused by the "so it can act as executor_memory_overhead":
does that bind to "reservation" or "not a reservation"?

Also: what is the (user-relevant) difference between being a reservation and a limit?

pvbouwel (Contributor, Author) commented:

A reservation means something is exclusively set aside and can only be used for that purpose, so it generally acts as a lower bound on memory. A limit constrains a resource, so for memory it acts as an upper bound.

So "so it can act as executor_memory_overhead" binds to "not a reservation" because if you specify Python Memory to be 4GiB any other process can consume it hence it can be used as a "executor_memory_overhead" as well so you should only use "executor_memory_overhead" if you want to have more memory and want to make sure it is not assignable to Python. Otherwise it is better to use Python memory as that can be for Python or anything else.

However, I am struggling to capture this concisely in the documentation. I thought the reservation/limit distinction would help, but it seems to cause confusion by itself.
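
To make the distinction concrete, a hedged numeric illustration (values are made up, and it assumes python-memory is accounted inside executor-memoryOverhead, as the quoted doc text suggests):

```python
# Hypothetical per-executor memory layout, in bytes.
executor_memory          = 2 * 1024**3  # JVM heap, effectively reserved for the JVM
executor_memory_overhead = 3 * 1024**3  # upper bound on all non-JVM memory
python_memory            = 2 * 1024**3  # upper bound for Python within that pool

# python-memory is a limit, not a reservation: Python may use *up to* 2 GiB,
# but if it only needs 0.5 GiB, the remaining 1.5 GiB stays available to
# other non-JVM consumers (GDAL, native libraries, ...).
total_container_memory = executor_memory + executor_memory_overhead
print(f"container request: {total_container_memory / 1024**3:.0f} GiB")  # 5 GiB
```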

Member commented:

Yeah, these option descriptions use terminology like "allocation", "reservation" and "limit", and apparently my Spark memory management knowledge is a bit too rusty (like most users I guess) to understand the nuances.

jdries (Contributor) commented:

Should we also say that we advise working with python-memory rather than executor-memory-overhead?
The total memory is also limited, but that limit is enforced by 'process killers' that simply kill the entire process, leading to hard-to-analyze error messages.
The python-memory limit is enforced internally by Python itself, which has two advantages (see the sketch after this list):

  • garbage collection and caching schemes can take the limit into account, and free memory when the limit is reached
  • error messages clearly state when the error is caused by reaching the limit
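
A hedged, Linux-only sketch of the second advantage, using a made-up 512 MiB cap (not the driver's actual code, just an illustration of a limit enforced by Python itself):

```python
import resource

# Cap the address space of this process at 512 MiB (illustrative value).
soft = hard = 512 * 1024**2
resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

try:
    buf = bytearray(1024**3)  # try to allocate 1 GiB, i.e. beyond the cap
except MemoryError:
    # Python raises a clear MemoryError instead of the whole process
    # being killed by an external OOM killer.
    print("MemoryError: the 512 MiB limit was reached")
```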

pvbouwel (Contributor, Author) commented:

@jdries with "error messages clearly state when the error is caused by reaching the limit" you mean it is clear that it is a memory-related issue, right? Because the error I saw was just `[enforce fail at alloc_cpu.cpp:66] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 75497472 bytes. Error code 12 (Cannot allocate memory)`.

Knowing that it is the pyspark limit still requires the user to realize it is a Python process that tries to perform the allocation.

But I guess you mean that this is much clearer compared to a process just disappearing?

Perhaps it might be better to have a doc section on memory management and refer to that doc (e.g. a link to https://open-eo.github.io/openeo-geopyspark-driver/...)? Because now we need to explain executor_memory, executor_memory_overhead and python_memory separately, but they influence each other.
