Description
The Bug
When running docker compose up on newer Apple Silicon hardware (tested on an M4 MacBook Air), the search-1 container (OpenSearch) fails to start. It enters a crash loop, repeatedly showing a fatal error from the Java Runtime Environment.
This blocks local development for anyone on the affected hardware, preventing contributions. I encountered this while working on a separate issue related to adding run_numbers to CMS RAW data records.
Observed Behaviour
The search-1 container log is flooded with the following fatal error, indicating an "Illegal Instruction" signal (SIGILL):
search-1 | # A fatal error has been detected by the Java Runtime Environment:
search-1 | #
search-1 | # SIGILL (0x4) at pc=0x0000f10bbfd40c5c, pid=15, tid=16
search-1 | #
search-1 | # JRE version: (21.0.6+7) (build )
search-1 | # Java VM: OpenJDK 64-Bit Server VM (21.0.6+7-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
search-1 | # Problematic frame:
search-1 | # j java.lang.System.registerNatives()V+0 java.base@21.0.6
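To confirm that a crash loop is this particular failure and not some other startup problem, the container log can be searched for the SIGILL line shown above. The following is a minimal sketch; the function name `has_sigill` is hypothetical and not part of the repository.

```shell
# Hypothetical helper: succeeds if the log text on stdin contains
# the JVM's "SIGILL (0x4)" fatal-error line.
has_sigill() {
  # -F treats the pattern as a fixed string, -q suppresses output.
  grep -qF 'SIGILL (0x4)'
}
```

It could be used as `docker compose logs search | has_sigill && echo "search is crashing with SIGILL"`, assuming the service is named search as in the compose file.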
Steps to Reproduce
- Use a Mac with new Apple Silicon hardware (M4).
- Clone the cernopendata-portal repository.
- From the project root, run docker compose up.
- Observe that the db-1 and cache-1 containers run, but search-1 enters a restart loop, and the process eventually fails with dependency failed to start: container opendatacernch-search-1 is unhealthy.
The Fix That Worked for Me
The root cause appears to be the JVM attempting to use ARM's Scalable Vector Extension (SVE) instructions, which are either not supported or incorrectly implemented in the JVM version included with the default OpenSearch image.
The issue can be resolved with a two-part fix:
- Disable SVE in the JVM: This is done by passing a specific flag to the JVM via an environment variable.
- Upgrade the OpenSearch Image: The default opensearch:2 image did not seem to respect the environment variable. It was necessary to switch to a more specific, newer version for the flag to take effect.
The following changes to the search service in docker-compose.yml make the portal run successfully on an M4 Mac:
Current docker-compose.yml (for the search service):
search:
  restart: "always"
  image: docker.io/opensearchproject/opensearch:2
  environment:
    - bootstrap.memory_lock=true
    - discovery.type=single-node
    - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
    # ... other vars
Working docker-compose.yml (for the search service):
search:
  restart: "always"
  # 1. Image version was updated to one that respects _JAVA_OPTIONS.
  #    The user reported using a version like "2.11.9". We should
  #    pin this to a specific, tested version.
  image: docker.io/opensearchproject/opensearch:2.11.1 # Or a newer tested version
  environment:
    # 2. This environment variable is added to fix the JVM crash.
    - _JAVA_OPTIONS=-XX:UseSVE=0
    - bootstrap.memory_lock=true
    - discovery.type=single-node
    - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
    # ... other vars
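Since the default image did not seem to respect the variable, it may help to check whether the JVM actually received the flag. HotSpot normally prints a "Picked up _JAVA_OPTIONS: ..." line at startup when that variable is set, so its presence in the container log is a reasonable signal. A minimal sketch, with the hypothetical function name `sve_flag_picked_up`:

```shell
# Hypothetical helper: succeeds if the log text on stdin shows that the JVM
# picked up -XX:UseSVE=0 from the _JAVA_OPTIONS environment variable.
sve_flag_picked_up() {
  # ".*" allows other options to appear before the SVE flag on the same line.
  grep -q 'Picked up _JAVA_OPTIONS:.*-XX:UseSVE=0'
}
```

For example: `docker compose logs search | sve_flag_picked_up || echo "flag not seen in the log"`. A missing line is only a hint, since the log may have been truncated or the JVM may have crashed before printing it.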
Proposed Solution
To ensure the portal is usable out-of-the-box for all contributors on modern hardware, I suggest we update the docker-compose.yml in the main branch.
- Pin the search service's image to a specific, recent version of OpenSearch that is confirmed to work with the _JAVA_OPTIONS flag (e.g., 2.11.1 or newer).
- Add the _JAVA_OPTIONS=-XX:UseSVE=0 environment variable to the search service.
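The two proposed changes could be guarded by a small check, for instance in a pre-commit hook or CI step. This is only a sketch: the function name `check_search_fix` is hypothetical, and it matches text rather than parsing YAML, so a commented-out line would also satisfy it.

```shell
# Hypothetical check: verify that a compose file carries both parts of the fix.
# Usage: check_search_fix path/to/docker-compose.yml
check_search_fix() {
  compose_file=$1

  # Part 1: the SVE-disabling JVM flag must be present.
  grep -qF -- '_JAVA_OPTIONS=-XX:UseSVE=0' "$compose_file" || {
    echo "missing _JAVA_OPTIONS=-XX:UseSVE=0" >&2
    return 1
  }

  # Part 2: the OpenSearch image must be pinned to a full x.y.z version,
  # not a floating tag such as ":2".
  grep -qE 'opensearchproject/opensearch:[0-9]+\.[0-9]+\.[0-9]+' "$compose_file" || {
    echo "OpenSearch image is not pinned to a full version" >&2
    return 1
  }
}
```

Running `check_search_fix docker-compose.yml` from the project root returns a non-zero status and prints the reason if either part is missing.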
This will resolve a key development blocker and preserve this knowledge for future contributors.
cc @tiborsimko