[ci] aarch64 python-package job failing: "cannot allocate memory in static TLS block" #6509

@jameslamb

Description

For the last few days, the aarch64 CI job (which we run on an x86_64 box, using QEMU for emulation) has been failing during test collection with errors like the following:

___________ ERROR collecting tests/python_package_test/test_basic.py ___________
ImportError while importing test module '/LightGBM/tests/python_package_test/test_basic.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/root/miniforge/envs/test-env/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_basic.py:12: in <module>
    from sklearn.datasets import dump_svmlight_file, load_svmlight_file, make_blobs
/root/miniforge/envs/test-env/lib/python3.12/site-packages/sklearn/__init__.py:97: in <module>
    from .utils._show_versions import show_versions
/root/miniforge/envs/test-env/lib/python3.12/site-packages/sklearn/utils/_show_versions.py:15: in <module>
    from ._openmp_helpers import _openmp_parallelism_enabled
E   ImportError: /root/miniforge/envs/test-env/lib/python3.12/site-packages/sklearn/utils/../../../../libgomp.so.1: cannot allocate memory in static TLS block

Reproducible example

This is happening across several unrelated PRs whose changesets are very unlikely to be the cause, which suggests some other change in the environment. For example:

Environment info

N/A

Additional Comments

"TLS" in this error refers to "thread-local storage".

There is a lot of prior discussion on similar issues:

All of those are about using libgomp on aarch64.

From https://bugzilla.redhat.com/show_bug.cgi?id=1722181:

The GNU TLS2 model which I'm afraid aarch64 uses unfortunately eats from the same TLS preallocated pool as libraries that require static TLS like libgomp, where it is performance critical to have it as static TLS.

On opencv/opencv#14884, there's some discussion about this specifically being caused by bundled libgomp in multiple Python packages, and there are suggestions that importing those libraries earlier (and therefore loading their libgomp earlier) can resolve this.
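
To make the "load libgomp earlier" idea concrete, here is a minimal sketch, not something taken from the linked threads. It assumes a Linux environment with `/proc/self/maps` available. The helper names `loaded_libgomp_paths` and `preload_libgomp` are hypothetical, and whether preloading actually avoids the error in this CI job is untested.

```python
import ctypes
import ctypes.util
import os


def loaded_libgomp_paths(maps_text=None):
    """Return the sorted, de-duplicated paths of libgomp copies mapped into a process.

    Reads /proc/self/maps (Linux only) unless `maps_text` is given.
    More than one path suggests several bundled copies are loaded.
    """
    if maps_text is None:
        try:
            with open("/proc/self/maps") as f:
                maps_text = f.read()
        except OSError:
            # Not on Linux, or /proc is unavailable.
            return []
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6 and "libgomp" in os.path.basename(fields[5]):
            paths.add(fields[5].strip())
    return sorted(paths)


def preload_libgomp(path=None):
    """Try to load libgomp before anything else pulls it in.

    `path` is an assumption: pass the copy you want loaded first.
    Falls back to whatever ctypes.util.find_library locates.
    Returns the name or path that was loaded, or None on failure.
    """
    candidate = path or ctypes.util.find_library("gomp")
    if candidate is None:
        return None
    try:
        ctypes.CDLL(candidate, mode=ctypes.RTLD_GLOBAL)
    except OSError:
        return None
    return candidate
```

The intended use would be to call `preload_libgomp()` at the very top of something like `conftest.py`, before `sklearn` or `lightgbm` are imported, and then print `loaded_libgomp_paths()` to see which copies ended up mapped. Setting `LD_PRELOAD` to a libgomp path before starting Python is another commonly suggested way to get the same ordering.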

These also have some helpful information:
