
Expand check for libraries provided by the host #2077


Merged

Conversation

murraybd (Contributor)

In addition to not generating runtime dependencies or provides for libcuda.so.1 we also do not want to create them for libnvcuvid.so.1 and libnvidia-encode.so.1.

murraybd force-pushed the expand-check-for-host-provided-so-names branch from 3253280 to 10c1236 on July 10, 2025 at 20:30
murraybd requested a review from dannf on July 10, 2025 at 22:25
dannf (Contributor) left a comment


Ah, the reason we want to filter out these libraries is different from the one for libcuda.so.1. Filtering these additional libraries has nothing to do with the whole host-passthrough business. It is because there are multiple packages that provide the same soname (one in each -cuda-X variant), and every package that depends on libnvcuvid.so.1 should really get one specific variant.

So if we build a binary cuda-app-cuda-12.6 that links against libnvcuvid.so.1, we want it to always get nvidia-libnvcuvid-12.6. Not nvidia-libnvcuvid-11.8, and not nvidia-libnvcuvid-12.9. In theory, we could add an additional explicit dependency on nvidia-libnvcuvid-12.6, and the resolver would be able to figure out that the only way to satisfy both the SCA-added so:libnvcuvid.so.1 and nvidia-libnvcuvid-12.6 is to use nvidia-libnvcuvid-12.6 for both. But apko's resolver doesn't work that way. Instead, apko's resolver will find some package that satisfies so:libnvcuvid.so.1, and once it finds one, that's it - it's resolved. Let's say it fixated on nvidia-libnvcuvid-11.8. Later, it will try to resolve nvidia-libnvcuvid-12.6. Well, that won't work: nvidia-libnvcuvid-12.6 conflicts with nvidia-libnvcuvid-11.8, and apko will then just refuse to continue. Slumber party over, I'm calling mom to pick me up.
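
To make the failure mode concrete, here is a toy model of the greedy behavior described above. It is not apko's actual resolver code, just a minimal, self-contained Go sketch; the package names and conflicts mirror the example in this comment.

```go
package main

import "fmt"

type pkg struct {
	name      string
	provides  []string
	conflicts []string
}

// resolve greedily satisfies each dependency with the first matching
// package and never backtracks, so a later explicit dependency cannot
// revise an earlier so: pick.
func resolve(deps []string, universe []pkg) error {
	installed := map[string]bool{}
	for _, dep := range deps {
		picked := false
		for _, p := range universe {
			if !satisfies(p, dep) {
				continue
			}
			for _, c := range p.conflicts {
				if installed[c] {
					return fmt.Errorf("%s conflicts with already-selected %s", p.name, c)
				}
			}
			installed[p.name] = true
			picked = true
			break // first match wins
		}
		if !picked {
			return fmt.Errorf("nothing satisfies %s", dep)
		}
	}
	return nil
}

func satisfies(p pkg, dep string) bool {
	if p.name == dep {
		return true
	}
	for _, prov := range p.provides {
		if prov == dep {
			return true
		}
	}
	return false
}

func main() {
	universe := []pkg{
		{name: "nvidia-libnvcuvid-11.8", provides: []string{"so:libnvcuvid.so.1"}, conflicts: []string{"nvidia-libnvcuvid-12.6"}},
		{name: "nvidia-libnvcuvid-12.6", provides: []string{"so:libnvcuvid.so.1"}, conflicts: []string{"nvidia-libnvcuvid-11.8"}},
	}
	// The SCA-added so: dep is resolved first and fixates on 11.8; the
	// explicit 12.6 dep then conflicts, and resolution fails.
	fmt.Println(resolve([]string{"so:libnvcuvid.so.1", "nvidia-libnvcuvid-12.6"}, universe))
}
```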

We should probably keep these soname lists logically separate, with separate comments explaining their purposes. Maybe an isHostProvidedLibrary() and an isLibraryWithMultipleVariants()? FWIW, I think there are just two libs in the libcuda.so.1 category, the other being libnvidia-ml.so.1, while there are a ton of libraries in the libnvcuvid bucket.
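
A minimal sketch of what that separation might look like, using the hypothetical helper names suggested above; the soname lists are illustrative, drawn only from the libraries named in this conversation, and not exhaustive.

```go
package sca

// isHostProvidedLibrary reports whether a soname is expected to be
// injected by the host driver stack (via nvidia-container-toolkit), so
// we should generate neither provides nor runtime dependencies for it.
func isHostProvidedLibrary(soname string) bool {
	switch soname {
	case "libcuda.so.1", "libnvidia-ml.so.1":
		return true
	}
	return false
}

// isLibraryWithMultipleVariants reports whether a soname is provided by
// several mutually conflicting -cuda-X variants, so an automatic so:
// provide would let the resolver pick an arbitrary variant.
func isLibraryWithMultipleVariants(soname string) bool {
	switch soname {
	case "libnvcuvid.so.1", "libnvidia-encode.so.1":
		return true
	}
	return false
}
```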

smoser (Contributor) commented on Jul 11, 2025

@dannf thank you for that really well written comment.

dannf (Contributor) left a comment


I love your commit message explaining where this list comes from. One minor, optional suggestion: also note how one can repeat the process you used to generate it, in case it needs updating in the future.

dannf (Contributor) commented on Jul 25, 2025

> @dannf thank you for that really well written comment.

Thanks - but, FTR, it turns out to no longer be correct (if it ever was). I attempted to reproduce that scenario and, at least today, apko does seem to properly handle additional dependencies that disambiguate a generated so: dep.

(I had wondered why some packages had options->no-provides in the past, so I tried removing them, and that broke things in a way that matched the above scenario - but I haven't gone back to retry that experiment to confirm.)

nvidia-container-toolkit will provide all of these libraries to a
container if the NVIDIA_DRIVER_CAPABILITIES=all environment variable is
set. To avoid conflicts with the host, let's not generate provides for
any of them.

The list of libraries was generated by installing
nvidia-container-toolkit 1.17.8-1 on an Ubuntu 24.04 system with an
NVIDIA GPU and then running the Chainguard bash docker container with
`-e NVIDIA_DRIVER_CAPABILITIES=all --gpus all` and checking /usr/lib/
for all libraries with the same version number as the NVIDIA drivers
installed on the host.
murraybd force-pushed the expand-check-for-host-provided-so-names branch from a604dec to 781bd2b on July 25, 2025 at 18:25
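
For anyone repeating the process described in the commit message, a rough sketch of the /usr/lib/ check is below. This is illustrative only, not code from the PR; the driver-version argument and the example value in the comment are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	// Host-injected libraries show up versioned with the driver release,
	// e.g. /usr/lib/libnvcuvid.so.570.133.07 (version here is made up).
	// Pass the host's driver version as the first argument.
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: listlibs <driver-version>")
		os.Exit(1)
	}
	matches, err := filepath.Glob("/usr/lib/*.so." + os.Args[1])
	if err != nil {
		panic(err)
	}
	for _, m := range matches {
		fmt.Println(filepath.Base(m))
	}
}
```
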
murraybd merged commit 89f2ac4 into chainguard-dev:main on Jul 25, 2025 (59 of 60 checks passed)