Description
Is your feature request related to a problem? Please describe.
Environment: Linux (with SELinux) + container (Docker / Podman) + NVIDIA GPU (CUDA)
Output: Failed to initialize NVML: Insufficient Permissions
Describe the solution you'd like
The SELinux policy needs to be changed so containers can access the NVIDIA GPU devices:
chcon -t container_file_t /dev/nvidia*
or permanently:
semanage fcontext -a -t container_file_t '/dev/nvidia*'
This relabels the /dev/nvidia* files (currently system_u:object_r:xserver_misc_device_t:s0) so that containers can access them.
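For reference, a minimal sketch of inspecting and relabeling the device nodes (assumes semanage is available, e.g. from policycoreutils-python-utils):

```sh
# Inspect the current SELinux labels on the NVIDIA device nodes
ls -Z /dev/nvidia*

# One-off relabel
sudo chcon -t container_file_t /dev/nvidia*

# Or add a persistent file-context rule and apply it
sudo semanage fcontext -a -t container_file_t '/dev/nvidia*'
sudo restorecon -v /dev/nvidia*
```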
I found this solution myself and am aware that a PR has already been created and merged (#5252).
Describe alternatives you've considered
Using CDI (Container Device Interface) solves the problem more simply.
As the NVIDIA docs say, Podman is recommended to use CDI (https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#configuring-podman), and using CDI with Docker also solves the problem more simply, with fewer modifications to the system:
- Docker: add --device nvidia.com/gpu=0 to the run command (see the run-command sketch after this list)
- docker-compose: add the following under the service:
    deploy:
      resources:
        reservations:
          devices:
            - driver: cdi
              device_ids:
                - nvidia.com/gpu=0
              capabilities: [gpu]
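For reference, here is a sketch of the equivalent plain run commands (assuming the CDI spec has already been generated as shown below, and, for Docker, that CDI support is enabled in the daemon):

```sh
# Sketch: pass a single GPU to a test container via CDI.
# The generated CDI spec mounts the NVIDIA user-space tools, so nvidia-smi is available.

# Podman:
podman run --rm --device nvidia.com/gpu=0 ubuntu nvidia-smi -L

# Docker (requires CDI to be enabled in the Docker daemon configuration):
docker run --rm --device nvidia.com/gpu=0 ubuntu nvidia-smi -L
```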
The device name argument can be obtained from nvidia-ctk cdi list (nvidia-ctk is provided by nvidia-container-toolkit).
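For example, a short sketch following the NVIDIA Container Toolkit documentation for generating the CDI specification and listing the device names:

```sh
# Generate the CDI specification for the installed NVIDIA driver and GPUs
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# List the device names usable with --device / device_ids,
# e.g. nvidia.com/gpu=0 or nvidia.com/gpu=all
nvidia-ctk cdi list
```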
Additional context
I attach my test cases:
- The original way, using --gpus all, is where we hit that error message (before changing the SELinux policy).
- The CDI runtime way (for when the container image doesn't support it) works.
- The CDI recommended way also works.
LocalAI supports the CDI recommended way (it works for me without the SELinux patch), so I suggest using CDI as the main way to attach NVIDIA GPUs.
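As a hedged end-to-end sketch (the LocalAI image name and tag here are assumptions, substitute whatever image you actually deploy; Docker additionally needs CDI enabled in the daemon):

```sh
# Assumed image name/tag; adjust to your deployment.
podman run -d -p 8080:8080 --device nvidia.com/gpu=all \
  docker.io/localai/localai:latest-gpu-nvidia-cuda-12

docker run -d -p 8080:8080 --device nvidia.com/gpu=all \
  localai/localai:latest-gpu-nvidia-cuda-12
```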