
docker: make --nvidia actually work on hybrid GPU laptops; clear error when toolkit is missing #5

Merged: mrpollo merged 1 commit into main from fix-nvidia-runtime-check on May 13, 2026


TSC21 (Member) commented May 12, 2026

The --nvidia flag in docker_run.sh was effectively broken on hybrid Intel + NVIDIA laptops:

  1. /dev/dri was only forwarded in the non-nvidia branch. With --runtime nvidia, the integrated GPU is no longer visible, so the Mesa iris driver fails to query DRM and Gazebo's Ogre renderer cannot create a GLX/EGL screen. Result:
    MESA: error: Failed to query drm device.
    libGL error: glx: failed to create dri3 screen
    libGL error: failed to load driver: iris
    libEGL warning: egl: failed to create dri2 screen
    
  2. GLX was not routed to NVIDIA's vendor library, so even when the toolkit is installed, Gazebo would still try iris first.
  3. The toolkit-not-installed error was cryptic (unknown or invalid runtime name: nvidia), confusing workshop attendees who hadn't installed nvidia-container-toolkit.

This PR:

  • Forwards /dev/dri (with the host video/render GIDs) in all GUI modes — the nvidia path now has both NVIDIA and Mesa available.
  • Adds __NV_PRIME_RENDER_OFFLOAD=1 + __GLX_VENDOR_LIBRARY_NAME=nvidia so GLX uses NVIDIA's vendor library on hybrid systems. Verified: glxinfo -B reports OpenGL renderer: NVIDIA GeForce GTX 1070 ....
  • Adds a pre-flight docker info check; if the nvidia runtime is missing, prints install + nvidia-ctk runtime configure instructions and a hint about the non-nvidia fallback.
  • Extends docs/setup.md to mention nvidia-ctk runtime configure --runtime=docker and the Docker restart.
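
For reviewers who want the gist without opening the diff, the first two bullets could be sketched roughly as below. This is an illustration under stated assumptions, not the code in docker_run.sh: `gpu_args` is a hypothetical helper name, and looking up the host video/render GIDs with `getent` is an assumed approach.

```shell
# Hypothetical helper (not from the PR): prints one `docker run` argument per line.
# $1 = "nvidia" to request the NVIDIA runtime; anything else gives the Mesa-only path.
gpu_args() {
    local mode="${1:-}"
    local grp gid

    # Forward the DRM nodes in every GUI mode so Mesa can still query the
    # integrated GPU, even when the nvidia runtime is selected.
    if [ -d /dev/dri ]; then
        printf '%s\n' --device /dev/dri
        # Assumption: the host GIDs for video/render are resolved via getent,
        # so the container user is allowed to open the forwarded nodes.
        for grp in video render; do
            gid="$(getent group "$grp" 2>/dev/null | cut -d: -f3)"
            if [ -n "$gid" ]; then
                printf '%s\n' --group-add "$gid"
            fi
        done
    fi

    if [ "$mode" = "nvidia" ]; then
        # Select the NVIDIA runtime and route GLX to NVIDIA's vendor library.
        printf '%s\n' --runtime nvidia \
            -e __NV_PRIME_RENDER_OFFLOAD=1 \
            -e __GLX_VENDOR_LIBRARY_NAME=nvidia
    fi
}
```

In bash, the output can be collected with `mapfile -t GPU_ARGS < <(gpu_args nvidia)` and passed to `docker run` as `"${GPU_ARGS[@]}"`. Printing one argument per line keeps the helper easy to inspect by hand.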

TSC21 force-pushed the fix-nvidia-runtime-check branch 2 times, most recently from b5917b4 to e4a3b4d (May 12, 2026 21:46)
TSC21 changed the title from "docker: clearer error when --nvidia is used without the NVIDIA Container Toolkit" to "docker: make --nvidia actually work on hybrid GPU laptops; clear error when toolkit is missing" (May 12, 2026)
TSC21 requested a review from mrpollo (May 12, 2026 21:51)

mrpollo commented May 13, 2026

@copilot resolve the merge conflicts in this pull request


Copilot AI commented May 13, 2026

@copilot resolve the merge conflicts in this pull request

Merge conflicts resolved in commit 706cffa. The conflict in docker/docker_run.sh was between origin/main's /dev/dri forwarding (only in the non-nvidia else branch) and this PR's approach of forwarding /dev/dri unconditionally for all GUI modes. Kept the PR's version, which is the correct fix for hybrid GPU laptops.

Three closely related changes to docker_run.sh so the --nvidia path
actually renders Gazebo on the discrete GPU:

1. Always forward /dev/dri (instead of only in the non-nvidia branch).
   The nvidia runtime by itself only ships NVIDIA's GL stack; without
   /dev/dri the Mesa iris driver fails to query DRM and Ogre cannot
   create a GLX/EGL screen, producing
       MESA: error: Failed to query drm device.
       libGL error: glx: failed to create dri3 screen
       libGL error: failed to load driver: iris

2. Pass __NV_PRIME_RENDER_OFFLOAD=1 and __GLX_VENDOR_LIBRARY_NAME=nvidia
   so GLX is routed through NVIDIA's vendor library on hybrid laptops.
   With this set, glxinfo -B reports
       OpenGL vendor:   NVIDIA Corporation
       OpenGL renderer: NVIDIA GeForce GTX 1070 ...

3. Pre-flight check for the 'nvidia' runtime via `docker info`. If
   the NVIDIA Container Toolkit is not installed/registered, the
   bare docker error is the cryptic
       unknown or invalid runtime name: nvidia
   Print a clear message pointing to the install guide and the
   `nvidia-ctk runtime configure --runtime=docker` + restart step,
   and mention the fallback (omit --nvidia for the integrated GPU).

docs/setup.md is updated to mention the runtime registration step
alongside the toolkit install.
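
The pre-flight check from item 3 could look something like the sketch below. The function names, the exact wording of the message, and the way `docker info` output is matched are all assumptions for illustration; the PR text only says that the check goes through `docker info`. The `systemctl` line assumes a systemd host.

```shell
# Hypothetical helper: succeeds if the Docker daemon lists a runtime named "nvidia".
# Assumption: the runtimes are read as JSON via `docker info --format`.
has_nvidia_runtime() {
    docker info --format '{{json .Runtimes}}' 2>/dev/null | grep -q '"nvidia"'
}

# Hypothetical helper: prints actionable guidance instead of Docker's bare
# "unknown or invalid runtime name: nvidia" error.
require_nvidia_runtime() {
    if has_nvidia_runtime; then
        return 0
    fi
    cat >&2 <<'EOF'
Error: --nvidia was requested, but Docker has no 'nvidia' runtime registered.
Install the NVIDIA Container Toolkit, then register the runtime and restart Docker:
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker
Alternatively, omit --nvidia to run on the integrated GPU.
EOF
    return 1
}
```

A launcher script would call `require_nvidia_runtime || exit 1` before `docker run` whenever `--nvidia` is passed, so the failure surfaces before any container is started.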

mrpollo force-pushed the fix-nvidia-runtime-check branch from 706cffa to b6f1fc5 (May 13, 2026 20:44)
mrpollo merged commit bece00f into main (May 13, 2026)
3 checks passed