Improve CUDA detection #156
Conversation
In Docker 19.03+, the NVIDIA Container Toolkit can integrate via an OCI prestart hook instead of needing to be listed as a separate runtime. Signed-off-by: Emily Casey <[email protected]>
Docker may support CUDA GPUs without an explicit nvidia runtime. Signed-off-by: Emily Casey <[email protected]>
Reviewer's Guide
The PR enhances CUDA support detection by first checking for the nvidia-container-runtime binary, then falling back to an existing nvidia runtime via the Docker API, and updates container creation to only apply the nvidia runtime when available.
Sequence diagram for improved CUDA support detection
sequenceDiagram
participant Caller
participant "ProbeGPUSupport()"
participant "exec.LookPath('nvidia-container-runtime')"
participant "HasNVIDIARuntime()"
participant "dockerClient.Info()"
Caller->>"ProbeGPUSupport()": call
"ProbeGPUSupport()"->>"exec.LookPath('nvidia-container-runtime')": check for binary
alt Binary found
"ProbeGPUSupport()"-->>Caller: return GPUSupportCUDA
else Binary not found
"ProbeGPUSupport()"->>"HasNVIDIARuntime()": check for runtime
"HasNVIDIARuntime()"->>"dockerClient.Info()": get info
"dockerClient.Info()"-->>"HasNVIDIARuntime()": info
alt Runtime found
"HasNVIDIARuntime()"-->>"ProbeGPUSupport()": true
"ProbeGPUSupport()"-->>Caller: return GPUSupportCUDA
else Runtime not found
"HasNVIDIARuntime()"-->>"ProbeGPUSupport()": false
"ProbeGPUSupport()"-->>Caller: return GPUSupportNone
end
end
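The flow above can be sketched in a few lines of Go. The following is a minimal, illustrative sketch only: the names ProbeGPUSupport, HasNVIDIARuntime, GPUSupportCUDA, and GPUSupportNone are taken from the diagram, the Docker Go SDK is assumed for the daemon query, and the exact signatures and package layout in the repository may differ.

package gpu

import (
	"context"
	"os/exec"

	"github.com/docker/docker/client"
)

// GPUSupport mirrors the result values used in the sequence diagram.
type GPUSupport int

const (
	GPUSupportNone GPUSupport = iota
	GPUSupportCUDA
)

// ProbeGPUSupport reports CUDA support. On Docker 19.03+ the NVIDIA Container
// Toolkit integrates via an OCI prestart hook, so the presence of the
// nvidia-container-runtime binary on the PATH is treated as CUDA support.
// Otherwise, fall back to asking the Docker daemon whether an "nvidia"
// runtime is configured (legacy nvidia-docker2 setups).
func ProbeGPUSupport(ctx context.Context, dockerClient client.APIClient) GPUSupport {
	if _, err := exec.LookPath("nvidia-container-runtime"); err == nil {
		return GPUSupportCUDA
	}
	if ok, err := HasNVIDIARuntime(ctx, dockerClient); err == nil && ok {
		return GPUSupportCUDA
	}
	return GPUSupportNone
}

// HasNVIDIARuntime reports whether the daemon lists an "nvidia" runtime in
// the info returned by dockerClient.Info.
func HasNVIDIARuntime(ctx context.Context, dockerClient client.APIClient) (bool, error) {
	info, err := dockerClient.Info(ctx)
	if err != nil {
		return false, err
	}
	_, ok := info.Runtimes["nvidia"]
	return ok, nil
}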
File-Level Changes
Summary of Changes
Hello @ekcasey, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the system's ability to detect and utilize CUDA GPU support across various Docker configurations. It introduces a more adaptive detection mechanism that prioritizes modern Docker 19.03+ setups using OCI hooks, while still supporting older environments that rely on the explicit nvidia runtime.
Highlights
Code Review
This pull request improves CUDA detection by first checking for nvidia-container-runtime on the system's PATH, which is the modern approach for Docker 19.03+, and then falling back to checking for a configured nvidia runtime for older setups. The logic for creating containers is also updated to only specify --runtime=nvidia when that runtime is actually available. The changes are well-reasoned and correctly implement the described logic. I have one suggestion to improve error handling by logging a warning if checking for the nvidia runtime fails, rather than silently ignoring the error.
if ok, err := gpupkg.HasNVIDIARuntime(ctx, dockerClient); err == nil && ok {
	hostConfig.Runtime = "nvidia"
}
The current implementation silently ignores any error returned from gpupkg.HasNVIDIARuntime. While the logic correctly avoids setting the nvidia runtime in case of an error, it would be better to log a warning to inform the user that the check failed. This could help in debugging issues where GPU support is expected but not working correctly.
if ok, err := gpupkg.HasNVIDIARuntime(ctx, dockerClient); err != nil {
printer.Printf("Warning: failed to check for nvidia runtime: %v\n", err)
} else if ok {
hostConfig.Runtime = "nvidia"
}
xenoscopic
left a comment
Late to the party, but LGTM!
I will say this CUDA detection can be hard to get 100% right; there probably should be an option to manually specify CUDA (and in future, Vulkan, etc.). And sometimes people want to specifically use a certain backend. There was a group of people I was working with at Ampere, for example, that wanted to do CPU inferencing rather than use the Nvidia GPU.
With Docker 19.03+, the NVIDIA Container Toolkit integrates via an OCI prestart hook instead of a separate nvidia runtime. When checking for CUDA support, assume the existence of the runtime hook (nvidia-container-runtime) on the PATH indicates CUDA support.
Older setups with nvidia-docker2 may still require --runtime=nvidia. Check for this when creating containers and only supply --runtime=nvidia when such a runtime is available. Assume the existence of the runtime also implies CUDA support.
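For readers less familiar with the two mechanisms named above, here is a minimal sketch using the Docker Go SDK. It is illustrative only and not taken from this PR's diff: device requests are the API-level equivalent of docker run --gpus on 19.03+ hook-based setups, while the Runtime field on HostConfig is the equivalent of --runtime=nvidia on legacy nvidia-docker2 setups. The helper name newGPUHostConfig is hypothetical.

package main

import (
	"github.com/docker/docker/api/types/container"
)

// newGPUHostConfig is an illustrative helper (not part of this PR). On legacy
// nvidia-docker2 setups the container selects the nvidia runtime explicitly
// (`--runtime=nvidia`); otherwise GPUs are requested through device requests,
// the API equivalent of `docker run --gpus all`, which triggers the NVIDIA
// Container Toolkit's prestart hook on Docker 19.03+.
func newGPUHostConfig(hasNVIDIARuntime bool) *container.HostConfig {
	hostConfig := &container.HostConfig{}
	if hasNVIDIARuntime {
		hostConfig.Runtime = "nvidia"
		return hostConfig
	}
	hostConfig.DeviceRequests = []container.DeviceRequest{{
		Driver:       "nvidia",
		Count:        -1, // all available GPUs
		Capabilities: [][]string{{"gpu"}},
	}}
	return hostConfig
}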
Summary by Sourcery
Enhance GPU support detection to handle both Docker 19.03+ prestart hook integration and legacy nvidia runtime configurations
Enhancements: