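The driver being used here is pretty old by now (`open-gpu-server/vm-images/elements/cuda/post-install.d/05-cuda-install`, line 34 in `b83117b`):

```
DEBIAN_FRONTEND=noninteractive sudo apt install nvidia-driver-525 -y
```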
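Despite the expanded CUDA minor version compatibility, this causes problems: CUDA 12.8/12.9 produce PTX that is incompatible with the current driver. Minor version compatibility only covers precompiled SASS; JIT-compiling PTX from a newer toolkit still requires a driver that understands that PTX version. Concretely, in conda-forge/tinygrad-feedstock#12 I'm getting

```
RuntimeError: module load failed with status code 222: CUDA_ERROR_UNSUPPORTED_PTX_VERSION
```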
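To check whether a given machine is affected, one can compare the installed driver branch against the minimum branch a toolkit needs for PTX JIT. Below is a minimal sketch; the function names are hypothetical, and the minimum branches (570 for CUDA 12.8, 575 for CUDA 12.9) are my reading of NVIDIA's CUDA release notes, so please double-check them.

```shell
#!/usr/bin/env bash
# Sketch: is an NVIDIA driver branch new enough to JIT-compile PTX from a
# given CUDA toolkit? Function names are hypothetical.
# ASSUMPTION: minimum branches taken from NVIDIA's CUDA release notes
# (12.8 -> r570, 12.9 -> r575); verify before relying on them.

# Print the (assumed) minimum driver branch for a CUDA toolkit version.
min_driver_branch() {
  case "$1" in
    12.8) echo 570 ;;
    12.9) echo 575 ;;
    *) echo "unknown CUDA version: $1" >&2; return 1 ;;
  esac
}

# Succeed if driver version $1 (e.g. "525.147.05") meets the minimum for CUDA $2.
driver_supports_ptx_of() {
  local driver="$1" cuda="$2" min branch
  min="$(min_driver_branch "$cuda")" || return 1
  branch="${driver%%.*}"   # major branch, e.g. "525"
  [ "$branch" -ge "$min" ]
}
```

On a GPU host, the installed version could be fed in via `nvidia-smi --query-gpu=driver_version --format=csv,noheader`; with the current 525 driver (or Debian's 550), the check fails for both 12.8 and 12.9.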
Furthermore, AFAICT, the build here actually installs the Debian-style NVIDIA driver packages (named `nvidia-driver-XXX`), even though we're in an Ubuntu image (`open-gpu-server/Makefile`, line 7, and `open-gpu-server/vm-images/build-image.sh`, line 11, both in `b83117b`), which uses different naming for its native packaging, e.g. `nvidia-graphics-driver-XXX[-server]`:

```
IMAGE_NAME := ubuntu-2404-$(IMAGE_TYPE)-$(TIMESTAMP)
```

```
export DIB_CLOUD_IMAGES=https://cloud-images.ubuntu.com/noble/20251026/
```
Could we switch the image to the Ubuntu-native driver packaging instead? Ubuntu already ships new enough drivers, whereas Debian is currently stuck on 550, which apparently does not yet support the PTX emitted by CUDA 12.9.
AFAICT, this would work as follows:

```
sudo apt update
sudo apt install -y ubuntu-drivers-common
sudo ubuntu-drivers install nvidia:580
```
CC @aktech @jaimergp