Build an unofficial TensorFlow v2.21.0 GPU wheel for the NVIDIA GeForce RTX 5090,
RTX 50 Series, and other Blackwell workstation/consumer cards that use CUDA
compute capability 12.0. The default output is a Python 3.12, Linux x86_64
TensorFlow wheel built in Docker with CUDA 12.8, cuDNN 9, native sm_120
cubins, and compute_120 PTX fallback code.
This repository is for developers who need a TensorFlow RTX 5090 wheel before
or outside the official TensorFlow release matrix, especially for Blackwell
CUDA workloads where stock wheels may not include sm_120 support.
The final .whl is exported to dist/ and repaired with relative CUDA
RUNPATHs so downstream projects can use TensorFlow with the nvidia-* CUDA pip
packages without manually setting LD_LIBRARY_PATH.
Default TensorFlow GPU wheel configuration:
| Setting | Default |
|---|---|
| TensorFlow version | v2.21.0 |
| TensorFlow commit pin | a481b10260dfdf833a1b16007eead49c1d7febf3 |
| Python ABI | Ubuntu CPython 3.12 / cp312 |
| Platform | Linux x86_64 |
| CUDA build image | nvidia/cuda:12.8.1-cudnn-devel-ubuntu24.04 |
| Hermetic CUDA/cuDNN | 12.8.1 / 9.8.0 |
| CUDA architectures | sm_120,compute_120 |
| Bazel CUDA config | cuda_nvcc |
| Wheel suffix | +selfbuild |
| Wheel output | dist/ |
sm_120 gives native RTX 5090 and RTX 50 Series Blackwell cubins. compute_120
embeds PTX so the NVIDIA driver has a JIT fallback path for compatible 12.x
devices.
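To make the split between native cubins and embedded PTX concrete, the sketch below separates a comma-separated target string into its `sm_*` and `compute_*` entries. The helper name `split_cuda_targets` is hypothetical and is not part of main.py; it only illustrates how the two kinds of target in the table above are read.

```python
def split_cuda_targets(compute_capabilities: str) -> dict[str, list[str]]:
    """Split a target string into native cubin (sm_*) and PTX (compute_*) entries.

    Hypothetical helper for illustration; not taken from this repository.
    """
    cubins: list[str] = []
    ptx: list[str] = []
    for entry in compute_capabilities.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas or whitespace
        if entry.startswith("sm_"):
            cubins.append(entry)
        elif entry.startswith("compute_"):
            ptx.append(entry)
        else:
            raise ValueError(f"unrecognized CUDA target: {entry!r}")
    return {"cubin": cubins, "ptx": ptx}


print(split_cuda_targets("sm_120,compute_120"))
```

A target string with no `compute_*` entry would produce an empty `ptx` list, which means the wheel carries no JIT fallback.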
This builder defaults to NVIDIA Blackwell workstation/consumer GPUs with CUDA compute capability 12.0. NVIDIA's CUDA GPU Compute Capability table lists the following 12.0 cards, which are the intended default target for this wheel:
- GeForce RTX 50 Series: RTX 5090, RTX 5080, RTX 5070 Ti, RTX 5070, RTX 5060 Ti, RTX 5060, and RTX 5050
- NVIDIA RTX PRO Blackwell: RTX PRO 6000 Blackwell Server Edition, RTX PRO 6000 Blackwell Workstation Edition, RTX PRO 6000 Blackwell Max-Q Workstation Edition, RTX PRO 5000 Blackwell, RTX PRO 4500 Blackwell, RTX PRO 4000 Blackwell, RTX PRO 4000 Blackwell SFF Edition, and RTX PRO 2000 Blackwell
For those cards, sm_120 is the native cubin target and compute_120 is the
embedded PTX target.
The default wheel is not a universal NVIDIA GPU wheel. It does not include native cubins for older or different compute capabilities such as:
| GPU generation or card family | Common CUDA target | Default support |
|---|---|---|
| NVIDIA GB200/B200 Blackwell data center | sm_100 | Rebuild with an explicit sm_100/compute_100 target |
| NVIDIA GB300/B300 Blackwell data center | sm_103 | Rebuild with an explicit sm_103/compute_103 target |
| NVIDIA GH200, H200, H100 Hopper | sm_90 | Rebuild with sm_90/compute_90 |
| RTX 6000 Ada, RTX 4090, RTX 4080, RTX 4070, RTX 4060 | sm_89 | Rebuild with sm_89/compute_89 |
| RTX A6000/A5000/A4000 and GeForce RTX 30 Series Ampere | sm_86 | Rebuild with sm_86/compute_86 |
| NVIDIA A100/A30 Ampere data center | sm_80 | Rebuild with sm_80/compute_80 |
When adding 10.x, 12.x, or future CUDA targets, use a CUDA image and nvcc
version that can compile those architectures.
To build one TensorFlow wheel for several GPU generations, change the
compute_capabilities setting through the interactive menu and regenerate the
Dockerfile. For example:
sm_89,compute_89,sm_120,compute_120
More architecture targets increase TensorFlow build time and wheel size. For
the RTX 5090 specifically, keep sm_120,compute_120 so the wheel contains
native Blackwell kernels plus a PTX fallback.
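As a rough illustration of how a multi-architecture value is put together, the sketch below turns a list of compute capabilities into the comma-separated form shown above. The function name `targets_for` is hypothetical; the real setting is changed through the interactive menu, and this only shows the naming pattern.

```python
def targets_for(capabilities: list[str]) -> str:
    """Build an 'sm_XY,compute_XY,...' string from capabilities like '8.9' or '12.0'.

    Hypothetical helper: emits one native cubin target and one PTX target
    per capability, in the order given.
    """
    parts: list[str] = []
    for capability in capabilities:
        major, minor = capability.split(".")
        digits = f"{int(major)}{int(minor)}"  # '12.0' -> '120', '8.9' -> '89'
        parts.append(f"sm_{digits}")
        parts.append(f"compute_{digits}")
    return ",".join(parts)


print(targets_for(["8.9", "12.0"]))
```

Every extra capability adds two entries, which is where the added build time and wheel size come from.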
The default artifact is a CPython cp312 wheel for Linux x86_64. This
repository does not build Windows, macOS, Linux aarch64, or Jetson wheels by
default. Non-default Python versions are possible, but each wheel should be
installed and smoke-tested in a matching Python environment.
- Linux x86_64 host
- Docker with Buildx and BuildKit
- Python 3 to run main.py
- Recent NVIDIA driver for RTX 5090 or RTX 50 Series runtime testing
- Enough disk space, memory, and time for a TensorFlow source build
TensorFlow source is cloned and compiled inside Docker. The host Python environment is only used to run the build driver.
Review the resolved defaults:
python3 main.py --show-config
Generate the Dockerfile and TensorFlow Bazel config:
python3 main.py --generate
Build the TensorFlow v2.21.0 CUDA wheel:
python3 main.py --build
Successful default builds write the wheel to:
dist/tensorflow-2.21.0+selfbuild-cp312-cp312-linux_x86_64.whl
Build logs are written to logs/.
You can also run the interactive menu:
python3 main.py
The default build uses Ubuntu's packaged Python 3.12. For other Python ABIs, the interactive menu can switch the Python distribution to one of:
- ubuntu: Ubuntu package CPython, recommended for the default Python 3.12 build
- deadsnakes: Deadsnakes PPA CPython, useful for quick experiments
- source: CPython built from python.org sources inside Docker
The source-built path currently supports Python 3.10, 3.11, and 3.13.
It avoids relying on Launchpad during unattended build queues, at the cost of
building CPython before TensorFlow. Wheel filenames and virtualenv commands
will use the matching Python ABI tag, such as cp311 or cp313.
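To see which filename to expect for a non-default Python, the sketch below derives the ABI tag from a Python version and substitutes it into the default wheel name. It assumes non-default builds keep the same naming pattern as the default artifact; `expected_wheel_name` is a hypothetical helper, not a function from main.py.

```python
def expected_wheel_name(
    python_version: str,
    tf_version: str = "2.21.0",
    local_suffix: str = "selfbuild",
) -> str:
    """Return the wheel filename expected for a CPython version like '3.12'.

    Assumption: only the cpXY tag changes between Python variants.
    """
    major, minor = python_version.split(".")[:2]
    abi = f"cp{major}{minor}"  # '3.11' -> 'cp311'
    return f"tensorflow-{tf_version}+{local_suffix}-{abi}-{abi}-linux_x86_64.whl"


print(expected_wheel_name("3.12"))
print(expected_wheel_name("3.11"))
```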
Install with TensorFlow's CUDA extras so the required NVIDIA CUDA libraries are pulled in as pip packages. For the default Python 3.12 wheel:
python3.12 -m venv /tmp/tf-5090
. /tmp/tf-5090/bin/activate
python -m pip install --upgrade pip
python -m pip install 'dist/tensorflow-2.21.0+selfbuild-cp312-cp312-linux_x86_64.whl[and-cuda]'
If reinstalling into an environment that already has the same wheel version, force pip to replace the installed files:
python -m pip install --force-reinstall \
  'dist/tensorflow-2.21.0+selfbuild-cp312-cp312-linux_x86_64.whl[and-cuda]'
Run a small TensorFlow import and GPU check:
python - <<'PY'
import json
import tensorflow as tf
print(tf.__version__)
print(json.dumps(tf.sysconfig.get_build_info(), indent=2, sort_keys=True))
print(tf.config.list_physical_devices("GPU"))
PY
Expected highlights:
- Version: 2.21.0+selfbuild
- cuda_compute_capabilities: ["sm_120", "compute_120"]
- One or more RTX 5090, RTX 50 Series, or compatible Blackwell GPUs listed
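For unattended runs, the expected highlights can be turned into assertions instead of being read by eye. The sketch below is one possible way to do that for the default build; `smoke_check_problems` and `check_installed_tensorflow` are hypothetical names, and the `cuda_compute_capabilities` key is taken from the build-info output described above.

```python
EXPECTED_VERSION = "2.21.0+selfbuild"
EXPECTED_TARGETS = {"sm_120", "compute_120"}


def smoke_check_problems(version: str, build_info: dict, gpus: list) -> list[str]:
    """Return human-readable problems; an empty list means the check passed."""
    problems: list[str] = []
    if version != EXPECTED_VERSION:
        problems.append(f"unexpected version: {version}")
    targets = set(build_info.get("cuda_compute_capabilities", []))
    missing = EXPECTED_TARGETS - targets
    if missing:
        problems.append(f"missing CUDA targets: {sorted(missing)}")
    if not gpus:
        problems.append("no GPU devices listed")
    return problems


def check_installed_tensorflow() -> list[str]:
    """Run the check against the TensorFlow installed in the current environment."""
    import tensorflow as tf  # imported lazily so the helper above stays testable

    return smoke_check_problems(
        tf.__version__,
        tf.sysconfig.get_build_info(),
        tf.config.list_physical_devices("GPU"),
    )
```

Inside the activated virtualenv, calling `check_installed_tensorflow()` should return an empty list for a healthy default build on a machine with a supported GPU.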
Leave LD_LIBRARY_PATH unset for the smoke test. The wheel repair step adds
relative ELF RUNPATHs into TensorFlow shared objects so TensorFlow can find CUDA
libraries installed by the nvidia-* pip packages under
site-packages/nvidia/*/lib.
Set LD_LIBRARY_PATH manually only as a diagnostic when testing an
unrepaired or custom wheel.
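When a GPU is not detected, one quick diagnostic is to confirm that the nvidia-* pip packages actually placed their libraries where the relative RUNPATHs point. The sketch below lists the `nvidia/*/lib` directories under a site-packages path; `nvidia_lib_dirs` is a hypothetical helper, and using the interpreter's `purelib` path as site-packages is an assumption that holds for a typical virtualenv.

```python
import sysconfig
from pathlib import Path


def nvidia_lib_dirs(site_packages: Path) -> list[Path]:
    """Return the nvidia/*/lib directories found under a site-packages path."""
    return sorted(p for p in site_packages.glob("nvidia/*/lib") if p.is_dir())


# Assumption: the active environment's purelib directory is its site-packages.
site_packages = Path(sysconfig.get_paths()["purelib"])
for lib_dir in nvidia_lib_dirs(site_packages):
    print(lib_dir)
```

An empty listing suggests the wheel was installed without the [and-cuda] extra, so the RUNPATHs have nothing to resolve against.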
TensorFlow v2.21.0 has a cuda_clang build path, but its hermetic LLVM 18
toolchain rejects sm_120. This builder uses TensorFlow's cuda_nvcc Bazel
config so CUDA 12.8 nvcc compiles native RTX 5090 and RTX 50 Series kernels.
- main.py: interactive and non-interactive Docker build driver
- Dockerfile: generated BuildKit recipe
- .tf_configure.bazelrc: generated TensorFlow Bazel config
- dist/: wheel output
- logs/: Docker build logs
- .tf5090-build.json: optional local saved build settings
dist/, logs/, and .tf5090-build.json are ignored by git.
This is an unofficial downstream TensorFlow build, not an official TensorFlow release. For GitHub releases, keep the wheel filename and release notes clear about the target:
- TensorFlow v2.21.0
- Python ABI, for example cp312 for the default build
- Linux x86_64
- CUDA 12.8 / cuDNN 9
- NVIDIA Blackwell compute capability 12.0
- RTX 5090 / RTX 50 Series sm_120
The local version suffix (+selfbuild) is intentional for a downstream wheel.
If you publish multiple variants, use distinct local suffixes such as
+rtx5090.cuda128.sm120.
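Local version suffixes follow the PEP 440 local version label rules: ASCII letters and digits, separated by periods. The sketch below checks a candidate suffix before it is attached to the base version. `with_local_suffix` is a hypothetical helper, and the pattern is deliberately a strict subset of what PEP 440 tolerates (it rejects `-` and `_`, which installers would normalize to periods).

```python
import re

# Strict form of a PEP 440 local version label: alphanumeric segments joined by dots.
_LOCAL_LABEL = re.compile(r"^[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*$")


def with_local_suffix(base_version: str, suffix: str) -> str:
    """Return 'base+suffix' after validating the local version label."""
    label = suffix.lstrip("+")  # accept the suffix with or without a leading '+'
    if not _LOCAL_LABEL.match(label):
        raise ValueError(f"invalid local version label: {suffix!r}")
    return f"{base_version}+{label}"


print(with_local_suffix("2.21.0", "+rtx5090.cuda128.sm120"))
```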
- TensorFlow source builds are large. Docker image and BuildKit caches can use tens of gigabytes.
- The generated wheel depends on NVIDIA CUDA pip packages when installed with [and-cuda]; it does not vendor those libraries directly.
- Non-default Python builds should be installed and smoke-tested in a matching Python environment.
- Arbitrary TensorFlow tags may require matching changes to Python, CUDA, cuDNN, Bazel, or TensorFlow build flags.