Support GPU Passthrough to VMs #106

@MalteJ

Summary

This work item enables FeOS to attach one or more physical GPUs directly to a FeOS-managed Virtual Machine (VM) using PCIe passthrough.

This functionality is critical for supporting GPU-accelerated workloads such as Artificial Intelligence (AI), Machine Learning (ML), scientific computing, and high-performance graphics within VMs. The implementation will extend the VM API to allow specifying GPUs by their host PCIe address.


Scope

✅ In Scope

  • Extend the FeOS VM API to allow specifying one or more GPUs via their host PCIe address for attachment to a VM.
  • Implement the backend logic for PCIe passthrough of a complete physical GPU (e.g., using IOMMU / vfio-pci).
  • Ensure the guest VM can recognize the attached GPU and that appropriate vendor drivers (e.g., NVIDIA, AMD) can be installed and utilized.
  • Support passing through multiple GPUs to a single VM.
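
To make the first in-scope item concrete, here is a minimal sketch of how a VM specification could carry a list of GPU addresses. It is written in Python purely for illustration. The `VmSpec` type, its `gpus` field, and the `normalize_pci_address` helper are hypothetical names, not part of the FeOS API. The sketch assumes addresses are given in the full `domain:bus:device.function` form (for example `0000:65:00.0`).

```python
import re
from dataclasses import dataclass, field
from typing import List

# Full PCIe address: domain:bus:device.function, e.g. 0000:65:00.0
_BDF_RE = re.compile(
    r"(?P<domain>[0-9a-fA-F]{4}):(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<device>[0-9a-fA-F]{2})\.(?P<function>[0-7])"
)


def normalize_pci_address(address: str) -> str:
    """Return the lowercase canonical form of a PCIe address or raise ValueError."""
    candidate = address.strip()
    match = _BDF_RE.fullmatch(candidate)
    if match is None:
        raise ValueError(f"invalid PCIe address: {address!r}")
    # A PCI device number is 5 bits wide, so it cannot exceed 0x1f.
    if int(match["device"], 16) > 0x1F:
        raise ValueError(f"device number out of range: {address!r}")
    return candidate.lower()


@dataclass
class VmSpec:
    """Hypothetical VM specification; only `gpus` relates to this issue."""

    name: str
    gpus: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Normalize first so that "0000:B3:00.0" and "0000:b3:00.0" count as duplicates.
        normalized = [normalize_pci_address(a) for a in self.gpus]
        if len(set(normalized)) != len(normalized):
            raise ValueError("duplicate GPU address in VM spec")
        self.gpus = normalized
```

An empty `gpus` list keeps today's behaviour (no passthrough), so existing VM definitions would not need to change under this shape.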

❌ Out of Scope

  • GPU virtualization technologies like NVIDIA vGPU or AMD MxGPU (SR-IOV). This issue focuses exclusively on full device passthrough.
  • Live migration of VMs with attached GPUs.
  • Dynamic hot-plugging of GPUs. GPUs must be attached when the VM is created or started.
  • Host-side GPU driver installation and configuration. This issue assumes the host is correctly prepared for passthrough.

Responsible Areas

  • FeOS VM Management
  • FeOS API

Contributors


Acceptance Criteria

  • API

    • The VM API is extended to accept a list of PCIe addresses for GPUs in the VM specification.
    • The API validates that the specified PCIe devices exist and are available for passthrough.
  • VM Runtime & Guest OS

    • A VM can be successfully launched with one or more GPUs passed through to it.
    • The guest operating system correctly identifies the hardware of the passed-through GPU(s) (e.g., visible in lspci).
    • Vendor-specific drivers (e.g., NVIDIA driver) can be installed successfully inside the guest OS.
    • A GPU-accelerated application or utility (e.g., nvidia-smi, a CUDA/OpenCL sample) runs successfully within the VM and can access the GPU's capabilities.
    • The FeOS host correctly isolates the device, preventing host-level drivers from claiming it while it is assigned to a VM.
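
The validation and isolation criteria above could be checked against Linux sysfs roughly as follows. This is a sketch under stated assumptions, not the FeOS implementation: it assumes the host has already bound the GPU to `vfio-pci` (host preparation is out of scope for this issue), and that the caller tracks which addresses are assigned to running VMs. `GpuUnavailableError`, `bound_driver`, `validate_gpus`, and the `in_use` parameter are hypothetical names. The `sysfs_root` parameter exists only so the logic can be tested against a fake directory tree.

```python
import os
from pathlib import Path
from typing import Iterable, Optional


class GpuUnavailableError(Exception):
    """Raised when a requested GPU cannot be passed through."""


def bound_driver(address: str, sysfs_root: Path = Path("/sys")) -> Optional[str]:
    """Return the name of the host driver bound to the device, or None if unbound."""
    device_dir = sysfs_root / "bus" / "pci" / "devices" / address
    if not device_dir.exists():
        raise GpuUnavailableError(f"{address}: no such PCIe device on this host")
    driver_link = device_dir / "driver"
    if not driver_link.is_symlink():
        return None
    # The `driver` entry is a symlink whose last path component is the driver name.
    return os.path.basename(os.readlink(driver_link))


def validate_gpus(
    addresses: Iterable[str],
    in_use: Iterable[str] = (),
    sysfs_root: Path = Path("/sys"),
) -> None:
    """Raise GpuUnavailableError unless every device exists, is free, and is on vfio-pci."""
    claimed = set(in_use)
    for address in addresses:
        if address in claimed:
            raise GpuUnavailableError(f"{address}: already assigned to another VM")
        driver = bound_driver(address, sysfs_root)
        if driver != "vfio-pci":
            raise GpuUnavailableError(
                f"{address}: bound to {driver or 'no driver'}, expected vfio-pci"
            )
```

Running this check before the hypervisor is started lets the API reject a bad request with a specific message instead of surfacing a hypervisor launch failure.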

Action Items

  • Design the API extension in the VM model for specifying GPU devices.
  • Implement the backend logic to configure the hypervisor for GPU passthrough (e.g., managing IOMMU groups, binding to vfio-pci).
  • Ensure that all functions of a GPU (e.g., graphics and audio components on the same PCIe card) are passed through together.
  • Add robust validation and error handling for cases where a GPU is unavailable or passthrough fails.
  • Create integration tests that:
    • Launch a VM with a single GPU and verify its functionality in the guest.
    • Launch a VM with multiple GPUs and verify their functionality in the guest.
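
For the action items on IOMMU groups and multi-function cards, one possible approach is to expand each requested address to all functions in the same PCIe slot, then report any other members of the affected IOMMU groups so the caller can decide how to handle them. The sketch below reads the standard sysfs layout, where each device directory has an `iommu_group` link with a `devices` subdirectory. The function names and the returned tuple are illustrative assumptions, and the sibling-expansion policy is one option rather than a settled design.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def sibling_functions(address: str, sysfs_root: Path = Path("/sys")) -> List[str]:
    """Return every function that shares domain:bus:device with `address`."""
    slot = address.rsplit(".", 1)[0]
    devices_dir = sysfs_root / "bus" / "pci" / "devices"
    return sorted(
        entry.name
        for entry in devices_dir.iterdir()
        if entry.name.rsplit(".", 1)[0] == slot
    )


def iommu_group_members(address: str, sysfs_root: Path = Path("/sys")) -> List[str]:
    """Return the addresses of all devices in the same IOMMU group as `address`."""
    group_devices = (
        sysfs_root / "bus" / "pci" / "devices" / address / "iommu_group" / "devices"
    )
    if not group_devices.is_dir():
        raise RuntimeError(f"{address}: no IOMMU group found; is the IOMMU enabled?")
    return sorted(entry.name for entry in group_devices.iterdir())


def plan_passthrough(
    requested: Iterable[str], sysfs_root: Path = Path("/sys")
) -> Tuple[List[str], List[str]]:
    """Return (devices to attach, other IOMMU group members that need attention)."""
    to_attach = set()
    for address in requested:
        # Pass graphics, audio, and any other functions of the card together.
        to_attach.update(sibling_functions(address, sysfs_root))
    group_members = set()
    for address in to_attach:
        group_members.update(iommu_group_members(address, sysfs_root))
    return sorted(to_attach), sorted(group_members - to_attach)
```

A non-empty second list means the GPU shares its IOMMU group with unrelated devices. Whether to reject the request or handle those devices as well is a policy decision this sketch leaves open.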
