Metal: stacked patches for MPS lifecycle, CI, and relu-metal-cpp fix#2
Open
Use stream->commandEncoder() instead of creating encoders directly via [cmdBuf computeCommandEncoder] to properly integrate with PyTorch's MPS stream encoder lifecycle management (kernel coalescing). Direct encoder creation bypasses the stream's internal _commandEncoder state and crashes on sequential kernel dispatches.

Lower the default Metal standard from metal3.2 (macOS 15+) to metal3.1 (macOS 14+) since all current kernel features (bfloat16_t, simd_sum, simd_shuffle, threadgroup_barrier) are available in Metal 3.1.

Add multi-strategy Metal toolchain detection for macOS 14+:
- Separate Metal toolchain component (macOS 26+ cryptex mount)
- xcrun/xcode-select based detection
- Direct /Applications/Xcode*.app filesystem scan fallback

Also clear SDKROOT in xcrunHost to prevent Nix-set SDK paths from interfering with system xcrun.

Fixes: huggingface#307
Co-developed-by: Claude Code v2.1.50 (claude-opus-4-6)
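The three detection strategies in this commit form a fallback chain: try each in order and take the first that yields a toolchain. The following is a minimal Python sketch of that ordering only, not the detection code from the patch. The names `find_toolchain`, `without_sdkroot`, and the three `probe_*` functions are hypothetical, the probes return canned values instead of touching the system, and the `/nix/store/example-sdk` path is made up for illustration.

```python
from typing import Callable, Mapping, Optional, Sequence

# A probe returns a toolchain path on success, or None when its strategy does not apply.
Probe = Callable[[], Optional[str]]


def without_sdkroot(env: Mapping[str, str]) -> dict:
    """Copy env, dropping SDKROOT so a Nix-set SDK path cannot leak into xcrun."""
    return {key: value for key, value in env.items() if key != "SDKROOT"}


def find_toolchain(probes: Sequence[Probe]) -> Optional[str]:
    """Return the first path any probe reports, trying probes in the given order."""
    for probe in probes:
        path = probe()
        if path is not None:
            return path
    return None


# Hypothetical stand-ins for the three strategies, in the order the commit lists them.
def probe_cryptex() -> Optional[str]:
    return None  # pretend the separate toolchain component is not mounted


def probe_xcrun() -> Optional[str]:
    return None  # pretend xcrun/xcode-select based detection found nothing


def probe_xcode_scan() -> Optional[str]:
    return "/Applications/Xcode.app"  # pretend the filesystem scan succeeded


chosen = find_toolchain([probe_cryptex, probe_xcrun, probe_xcode_scan])
clean_env = without_sdkroot({"SDKROOT": "/nix/store/example-sdk", "PATH": "/usr/bin"})
```

Keeping each strategy behind the same callable shape means a new detection path can be added to the list without changing the selection logic.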
Test Metal kernel builds across multiple macOS versions to verify compatibility with the metal3.1 standard (macOS 14+). Use sandbox=relaxed for Nix to support __noChroot builds that access the host Metal toolchain. The separate Metal toolchain download is only needed on macOS 26+.
Co-developed-by: Claude Code v2.1.50 (claude-opus-4-6)

Add C bridge functions (getMPSCommandEncoder, mpsSynchronize, mpsDispatchSync) to metallib_loader.mm so the C++ metal-cpp example can properly integrate with PyTorch's MPS stream encoder lifecycle without needing ObjC++ code in the main kernel file.
Co-developed-by: Claude Code v2.1.50 (claude-opus-4-6)

macOS 14 builds succeed but MPS tests may OOM on runners with limited unified memory. Use continue-on-error so macos-14 failures don't block the workflow. Update Metal docs to reflect macOS 15+ as the supported baseline with macOS 14 best-effort.
Co-developed-by: Claude Code v2.1.50 (claude-opus-4-6)
- Added section on vLLM Metal integration (March 2026)
- Documented platform backend, attention backend, worker/runner status
- Noted smoke test passing, E2E validation in progress
- Listed key findings on MPS lazy evaluation and memory model
- Updated open questions to include vLLM performance baseline

This tracks the active E2E validation work toward closing the gap between HF kernel ecosystem and llama.cpp on macOS.
Co-developed-by: Claude Code v2.0.76 (claude-haiku-4-5-20251001)
Allow per-kernel Metal standard version configuration via build.toml, following the pattern of cuda-flags, hip-flags, and sycl-flags. The default remains metal4.0 (upstream's current value). Kernels that need broader macOS compatibility can set metal-std-version = "metal3.1" (macOS 14+) or "metal3.2" (macOS 15+). AIR versions are forward-compatible, so metal3.1 kernels run on Metal 4 hardware.

Changes:
- Add metal_std_version field to Kernel::Metal in config structs (v2, v3, mod)
- Pass field through Jinja template context to generated CMake
- Accept METAL_STD_VERSION in metal_kernel_component() and propagate to compile_metal_shaders() via parent scope
- Default to metal4.0 in compile-metal.cmake when not specified
- Set metal-std-version = "metal3.1" in relu-metal-cpp example for broad macOS 14+ compatibility

Co-developed-by: Claude Code v2.1.58 (claude-opus-4-6)
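The defaulting rule described in this commit (use the kernel's `metal-std-version` when set, otherwise `metal4.0`) can be illustrated with a short Python sketch. This is not the Rust config code or the CMake logic from the patch: `resolve_metal_std` and `min_macos` are hypothetical helpers, the kernel section is modelled as a plain dict, and the macOS table covers only the two pairings the commit message states.

```python
from typing import Mapping, Optional

# Default named in the commit message for kernels that do not set the option.
DEFAULT_METAL_STD = "metal4.0"

# Minimum macOS major version, limited to the pairings given in the commit message.
MIN_MACOS = {"metal3.1": 14, "metal3.2": 15}


def resolve_metal_std(kernel_cfg: Mapping[str, str]) -> str:
    """Return the kernel's metal-std-version, or the default when it is absent."""
    return kernel_cfg.get("metal-std-version", DEFAULT_METAL_STD)


def min_macos(std: str) -> Optional[int]:
    """Return the minimum macOS major version for a standard, if it is in the table."""
    return MIN_MACOS.get(std)


# Hypothetical kernel sections: one that opts into metal3.1, one that sets nothing.
relu_std = resolve_metal_std({"metal-std-version": "metal3.1"})
default_std = resolve_metal_std({})
```

Here `relu_std` resolves to the per-kernel value and `default_std` falls back to the default, mirroring the behaviour the commit describes for compile-metal.cmake.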
Create a shared test utilities package that consolidates duplicated device detection, tolerance tables, and allclose helpers across all kernel repos. The package is automatically available in all kernel dev/test shells via the default pythonCheckInputs.

Modules:
- device: get_device(), get_available_devices(), skip_if_no_gpu()
- tolerances: DEFAULT_TOLERANCES dict, get_tolerances(dtype)
- allclose: fp8_allclose() with MPS float64 workaround

Wired into nix overlay and set as default pythonCheckInputs in genKernelFlakeOutputs so downstream repos get it automatically. Updated template test to use kernels_test_utils imports.

Co-developed-by: Claude Code v2.1.58 (claude-opus-4-6)
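To show how a per-dtype tolerance table and an allclose check fit together, here is a dependency-free Python sketch. It reuses the names `DEFAULT_TOLERANCES` and `get_tolerances(dtype)` from the commit message, but everything else is an assumption: the real package presumably keys on torch dtypes and operates on tensors, the tolerance values below are invented, and `allclose` is a hypothetical stand-in rather than the package's `fp8_allclose()`.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical (atol, rtol) pairs keyed by dtype name; the real table is not shown here.
DEFAULT_TOLERANCES: Dict[str, Tuple[float, float]] = {
    "float32": (1e-5, 1e-5),
    "float16": (1e-3, 1e-3),
    "bfloat16": (1e-2, 1e-2),
}


def get_tolerances(dtype: str) -> Tuple[float, float]:
    """Look up (atol, rtol) for a dtype name, falling back to the float32 pair."""
    return DEFAULT_TOLERANCES.get(dtype, DEFAULT_TOLERANCES["float32"])


def allclose(actual: Sequence[float], expected: Sequence[float], dtype: str) -> bool:
    """Elementwise check of |a - e| <= atol + rtol * |e| using the dtype's tolerances."""
    atol, rtol = get_tolerances(dtype)
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


# The same small error passes under the loose bfloat16 pair and fails under float32.
loose_ok = allclose([1.0, 2.0], [1.0, 2.005], "bfloat16")
strict_ok = allclose([1.0, 2.0], [1.0, 2.005], "float32")
```

Centralising the table means every kernel repo compares against the same thresholds instead of carrying its own copy.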
Summary
Stacked patches for Metal support improvements:
CI status