[ROCm] Release fixes for rocm-jaxlib-v0.10.2 (combined cherry-picks) #993
Merged
Imported from GitHub PR openxla#40385

📝 Summary of Changes
Move from linking to rocm libraries through DsoLoader to direct linking. Make sure we only link to the specified rocm library, not to any other library that said library may use. This change leaves enough wrapper (namespace wrap) headers needed for TF.

🎯 Justification
Keeps our build targets from being tainted by transitive dependencies of rocm libraries. DsoLoader use has little benefit for rocm since the libraries get loaded ahead of time by the jax plugin anyway.

🚀 Kind of Contribution
♻️ Cleanup

Copybara import of the project:
-- 38470a3 by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>: [ROCm] Streamline bazel targets for rocm libraries
-- 7f2ec34 by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>: [ROCm] Remove rocm_solver_context
-- 1e34f26 by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>: [ROCm] Remove hipblaslt_wrapper.h
-- 2a35a53 by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>: [ROCm] Remove hipsparse_wrapper.h
-- 37338f2 by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>: Remove rocsolver_wrapper.h
-- 545039f by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>: Remove rocblas_wrapper.h
-- c7e959e by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>: Remove rocm_driver_wrapper.h
-- e41363d by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>: Remove wrap namespace usage from rocm_fft
-- 94dc13e by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>: Remove wrap namespace usage from rocm_dnn
-- 0885e40 by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>: Remove TryDlopenROCmLibraries
-- 983cc3f by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>: Remove :all_files
-- aa5b6c3 by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>: Partialy restore *wrapper.h files needed by ROCm TF
-- d2e89c2 by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>: Use more filegroups
-- 3abd237 by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>: Remove DsoLoader ROCm support
-- 474acd2 by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>: Unblock X86 build CI
-- 57e1100 by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>: Fix RBE build
-- 8a55bfb by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>: Move to rocm_lib_import macro

Merging this change closes openxla#40385
COPYBARA_INTEGRATE_REVIEW=openxla#40385 from ROCm:draganm/rocm_libs 8a55bfb
PiperOrigin-RevId: 904252712
(cherry picked from commit 4f993de)
(cherry picked from commit 16f1d8d)
Imported from GitHub PR openxla#41591

📝 Summary of Changes
Default for hermetic paths in non-hermetic build too

🎯 Justification
Makes CI pick up rocm libs when doing local build + remote testing

🚀 Kind of Contribution
🐛 Bug Fix

Copybara import of the project:
-- 2ab9ead by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>: [ROCm] Unblock CI after PR openxla#40385

Merging this change closes openxla#41591
COPYBARA_INTEGRATE_REVIEW=openxla#41591 from ROCm:draganm/rocm_libs_followup 2ab9ead
PiperOrigin-RevId: 909965137
(cherry picked from commit d9922e7)
(cherry picked from commit 83380a5)
(cherry picked from commit 8ef3bb4)
…ent_sum by matchin… Imported from GitHub PR openxla#44428 …g FloatNormalization conversions.

📝 Summary of Changes
Make the atomic-RMW matcher (GetAtomicModifierParameters) look through the extf → addf(f32) → truncf body that FloatNormalization emits for bf16, recovering the narrow bf16 modifier so that scatter-add lowers to a packed atomicrmw fadd <2 x bf16> (global_atomic_pk_add_bf16) instead of a CAS loop. GpuFloatSupport/FloatNormalization are unchanged; targets without a native bf16 atomic still fall back to CAS.

🎯 Justification
bf16 segment_sum/scatter-add results in a slow CAS loop on MI300/MI350 even though the hardware has a packed bf16 atomic, making bf16 ~7x slower than f16.

🚀 Kind of Contribution
Please remove what does not apply: ⚡️ Performance Improvement, 🧪 Tests

📊 Benchmark (for Performance Improvements)
Please measure and include speedups for one of the public HLOs in `compiler/xla/tools/benchmarks/hlo/`.

🧪 Unit Tests:
Added direct_atomic_rmw_fadd_bf16_widened + a gfx942 CHECK-GFX942-MI300 RUN line to lower_tensors.mlir, asserting the packed atomicrmw fadd <2 x bf16> with no CAS. All 9 RUN-line prefixes pass.

🧪 Execution Tests:
What execution tests were added? For example, a new optimization should be tested with an end-to-end execution test triggering the optimization and asserting correctness. Please provide test cases running with at most 2 GPUs.

Copybara import of the project:
-- edcb06b by Zoran Jovanovic <zjovanov@amd.com>: [ROCm] Emit packed bf16 atomic add for scatter/segment_sum by matching FloatNormalization conversions.

Merging this change closes openxla#44428
COPYBARA_INTEGRATE_REVIEW=openxla#44428 from ROCm:rocm-bf16-atomic-scatter edcb06b
PiperOrigin-RevId: 933630040
(cherry picked from commit 97544f7)
(cherry picked from commit 06b0034)
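For readers unfamiliar with the pattern, the look-through described in this commit message can be modelled in a few lines of Python. This is only a toy sketch: the real matcher is C++ operating on MLIR ops, and the `Value` class and `match_widened_add` function below are hypothetical names invented for illustration, not XLA APIs. It assumes the modifier reaches the add through an `extf` from the narrow type, which is how the commit message describes the widened body.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Value:
    """Toy stand-in for an SSA value (hypothetical, not an MLIR class)."""
    op: str                              # "arg", "capture", "extf", "addf", "truncf"
    type: str                            # element type, e.g. "bf16" or "f32"
    operands: Tuple["Value", ...] = ()


def _extf_source(v: Value) -> Optional[Value]:
    """Return the value being widened if v is an extf, else None."""
    if v.op == "extf" and len(v.operands) == 1:
        return v.operands[0]
    return None


def match_widened_add(result: Value, current: Value) -> Optional[Value]:
    """Match truncf(addf(extf(current), extf(modifier))) and return modifier.

    Returns None when the body has any other shape, in which case a caller
    would fall back to the generic CAS-loop lowering.
    """
    if result.op != "truncf" or len(result.operands) != 1:
        return None
    if result.type != current.type:
        return None
    add = result.operands[0]
    if add.op != "addf" or len(add.operands) != 2:
        return None
    lhs, rhs = add.operands
    # addf is commutative, so try both operand orders.
    for cur_side, mod_side in ((lhs, rhs), (rhs, lhs)):
        modifier = _extf_source(mod_side)
        if (_extf_source(cur_side) is current
                and modifier is not None
                and modifier.type == current.type):
            return modifier
    return None
```

If the match succeeds, the narrow modifier can be used directly in a native bf16 atomic add; if it returns `None`, nothing changes relative to the CAS fallback.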
The ROCm CI image's /usertools/rocm.bazelrc excludes -//xla/backends/gpu/codegen:dynamic_slice_fusion_test_amdgpu_any, but at this pinned XLA revision (5a9e73cb) that test is an xla_cc_test in //xla/backends/gpu/transforms, so the label doesn't exist and bazel aborts pattern evaluation under --nokeep_going (single_gpu/multi_gpu/rocm_cpu all fail before any test runs). Add an alias under the expected codegen label so the exclude resolves. The real test still runs via //xla/... and passes locally on gfx950. This is a temporary workaround for the CI-image/pin mismatch; revert once the image bazelrc is corrected.
(cherry picked from commit 8e36d62)
The release-branch EXCLUDED_TESTS only filtered two F8E4M3FN cases, while upstream openxla/xla ROCm CI filters a larger set of known ROCm-unsupported / hipBLASLt-gap cases. This left the f64 cublasLt cases (DotOperationTestWithCublasLt.../GeneralMatMulActivation and MatmulTestWithCublas.GemmRewriter_RegressionTestF64) running and failing with "Failed to get a MatmulPlan: no valid algorithm found" (hipBLASLt has no f64+activation algorithm), plus the F8E5M2 / triton dot-algorithm / scaled-dot cases. Take the union: keep our two F8E4M3FN entries and add upstream's skips.
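The "take the union" step can be sketched in Python as an order-preserving merge of two exclusion lists. This is a minimal illustration under assumptions: the helper names `union_exclusions` and `gtest_negative_filter` are hypothetical, the two `ExampleSuite.*` entries are placeholders for our F8E4M3FN cases, and the way the release workflow actually consumes EXCLUDED_TESTS may differ. The only external convention relied on is GoogleTest's negative filter syntax (`--gtest_filter=-A:B`).

```python
def union_exclusions(ours, upstream):
    """Merge two exclusion lists, keeping first-seen order and dropping duplicates."""
    seen = set()
    merged = []
    for pattern in [*ours, *upstream]:
        if pattern not in seen:
            seen.add(pattern)
            merged.append(pattern)
    return merged


def gtest_negative_filter(excluded):
    """Render the exclusions as a GoogleTest negative filter flag."""
    return "--gtest_filter=-" + ":".join(excluded)


# Placeholder entries standing in for the release branch's two F8E4M3FN cases.
ours = ["ExampleSuite.CaseF8E4M3FN_A", "ExampleSuite.CaseF8E4M3FN_B"]
# One real name from the description above plus an overlapping placeholder.
upstream = [
    "ExampleSuite.CaseF8E4M3FN_A",
    "MatmulTestWithCublas.GemmRewriter_RegressionTestF64",
]

merged = union_exclusions(ours, upstream)
flag = gtest_negative_filter(merged)
```

Keeping our entries first means the release-branch skips stay visible at the top of the list, and any upstream entry that duplicates them is silently dropped.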
JehandadKhan approved these changes Jun 24, 2026
We might need openxla#44047 for this UT. The rest of the GEMM-related test failures are due to MI250 on CI; they do not occur on MI350, as the test results above show.
94e0cff
magaonka-amd added a commit to ROCm/jax that referenced this pull request Jun 24, 2026
Update XLA_COMMIT to 55f5563f, the rocm-jaxlib-v0.10.2 HEAD after ROCm/xla#993 merged (combined ROCm 0.10.2 release cherry-picks: PR #40385/#41591 bazel target streamlining, librocm_kpack/libclang-cpp runfiles, rocm_blas formatting, openxla #44428/#44389, fix-failing-tests jax-ml#939, dynamic_slice_fusion codegen alias, missing Tensile libs #44567, and EXCLUDED_TESTS alignment). Recompute XLA_SHA256 from the ROCm/xla archive. The four third_party/xla build patches were verified to still apply cleanly against this revision.
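Recomputing a checksum like XLA_SHA256 amounts to hashing the downloaded archive. A minimal sketch using only the Python standard library follows; the function name `sha256_of_file` is hypothetical, and how the archive is fetched and where the value is recorded are left out because this thread does not spell them out.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned hex string is the value to compare against, or store as, the pinned checksum for the archive.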

Motivation
Combines the four ROCm cherry-pick PRs for the JAX 0.10.2 release into a single PR against rocm-jaxlib-v0.10.2 (cut from pinned XLA 5a9e73cb). Supersedes #985, #987, #988, #989.

Commits (all cherry-picked with -x, oldest first)
From #983 backport (#985):
- PR #40385: [ROCm] Streamline bazel targets for rocm libraries
- PR #41591: [ROCm] Unblock CI after PR #40385
- [ROCm] Add librocm_kpack and libclang-cpp to runfiles globs
- Fix formatting error in rocm_blas.cc

Additional fixes requested in #985 review:
- [ROCm] Emit packed bf16 atomic add for scatter/segment_sum
- 13dc711: Fix failing tests (#939) (0.10.2 CI)
- [ROCm] Enable float and buffer checker (from approved-but-unmerged PR head 9b7baa6dd5)

Notes
rocm/BUILD resolution dropped unrelated newer-main targets not part of that PR; only its two hunks were applied. Result matched the upstream PR exactly (17 files, +963/-613).

Test Plan
release/0.10.2 release-validation workflow.