
BlasLt interface refactoring and important MatmulPlan cache fixes#994

Open
pemeliya wants to merge 1 commit into rocm-jaxlib-v0.9.1 from pemeliya/v0.9.1_gpu_blaslt_refactor

@pemeliya

Collaborator

Cherry-picking from openxla#42239 and openxla#43503

- Refactor BlasLt::Get() to take StreamExecutor* and return StatusOr<BlasLt*>
- Remove static convenience methods (GetMatmulPlan, GetGroupedMatmulPlan)
- Remove Stream* parameter from GetAlgorithms
- Split ROCm MatmulPlan into RegularMatmulPlan and GroupedMatmulPlan
- Remove redundant ExecuteOnStream overloads from base class
- Rename internal members (parent_ -> executor_, blas_lt_ -> handle_)
- Update all callers (autotuners, SYCL backend)
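
For reviewers unfamiliar with the upstream change, here is a rough sketch of the calling convention the first bullet describes: `BlasLt::Get()` takes a `StreamExecutor*` and returns a status-or-pointer that the caller must check. Everything below is a simplified stand-in, not the actual XLA code: the `StatusOr` struct replaces `absl::StatusOr`, the `StreamExecutor` fields and the per-executor map are assumptions made only so the example compiles on its own.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for the real StreamExecutor.
struct StreamExecutor {
  int device_ordinal;
  bool supports_blas_lt;
};

// Minimal stand-in for absl::StatusOr<T>: an empty error string means OK.
template <typename T>
struct StatusOr {
  T value{};
  std::string error;
  bool ok() const { return error.empty(); }
};

class BlasLt {
 public:
  // Takes the executor explicitly and reports failure through the return
  // value instead of handing back a possibly-null pointer.
  static StatusOr<BlasLt*> Get(StreamExecutor* executor) {
    if (executor == nullptr) return {nullptr, "null executor"};
    if (!executor->supports_blas_lt) return {nullptr, "BlasLt unavailable"};
    // Assumption: one instance per executor, kept in a function-local map.
    static std::map<StreamExecutor*, std::unique_ptr<BlasLt>> instances;
    std::unique_ptr<BlasLt>& slot = instances[executor];
    if (!slot) slot.reset(new BlasLt(executor));
    return {slot.get(), ""};
  }

  StreamExecutor* executor() const { return executor_; }

 private:
  explicit BlasLt(StreamExecutor* executor) : executor_(executor) {}
  StreamExecutor* executor_;  // member named executor_, as in the rename bullet
};
```

A caller would check `ok()` before using `value`; in this sketch, repeated calls with the same executor return the same pointer, and a null or unsupported executor yields an error rather than a crash.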

clang format

clang tidy

clang tidy

clang tidy

clang tidy

fixes after rebase

fixing deps

clang format

fix

fixing cuda test

fixing cuda

inlined NYI GroupedMatmulPlan for SYCL

addressing comments

cherry-picking gpu_blas_lt refactoring changes from upstream
@pemeliya pemeliya force-pushed the pemeliya/v0.9.1_gpu_blaslt_refactor branch from 832bf4a to 9a20615 Compare June 25, 2026 04:12
@pemeliya pemeliya requested review from hsharsha and i-chaochen June 25, 2026 04:13

@i-chaochen i-chaochen left a comment

Collaborator


It's OK with me, but please advise the JAX team earlier next time.
