w16ai4 group v1 by yadaish · Pull Request #366 · ROCm/FlyDSL

yadaish · 2026-04-09T07:07:10Z

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

…tions

- Extract _unpack_int4_to_int8_pair(): shared 7-op int4->int8 bit manipulation used by unpack_b_w4a16, unpack_b_w4a8, unpack_b_w4a_fp8, and load_b_pack_k32 (was copy-pasted in 4 places)
- Extract _pack_i32_pair_to_i64(): shared (even, odd) -> i64 packing
- Extract _load_groupwise_scale(): shared scale address calculation and buffer_load for W4A16 and W4A8 groupwise paths
- Have load_b_raw_w4a8_groupwise_k64 delegate weight load to load_b_raw_w4a8_k64 (matching W4A16 groupwise pattern)
- Replace ir.IntegerType.get_signless(32) / ir.F32Type.get() with T.i32 / T.f32 to follow project conventions
- Replace arith.constant(..., index=True) with fx.Index(...) throughout
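For readers unfamiliar with the int4->int8 unpacking and (even, odd) -> i64 packing named above, the following is a minimal plain-Python sketch of the general idea, not the PR's kernel code. The function names, the nibble ordering (even nibbles in the low half of each byte), the placement of the even word in the low 32 bits, and the exact operation sequence are all assumptions made for illustration; the PR's actual 7-op sequence is not shown on this page.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def sign_extend_nibbles(word: int) -> int:
    """Sign-extend the 4-bit value in the low nibble of each byte to 8 bits."""
    sign = word & 0x08080808           # bit 3 of each nibble is its sign bit
    # 0x08 * 0x1E == 0xF0, so a set sign bit fills bits 4..7 of its own byte
    return (word | (sign * 0x1E)) & MASK32


def unpack_int4_to_int8_pair(packed: int) -> tuple[int, int]:
    """Split a 32-bit word of eight signed int4 values into two 32-bit words,
    each holding four sign-extended int8 bytes (assumed layout)."""
    packed &= MASK32
    even = packed & 0x0F0F0F0F         # nibbles 0, 2, 4, 6
    odd = (packed >> 4) & 0x0F0F0F0F   # nibbles 1, 3, 5, 7
    return sign_extend_nibbles(even), sign_extend_nibbles(odd)


def pack_i32_pair_to_i64(even: int, odd: int) -> int:
    """Pack two 32-bit words into one 64-bit word (even word in the low half,
    which is an assumption of this sketch)."""
    return (((odd & MASK32) << 32) | (even & MASK32)) & MASK64


even, odd = unpack_int4_to_int8_pair(0x87654321)
print(hex(even), hex(odd), hex(pack_i32_pair_to_i64(even, odd)))
```

Keeping this logic in one helper means the W4A16, W4A8, and W4A_FP8 paths cannot drift apart in how they interpret the packed weights, which appears to be the motivation for the extraction described above.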

- Add 'bf16' to out_dtype parametrization (was only f16/f32)
- Fix run_moe_stage2 to accept bf16 output dtype
- Fix bytes_moved calculation to treat bf16 as 2-byte (like f16)

The stage2 kernel (compile_moe_gemm2) already supports out_dtype='bf16' using bf16 global atomics on gfx94+/gfx95+, but the test harness blocked it. Verified all 24 new test cases pass on MI355X (gfx950).
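The bytes_moved fix amounts to sizing bf16 output elements at 2 bytes, the same as f16. A hedged sketch of that accounting follows; the helper name, the lookup table, and the (tokens, model_dim) output shape are hypothetical and are not taken from the test harness.

```python
# Bytes per element for the output dtypes named in the commit message.
ELEM_BYTES = {"f16": 2, "bf16": 2, "f32": 4}


def output_bytes(tokens: int, model_dim: int, out_dtype: str) -> int:
    """Bytes written for a (tokens, model_dim) output tensor (hypothetical helper)."""
    if out_dtype not in ELEM_BYTES:
        raise ValueError(f"unsupported out_dtype: {out_dtype!r}")
    return tokens * model_dim * ELEM_BYTES[out_dtype]


print(output_bytes(128, 4096, "bf16"))
```

Rejecting unknown dtypes explicitly, rather than falling through to a default size, makes it harder for a newly added dtype to be counted with the wrong width.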

ClementLinCF and others added 17 commits March 12, 2026 23:45

Add W4A8/W4A_FP8 MoE support with groupwise scale

2d8d32f

Fix comments

75bc2d0

Merge branch 'main' into feature/w4a8-moe-port

22c9abc

Fix end to end

bc144e5

Merge remote-tracking branch 'origin/main' into feature/w4a8-moe-port

bfd28e8

Merge remote-tracking branch 'origin/main' into feature/w4a8-moe-port

28fa907

Merge remote-tracking branch 'origin' into feature/w4a8-moe-port

09bcfe5

add base version,3.5T

7226297

use dynamic for

5d61871

reduce gemm1 inst

9960c3b

add

9ea9cae

update

f8e5ba5

update

e7c83e5

tmp update

cfaaa94

update

5bb5b4d

yadaish closed this Apr 13, 2026

yadaish wants to merge 17 commits into main from dev/yadai_w16ai4_group_v1


5 participants
