Closed
…tions
- Extract _unpack_int4_to_int8_pair(): shared 7-op int4->int8 bit manipulation used by unpack_b_w4a16, unpack_b_w4a8, unpack_b_w4a_fp8, and load_b_pack_k32 (previously copy-pasted in 4 places)
- Extract _pack_i32_pair_to_i64(): shared (even, odd) -> i64 packing
- Extract _load_groupwise_scale(): shared scale address calculation and buffer_load for the W4A16 and W4A8 groupwise paths
- Have load_b_raw_w4a8_groupwise_k64 delegate the weight load to load_b_raw_w4a8_k64 (matching the W4A16 groupwise pattern)
- Replace ir.IntegerType.get_signless(32) / ir.F32Type.get() with T.i32 / T.f32 to follow project conventions
- Replace arith.constant(..., index=True) with fx.Index(...) throughout
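To make the two extracted helpers concrete, here is a plain-Python reference sketch of what an int4->int8 "pair" unpack and an (even, odd) -> i64 pack could compute. It is hypothetical: the nibble order (nibble 0 in the least-significant bits), the even/odd split, and "even goes in the low 32 bits" are all assumptions, and it does not reproduce the kernel's actual 7-op sequence, only the intended values.

```python
def unpack_int4_to_int8_pair(word: int) -> tuple[int, int]:
    """Hypothetical reference: split a 32-bit word of 8 packed int4 nibbles
    into two 32-bit words of four int8 bytes each (even- and odd-indexed
    nibbles). Assumes nibble 0 is in the least-significant bits."""
    even = 0
    odd = 0
    for i in range(8):
        nib = (word >> (4 * i)) & 0xF
        val = nib - 16 if nib & 0x8 else nib  # sign-extend int4 -> int8
        byte = val & 0xFF                     # two's-complement int8 byte
        if i % 2 == 0:
            even |= byte << (8 * (i // 2))
        else:
            odd |= byte << (8 * (i // 2))
    return even, odd


def pack_i32_pair_to_i64(even: int, odd: int) -> int:
    """Hypothetical reference: pack two 32-bit words into one 64-bit value,
    with `even` assumed to occupy the low half."""
    return ((odd & 0xFFFFFFFF) << 32) | (even & 0xFFFFFFFF)


if __name__ == "__main__":
    e, o = unpack_int4_to_int8_pair(0x87654321)
    print(hex(e), hex(o), hex(pack_i32_pair_to_i64(e, o)))
```

Keeping the reference as one function per concern mirrors the motivation for the refactor: four call sites that need the same bit manipulation should share a single definition instead of four copies.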
- Add 'bf16' to the out_dtype parametrization (previously only f16/f32)
- Fix run_moe_stage2 to accept a bf16 output dtype
- Fix the bytes_moved calculation to treat bf16 as 2 bytes per element (like f16)

The stage2 kernel (compile_moe_gemm2) already supports out_dtype='bf16' using bf16 global atomics on gfx94+/gfx95+, but the test harness rejected it. Verified that all 24 new test cases pass on MI355X (gfx950).
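The bytes_moved fix comes down to using the correct element size for every supported output dtype. A minimal sketch, assuming a hypothetical `out_bytes` helper and a `DTYPE_BYTES` table (neither is the harness's actual code), is an explicit lookup that fails loudly on an unknown dtype instead of falling through to a default size:

```python
# Hypothetical element sizes for the stage2 output dtypes named in this PR.
DTYPE_BYTES = {"f32": 4, "f16": 2, "bf16": 2}


def out_bytes(out_dtype: str, tokens: int, model_dim: int) -> int:
    """Bytes written for a [tokens, model_dim] output (illustrative only)."""
    try:
        elem = DTYPE_BYTES[out_dtype]
    except KeyError:
        raise ValueError(f"unsupported out_dtype: {out_dtype!r}") from None
    return tokens * model_dim * elem


if __name__ == "__main__":
    for dt in ("f16", "bf16", "f32"):
        print(dt, out_bytes(dt, tokens=4, model_dim=8))
```

A table lookup keeps bf16 and f16 consistent by construction, and a new dtype cannot silently inherit the wrong width.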
Motivation
Technical Details
Test Plan
Test Result
Submission Checklist