[ROCm] Allow mixed (F8E4M3FNUZ, F8E5M2FNUZ) in Triton dot gate #897
Open
Ruturaj4 wants to merge 1 commit into
076f8b3 to
30fd358
Compare
    EXPECT_TRUE(IsTritonSupportedInstruction(
        ti.Instruction(),
        se::GpuComputeCapability(se::RocmComputeCapability("gfx950"))));
Collaborator
fnuz is not natively supported on gfx950. Why does this pass?
Author
Good catch. The test doesn't run anything on the GPU; it only checks whether the compiler's support predicate accepts the dot. FNUZ is allowed on any ROCm target (pre-existing behavior; when there's no native FNUZ MFMA it just upcasts to f16), so the test passes regardless of the chip. Still, gfx950 was a confusing target to assert on, so I moved the test to gfx942, where FNUZ is native and which is what the ticket is about.
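To make the distinction in this reply concrete, here is a minimal, self-contained sketch that separates "the support predicate accepts FNUZ" from "the chip has a native FNUZ MFMA". It is not XLA code: `HasNativeFnuzMfma`, `SupportPredicateAcceptsFnuz`, `ChooseFnuzLowering`, and the `FnuzDotLowering` enum are hypothetical names invented for illustration, and the "gfx94" prefix check is an assumption based only on the targets named in this thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical lowering choices, named for illustration only.
enum class FnuzDotLowering { kNativeMfma, kUpcastToF16 };

// Assumption from this thread: gfx94X parts have a native FNUZ MFMA,
// while other targets (for example gfx950) do not.
inline bool HasNativeFnuzMfma(const std::string& gfx_arch) {
  return gfx_arch.rfind("gfx94", 0) == 0;  // true when the name starts with "gfx94"
}

// Hypothetical stand-in for the support predicate: it accepts FNUZ on any
// ROCm target and never looks at the specific chip.
inline bool SupportPredicateAcceptsFnuz(bool is_rocm) { return is_rocm; }

// Hypothetical lowering decision: fall back to an f16 upcast when there is
// no native FNUZ MFMA.
inline FnuzDotLowering ChooseFnuzLowering(const std::string& gfx_arch) {
  return HasNativeFnuzMfma(gfx_arch) ? FnuzDotLowering::kNativeMfma
                                     : FnuzDotLowering::kUpcastToF16;
}

// The scenario from the review: the predicate accepts gfx950 even though
// FNUZ is not native there.
inline void CheckReviewScenario() {
  assert(SupportPredicateAcceptsFnuz(/*is_rocm=*/true));
  assert(!HasNativeFnuzMfma("gfx950"));
}
```

In this sketch the predicate returns the same answer for gfx942 and gfx950, which is why asserting on gfx950 passed; only the lowering choice differs between the two.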
ed67d54 to
254a2a7
Compare
The mixed-FP8 carve-out in IsTritonSupportedDot only listed the OCP pair (F8E5M2, F8E4M3FN). The ROCm-native FNUZ pair was rejected even though the rest of the file already accepts FNUZ FP8 inputs on ROCm.

This blocks TransformerEngine FP8 GEMM on MI300 (gfx94X), which lowers dgrad to dot_general(F8E4M3FNUZ, F8E5M2FNUZ) and gets routed to a __triton_nested_gemm_fusion. The gate then refuses it at codegen time with "INTERNAL: ... Dot operation only supports same types for lhs and rhs."

Add a focused support test asserting the FNUZ mixed pair is accepted on ROCm and rejected on CUDA via RunSupportTest's dual-contract check. Addresses TODO(b/393299275).
254a2a7 to
b3c6b4a
Compare
The mixed-FP8 carve-out in IsTritonSupportedDot only listed the OCP pair (F8E5M2, F8E4M3FN). The ROCm-native FNUZ pair was rejected even though the rest of the file already accepts FNUZ FP8 inputs on ROCm.

This blocks TransformerEngine FP8 GEMM on MI300 (gfx94X), which lowers dgrad to dot_general(F8E4M3FNUZ, F8E5M2FNUZ) and gets routed to a __triton_nested_gemm_fusion. The gate then refuses it at codegen time with "INTERNAL: ... Dot operation only supports same types for lhs and rhs."

Mirror the existing OCP allowance under gpu_version.IsRocm() so the FNUZ pair passes the same check.
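The shape of the change described above can be sketched as a small standalone predicate. This is an illustrative model, not the code in IsTritonSupportedDot: the `DotElemType` enum, `IsUnorderedPair`, and `IsDotOperandPairAllowed` are hypothetical names, the `is_rocm` flag stands in for gpu_version.IsRocm(), and the sketch assumes that operand order does not matter and that the OCP pair is allowed on every backend. The real gate performs further checks that are omitted here.

```cpp
#include <cassert>

// Hypothetical element-type enum covering only the types discussed in this PR.
enum class DotElemType { kF8E5M2, kF8E4M3FN, kF8E5M2FNUZ, kF8E4M3FNUZ, kF16 };

// True when {a, b} equals {x, y} in either order (assumed order-insensitive).
inline bool IsUnorderedPair(DotElemType a, DotElemType b,
                            DotElemType x, DotElemType y) {
  return (a == x && b == y) || (a == y && b == x);
}

// Sketch of the mixed-type carve-out only. It does not model whether a given
// type is supported on a given backend in the first place.
inline bool IsDotOperandPairAllowed(DotElemType lhs, DotElemType rhs,
                                    bool is_rocm) {
  if (lhs == rhs) return true;  // Same-type operands pass this particular check.
  // Existing allowance: the OCP mixed pair.
  if (IsUnorderedPair(lhs, rhs, DotElemType::kF8E5M2, DotElemType::kF8E4M3FN)) {
    return true;
  }
  // Mirrored allowance: the FNUZ mixed pair, only on ROCm.
  if (is_rocm && IsUnorderedPair(lhs, rhs, DotElemType::kF8E4M3FNUZ,
                                 DotElemType::kF8E5M2FNUZ)) {
    return true;
  }
  return false;  // Any other mixed pair hits the "same types" rejection.
}

// The dual contract described in the commit message: accepted on ROCm,
// rejected on CUDA.
inline void CheckDualContract() {
  assert(IsDotOperandPairAllowed(DotElemType::kF8E4M3FNUZ,
                                 DotElemType::kF8E5M2FNUZ, /*is_rocm=*/true));
  assert(!IsDotOperandPairAllowed(DotElemType::kF8E4M3FNUZ,
                                  DotElemType::kF8E5M2FNUZ, /*is_rocm=*/false));
}
```

Keeping the FNUZ branch behind the ROCm flag leaves the CUDA behavior unchanged, which is what lets a single support test assert acceptance on one backend and rejection on the other.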
Submission Checklist