There are cases where we want to apply online quantization to specific modules of a Quark model. For example, quantizing the bf16 attention linear layers to PTPC fp8 for https://huggingface.co/amd/MiniMax-M3-MXFP4, which is already a Quark-quantized model.
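To make the PTPC idea concrete, here is a minimal sketch in plain Python of how the scales could be derived, assuming PTPC means one scale per output channel for weights and one scale per token for activations. The helper names and the FP8 maximum of 448 (the OCP E4M3 value) are assumptions for illustration, not taken from this PR; some hardware uses an FP8 variant with a different maximum.

```python
# Assumed largest finite value of OCP FP8 E4M3; other FP8 variants differ.
FP8_E4M3_MAX = 448.0


def per_row_scales(matrix, fp8_max=FP8_E4M3_MAX):
    """Return one scale per row so that row / scale fits in [-fp8_max, fp8_max]."""
    scales = []
    for row in matrix:
        amax = max(abs(v) for v in row)
        # Fall back to 1.0 for all-zero rows to avoid dividing by zero later.
        scales.append(amax / fp8_max if amax > 0 else 1.0)
    return scales


def scale_and_clamp(matrix, scales, fp8_max=FP8_E4M3_MAX):
    """Divide each row by its scale and clamp; a real kernel would then cast to FP8."""
    return [
        [max(-fp8_max, min(fp8_max, v / s)) for v in row]
        for row, s in zip(matrix, scales)
    ]


# Weight laid out as [out_features, in_features]: one scale per output channel.
weight = [[0.5, -2.0, 1.0], [0.0, 0.0, 0.0]]
weight_scales = per_row_scales(weight)
scaled_weight = scale_and_clamp(weight, weight_scales)

# Activations laid out as [tokens, hidden]: one scale per token, computed at run time.
activations = [[4.48, -1.0, 0.0]]
act_scales = per_row_scales(activations)
```

The weight scales can be computed once when the model is loaded, while the activation scales have to be recomputed for every batch.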
Hi, @gbyu-amd
It makes sense that bf16 runs successfully, since it doesn’t require dequantization of the weights.
Quark supports multiple quantization formats, and opening up online quantization for every Quark option could pose risks. For example, if we try to apply mxfp4 quantization to a model that is already ptpc_fp8, problems will arise.
Merging this PR carries some risk, but as long as you confirm there won’t be any misuse, I believe it can be merged.
Based on this PR, we’ll submit another PR to handle online quantization of Quark models in common scenarios next week.
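The concern above could be addressed with a simple guard. The sketch below is only an illustration under stated assumptions: the format tags, layer names, and function names are hypothetical and do not come from Quark or from this PR. It allows online quantization only for layers that are still unquantized and leaves already-quantized layers untouched.

```python
# Hypothetical format tags; the real Quark config uses its own names.
UNQUANTIZED = {"bf16", "fp16"}


def can_online_quantize(current_format: str, target_format: str) -> bool:
    """Allow online quantization only when the layer is still unquantized."""
    return current_format in UNQUANTIZED and target_format not in UNQUANTIZED


def plan(layers: dict, target_format: str) -> dict:
    """Map each layer name to the format it ends up with.

    `layers` maps a (hypothetical) layer name to its current format.
    """
    result = {}
    for name, fmt in layers.items():
        if can_online_quantize(fmt, target_format):
            result[name] = target_format
        else:
            # Already quantized: keep as-is rather than re-quantizing.
            result[name] = fmt
    return result


layers = {"attn.qkv_proj": "bf16", "mlp.experts": "mxfp4"}
planned = plan(layers, "ptpc_fp8")
```

With this rule, the bf16 attention layers are quantized to ptpc_fp8, while a request such as mxfp4 on top of ptpc_fp8 is rejected instead of failing later.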
Thanks @haoyangli0109, it would be great if you could support more general cases for Quark models.