support online quant for quark models#1370

Merged
valarLip merged 3 commits into main from guanbao/m3_fp4_quant
Jun 26, 2026

@gbyu-amd

Contributor

Motivation

In some cases we want to apply online quantization to specific modules of a Quark model. For example, quantizing the bf16 attention linear layers to PTPC fp8 for https://huggingface.co/amd/MiniMax-M3-MXFP4, which is already a Quark-quantized model.
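
As a rough illustration of what PTPC (per-token activation, per-channel weight) fp8 scaling means for a bf16 linear layer, here is a minimal pure-Python sketch. It is not the code from this PR: the function names are hypothetical, the fp8 maximum of 448.0 assumes the OCP e4m3fn format (hardware using e4m3fnuz would use 240.0), and the final rounding to the fp8 grid is omitted.

```python
# Illustrative sketch only; names are hypothetical and not taken from this PR.
FP8_MAX = 448.0  # assumption: OCP e4m3fn; e4m3fnuz would use 240.0


def absmax_scales(rows, fp8_max=FP8_MAX):
    """Return one scale per row so that the row's absolute max maps to fp8_max."""
    scales = []
    for row in rows:
        amax = max(abs(v) for v in row)
        scales.append(amax / fp8_max if amax > 0 else 1.0)
    return scales


def scale_and_clamp(rows, scales, fp8_max=FP8_MAX):
    """Divide each row by its scale and clamp to the fp8 range.

    A real kernel would then cast to fp8; that rounding step is omitted here.
    """
    return [
        [max(-fp8_max, min(fp8_max, v / s)) for v in row]
        for row, s in zip(rows, scales)
    ]


def ptpc_matmul(a_q, a_scales, w_q, w_scales):
    """Compute a_q @ w_q^T, then rescale by (token scale * channel scale)."""
    return [
        [sum(x * y for x, y in zip(a_row, w_row)) * a_s * w_s
         for w_row, w_s in zip(w_q, w_scales)]
        for a_row, a_s in zip(a_q, a_scales)
    ]


# Weight is [out_features][in_features]: one scale per output channel.
weight = [[0.5, -2.0, 1.0], [0.25, 0.125, -0.0625]]
w_scales = absmax_scales(weight)
w_q = scale_and_clamp(weight, w_scales)

# Activations are [tokens][in_features]: one scale per token, computed at runtime.
acts = [[1.0, -4.0, 2.0]]
a_scales = absmax_scales(acts)
a_q = scale_and_clamp(acts, a_scales)

out = ptpc_matmul(a_q, a_scales, w_q, w_scales)
```

Because the weight scales depend only on the weights, they can be computed once at load time, which is what makes this kind of quantization feasible "online" for layers that ship as bf16.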

Technical Details

Test Plan

Test Result

Submission Checklist

@gbyu-amd gbyu-amd requested a review from lihaoyang-amd June 26, 2026 09:59
@haoyangli0109

haoyangli0109 commented Jun 26, 2026

Contributor

Hi @gbyu-amd,
It makes sense that bf16 runs successfully, since it doesn't require dequantization of the weights.
Quark supports multiple quantization formats, and opening up the Quark options completely could pose risks. For example, trying to apply mxfp4 quantization to a ptpc_fp8 model will cause problems.

Merging this PR carries some risk, but as long as you confirm there won’t be any misuse, I believe it can be merged.

Based on this PR, we’ll submit another PR to handle online quantization of Quark models in common scenarios next week.
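
The risk described above can be sketched as a guard that only allows online quantization for layers whose checkpoint weights are still unquantized. This is a hypothetical illustration, not Quark's API or the code in this PR: the function names, the dtype strings, and the layer names are all assumptions.

```python
# Hypothetical guard; names and dtype strings are illustrative assumptions.
UNQUANTIZED_DTYPES = {"bf16", "fp16", "fp32"}


def can_online_quantize(layer_dtype: str) -> bool:
    """Only unquantized weights can be quantized online without dequantizing first."""
    return layer_dtype in UNQUANTIZED_DTYPES


def plan_online_quant(layer_dtypes: dict, targets: dict) -> dict:
    """Map each requested layer to its target format, rejecting unsafe requests.

    layer_dtypes: layer name -> dtype stored in the checkpoint
    targets:      layer name -> requested online quantization format
    """
    plan = {}
    for name, target in targets.items():
        source = layer_dtypes[name]
        if not can_online_quantize(source):
            raise ValueError(
                f"{name}: cannot online-quantize {source} weights to {target}"
            )
        plan[name] = target
    return plan


# The case from this PR: bf16 attention layers in an otherwise quantized model.
checkpoint = {"attn.qkv_proj": "bf16", "mlp.up_proj": "mxfp4"}
plan = plan_online_quant(checkpoint, {"attn.qkv_proj": "ptpc_fp8"})
```

Requesting a target format for a layer that is already quantized (for example mxfp4 on ptpc_fp8 weights) raises an error in this sketch instead of silently producing wrong weights.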

@gbyu-amd

Contributor Author

> Hi, @gbyu-amd It makes sense that bf16 runs successfully, since it doesn’t require dequantization of the weights. Quark supports multiple quantization formats, and completely opening up the Quark options could pose risks. For example, if we try to perform mxfp4 quantization on a ptpc_fp8 model, problems will arise.
>
> Merging this PR carries some risk, but as long as you confirm there won’t be any misuse, I believe it can be merged.
>
> Based on this PR, we’ll submit another PR to handle online quantization of Quark models in common scenarios next week.

Thanks @haoyangli0109, it would be great if you could support more general cases for Quark models.

@valarLip valarLip merged commit e97d631 into main Jun 26, 2026
25 of 33 checks passed
@valarLip valarLip deleted the guanbao/m3_fp4_quant branch June 26, 2026 14:55