[Enhancement] Align online quantization and offline quantization #1365

Open
haoyangli0109 wants to merge 1 commit into
ROCm:mainfrom
haoyangli0109:lhy/align_off_online_q

Conversation

@haoyangli0109 haoyangli0109 commented Jun 26, 2026

Contributor

The precision gap between online and offline quantization comes mainly from the different round_mode settings used when computing scales. Online quantization previously used Aiter's mxfp4 kernel, which uses round_up, whereas the offline Quark kernel uses round_even. Of the two, round_even gives better precision.
This PR fixes the mismatch. Switching the kernel adds no overhead to online quantization.
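
For readers unfamiliar with why the rounding mode of the scale matters, here is a minimal sketch of two ways to pick a power-of-two block scale for FP4 (E2M1) values. It is not the Aiter or Quark implementation: the function names, the choice to round in the log2 domain, and the toy block are all assumptions made for illustration, and the real kernels may define round_up and round_even differently.

```python
import math

FP4_E2M1_MAX = 6.0  # largest magnitude representable in FP4 (E2M1)
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # non-negative E2M1 values


def scale_exp_round_up(amax: float) -> int:
    """Smallest exponent e with amax / 2**e <= FP4_E2M1_MAX (amax > 0)."""
    return math.ceil(math.log2(amax / FP4_E2M1_MAX))


def scale_exp_round_even(amax: float) -> int:
    """Nearest exponent; Python's round() breaks exact ties toward even."""
    return round(math.log2(amax / FP4_E2M1_MAX))


def fake_quantize(block, exp):
    """Snap each value to the FP4 grid under the scale 2**exp, then rescale."""
    scale = 2.0 ** exp
    out = []
    for v in block:
        mag = min(abs(v) / scale, FP4_E2M1_MAX)  # saturate at the FP4 max
        # Nearest grid point; ties go to the smaller magnitude for simplicity.
        q = min(FP4_GRID, key=lambda g: abs(g - mag))
        out.append(math.copysign(q, v) * scale)
    return out


# Hypothetical block of values, chosen only to make the two modes disagree.
block = [7.0, 0.6, -0.3]
amax = max(abs(v) for v in block)
for name, fn in (("round_up", scale_exp_round_up),
                 ("round_even", scale_exp_round_even)):
    exp = fn(amax)
    print(name, "exponent:", exp, "dequantized:", fake_quantize(block, exp))
```

In this toy setup, rounding up always avoids saturating the largest value but can yield a coarser grid for the small values in the block, while rounding to nearest may saturate the maximum in exchange for a finer grid. Which trade-off wins depends on the data; the GSM8K numbers in this PR are the actual evidence for this model.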

To reproduce the issue, run GSM8K with 3 few-shot examples (--num_fewshot 3); with 5 shots, the accuracy before and after the change is nearly identical.

python3 -m atom.entrypoints.openai_server --model /shareddata/deepseek-ai/DeepSeek-R1-0528 \
  --enforce-eager -tp 8 \
  --port 5679 --server-port 7778 \
  --online_quant_config '{"global_quant_config":"ptpc_fp8","layer_quant_config":{"*expert*":"mxfp4"},"exclude_layer":["lm_head","*.gate.*"]}' \
  --method mtp --num-speculative-tokens 3 


lm_eval \
  --model local-completions \
  --model_args "model=/shareddata/deepseek-ai/DeepSeek-R1-0528,base_url=http://localhost:7778/v1/completions,tokenized_requests=False,tokenizer_backend=None,num_concurrent=32" \
  --tasks gsm8k \
  --num_fewshot 3 \
  --batch_size auto

Before:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     3|exact_match|↑  |0.9333|±  |0.0069|
|     |       |strict-match    |     3|exact_match|↑  |0.9287|±  |0.0071|

After:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     3|exact_match|↑  |0.9416|±  |0.0065|
|     |       |strict-match    |     3|exact_match|↑  |0.9401|±  |0.0065|

Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>
@haoyangli0109 haoyangli0109 force-pushed the lhy/align_off_online_q branch from 050f816 to f2f8d07 Compare June 26, 2026 07:29
@haoyangli0109 haoyangli0109 marked this pull request as ready for review June 26, 2026 08:38
@lihaoyang-amd lihaoyang-amd reopened this Jun 26, 2026
@zufayu zufayu requested a review from JiaoliangYu June 26, 2026 14:01