[Enhancement] Align online quantization and offline quantization by haoyangli0109 · Pull Request #1365 · ROCm/ATOM

haoyangli0109 · 2026-06-26T06:49:28Z

The precision gap between online and offline quantization stems mainly from the different round_mode settings used when computing scales. Previously, online quantization used Aiter’s mxfp4 kernel, which uses round_up, whereas the offline Quark kernel uses round_even. Of the two, round_even gives better precision.
This PR fixes the mismatch by switching the kernel used for online quantization. The switch adds no overhead to online quantization.
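
To make the effect of the scale rounding mode concrete, here is a minimal sketch in plain Python. It is not the Aiter or Quark kernel code. It assumes one plausible reading of the two modes: round_up takes the ceiling of the base-2 log of the ideal scale, and round_even rounds that log to the nearest integer with ties to even. The real kernels may define the modes differently. The function names, the E2M1 value grid, and the sample block are all illustrative assumptions.

```python
import math

# Assumption: FP4 (E2M1) magnitudes and a power-of-two shared scale, as in MX formats.
E2M1_MAX = 6.0
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def scale_exp_round_up(amax: float) -> int:
    """Hypothetical round_up: smallest exponent whose scale keeps amax in range."""
    return math.ceil(math.log2(amax / E2M1_MAX))


def scale_exp_round_even(amax: float) -> int:
    """Hypothetical round_even: nearest exponent (Python's round() breaks ties to even)."""
    return round(math.log2(amax / E2M1_MAX))


def fake_quantize(values, exp):
    """Quantize to the E2M1 grid with shared scale 2**exp, then dequantize."""
    scale = 2.0 ** exp
    out = []
    for v in values:
        mag = min(abs(v) / scale, E2M1_MAX)  # saturate at the largest FP4 value
        q = min(E2M1_GRID, key=lambda g: abs(g - mag))  # nearest grid point
        out.append(math.copysign(q * scale, v))
    return out


def mean_abs_error(values, exp):
    """Mean absolute round-trip error of a block for a given scale exponent."""
    deq = fake_quantize(values, exp)
    return sum(abs(a - b) for a, b in zip(values, deq)) / len(values)


# Hand-picked block, for illustration only.
block = [0.7, -1.3, 2.6, 7.0]
amax = max(abs(v) for v in block)
for name, fn in [("round_up", scale_exp_round_up), ("round_even", scale_exp_round_even)]:
    exp = fn(amax)
    print(name, "exp =", exp, "mean abs error =", mean_abs_error(block, exp))
```

In this sketch, rounding up never clips the largest element but spends resolution on headroom, while rounding to nearest may clip the maximum in exchange for a finer grid on the remaining elements. Which choice wins depends on the data. The block above was chosen to show a case where the two modes pick different exponents.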

Please note that you need to run GSM8K with 3 few-shot examples (`--num_fewshot 3`) to reproduce the issue; with 5-shot GSM8K, the accuracy is nearly identical.

python3 -m atom.entrypoints.openai_server --model /shareddata/deepseek-ai/DeepSeek-R1-0528 \
  --enforce-eager -tp 8 \
  --port 5679 --server-port 7778 \
  --online_quant_config '{"global_quant_config":"ptpc_fp8","layer_quant_config":{"*expert*":"mxfp4"},"exclude_layer":["lm_head","*.gate.*"]}' \
  --method mtp --num-speculative-tokens 3 


lm_eval \
  --model local-completions \
  --model_args "model=/shareddata/deepseek-ai/DeepSeek-R1-0528,base_url=http://localhost:7778/v1/completions,tokenized_requests=False,tokenizer_backend=None,num_concurrent=32" \
  --tasks gsm8k \
  --num_fewshot 3 \
  --batch_size auto
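
For readers unfamiliar with the `--online_quant_config` value in the server command above, the following sketch shows one way such a config could be resolved per layer. It assumes glob-style matching (Python's `fnmatch`) and that `exclude_layer` takes precedence over `layer_quant_config`; neither is confirmed by this PR. The `resolve_quant` helper and the layer names are hypothetical.

```python
import fnmatch
import json

# The JSON string is copied from the server command; everything else is illustrative.
config = json.loads(
    '{"global_quant_config":"ptpc_fp8",'
    '"layer_quant_config":{"*expert*":"mxfp4"},'
    '"exclude_layer":["lm_head","*.gate.*"]}'
)


def resolve_quant(layer_name, cfg):
    """Return the quant scheme for a layer, or None if the layer is excluded.

    Assumed precedence: exclude_layer > layer_quant_config > global_quant_config.
    """
    if any(fnmatch.fnmatchcase(layer_name, p) for p in cfg["exclude_layer"]):
        return None
    for pattern, scheme in cfg["layer_quant_config"].items():
        if fnmatch.fnmatchcase(layer_name, pattern):
            return scheme
    return cfg["global_quant_config"]


# Hypothetical layer names, chosen to exercise each branch.
for name in [
    "lm_head",
    "model.layers.3.mlp.gate.weight",
    "model.layers.3.mlp.experts.0.down_proj",
    "model.layers.3.self_attn.q_proj",
]:
    print(name, "->", resolve_quant(name, config))
```

Under these assumptions, expert layers would use mxfp4, `lm_head` and the router gate would stay unquantized, and everything else would fall back to ptpc_fp8.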

Before:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     3|exact_match|↑  |0.9333|±  |0.0069|
|     |       |strict-match    |     3|exact_match|↑  |0.9287|±  |0.0071|

After:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     3|exact_match|↑  |0.9416|±  |0.0065|
|     |       |strict-match    |     3|exact_match|↑  |0.9401|±  |0.0065|

Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>

align off online quant

f2f8d07

Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>

haoyangli0109 force-pushed the lhy/align_off_online_q branch from 050f816 to f2f8d07 Compare June 26, 2026 07:29

haoyangli0109 marked this pull request as ready for review June 26, 2026 08:38

lihaoyang-amd closed this Jun 26, 2026

lihaoyang-amd reopened this Jun 26, 2026

zufayu requested a review from JiaoliangYu June 26, 2026 14:01

haoyangli0109 wants to merge 1 commit into ROCm:main from haoyangli0109:lhy/align_off_online_q