[OMNIML-2244] Add E2E example for mixed precision quantization and ONNX export #656
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##             main     #656      +/-   ##
==========================================
- Coverage   74.50%   74.46%   -0.05%
==========================================
  Files         183      183
  Lines       18400    18415      +15
==========================================
+ Hits        13709    13712       +3
- Misses       4691     4703      +12
```
We should add perf and accuracy numbers for the baseline and quantized models in the README file as well.
A basic query: is `onnx_ptq` the right place for "PyTorch PTQ => ONNX export" examples? I was under the impression that `onnx_ptq` exemplifies PTQ techniques for input ONNX models.
Right now,
You'll need to change test from
| Model | Top-1 Accuracy | Top-5 Accuracy |
| :--- | :---: | :---: |
| Torch autocast (FP16) | 85.11% | 97.53% |
| NVFP4 Quantized | 84.558% | 97.36% |
| Auto Quantized (FP8 + NVFP4, 4.78 effective bits) | 84.726% | 97.434% |
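For context on the "4.78 effective bits" figure in the table: assuming it is the parameter-weighted average of per-weight bit widths across the FP8 (8-bit) and NVFP4 (4-bit) layers, and ignoring scale-factor overhead, a minimal sketch (the split below is hypothetical, chosen only to reproduce the number):

```python
def effective_bits(layer_bits):
    """Parameter-weighted average bits per weight.

    layer_bits: list of (num_params, bits_per_weight) tuples.
    """
    total = sum(n for n, _ in layer_bits)
    return sum(n * b for n, b in layer_bits) / total

# Hypothetical split: ~19.5% of weights kept in FP8, the rest in NVFP4.
mix = [(195, 8), (805, 4)]
print(effective_bits(mix))  # 4.78
```

This is why a mixed FP8 + NVFP4 model can land between the two formats' nominal bit widths.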
Is "NVFP4 Quantized" NVFP4 + FP16 autocast? Same question for "Auto Quantized".
Yes
gcunhase left a comment:
Thanks for dividing the examples into two folders.
I posted a couple more comments, and we'd still need to add the perf numbers to show the accuracy-runtime trade-offs.
Approving for now.
| Model | Top-1 Accuracy | Top-5 Accuracy |
| :--- | :---: | :---: |
| Torch autocast (FP16) | 85.11% | 97.53% |
| NVFP4 Quantized | 84.558% | 97.36% |
| Auto Quantized (FP8 + NVFP4, 4.78 effective bits) | 84.726% | 97.434% |
Can we just say 4.8 effective bits, as this is what you set in the command above?
…NX export (NVIDIA#656)

## What does this PR do?

**Type of change:** New Feature

**Overview:**
- Enable ONNX export for auto quantized models
- Update documentation and changelog

## Usage

```
python torch_quant_to_onnx.py --quantize_mode=auto \
    --onnx_save_path=./vit_base_patch16_224.nvfp4_fp8.onnx \
    --calibration_data_size 64 \
    --auto_quantization_formats NVFP4_AWQ_LITE_CFG FP8_DEFAULT_CFG \
    --batch_size 128
```

## Testing

```
python evaluate.py --onnx_path=vit_base_patch16_224.nvfp4_fp8.onnx \
    --model_name=vit_base_patch16_224 \
    --results_path=./results.txt \
    --batch_size 128
```

Accuracy results:

```
The top1 accuracy of the model is 84.15%
The top5 accuracy of the model is 97.396%
```

Reference accuracy for FP16:

```
The top1 accuracy of the model is 85.102%
The top5 accuracy of the model is 97.526%
```

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: Yes
- **Did you update [Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes

Signed-off-by: ajrasane <[email protected]>
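The internals of `evaluate.py` aren't shown here, but the top-1/top-5 numbers above follow the standard definition: a sample counts as a top-k hit if its true label is among the k highest-scoring classes. A minimal, library-free sketch (function name and toy data are illustrative, not from the repo):

```python
def topk_accuracy(logits, labels, k):
    # A sample is a top-k hit if its true label is among the k highest scores.
    hits = 0
    for scores, label in zip(logits, labels):
        ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)

# Toy batch with 3 classes (the real evaluation uses 1000 ImageNet classes).
logits = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
labels = [1, 2, 0]
print(topk_accuracy(logits, labels, 1))  # ~0.33 (1 of 3 correct)
print(topk_accuracy(logits, labels, 2))  # ~0.67 (2 of 3 hit within top-2)
```

Top-5 is always at least as high as top-1, which matches the table: the quantized model loses under 1 point of top-1 accuracy while top-5 barely moves.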