[OMNIML-2244] Add E2E example for mixed precision quantization and ONNX export #656
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##             main     #656      +/-   ##
==========================================
- Coverage   74.50%   74.46%   -0.05%
==========================================
  Files         183      183
  Lines       18400    18415      +15
==========================================
+ Hits        13709    13712       +3
- Misses       4691     4703      +12
```
We should add perf and accuracy numbers for the baseline and quantized models in the README file as well.
A basic query: is `onnx_ptq` the right place for "PyTorch PTQ => ONNX export" examples? I was under the impression that `onnx_ptq` exemplifies PTQ techniques for input ONNX models.
Right now,
You'll need to change test from
| Model | Top-1 Accuracy | Top-5 Accuracy |
| :--- | :---: | :---: |
| Torch autocast (FP16) | 85.11% | 97.53% |
| NVFP4 Quantized | 84.558% | 97.36% |
| Auto Quantized (FP8 + NVFP4, 4.78 effective bits) | 84.726% | 97.434% |
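For context on the "4.78 effective bits" figure in the table: assuming it is the parameter-weighted average of per-weight bit widths across the FP8 (8-bit) and NVFP4 (4-bit) layers, and ignoring scale-factor overhead, a minimal sketch (the split below is hypothetical, chosen only to reproduce the number):

```python
def effective_bits(layer_bits):
    """Parameter-weighted average bits per weight.

    layer_bits: list of (num_params, bits_per_weight) tuples.
    """
    total = sum(n for n, _ in layer_bits)
    return sum(n * b for n, b in layer_bits) / total

# Hypothetical split: ~19.5% of weights kept in FP8, the rest in NVFP4.
mix = [(195, 8), (805, 4)]
print(effective_bits(mix))  # 4.78
```

This is why a mixed FP8 + NVFP4 model can land between the two formats' nominal bit widths.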
Is "NVFP4 Quantized" NVFP4 + FP16 autocast? Same question for "Auto Quantized".
Yes
gcunhase left a comment:
Thanks for dividing the examples into two folders.
I posted a couple more comments, and we'd still need to add the perf numbers to show the accuracy-runtime trade-offs.
Approving for now.
| Model | Top-1 Accuracy | Top-5 Accuracy |
| :--- | :---: | :---: |
| Torch autocast (FP16) | 85.11% | 97.53% |
| NVFP4 Quantized | 84.558% | 97.36% |
| Auto Quantized (FP8 + NVFP4, 4.78 effective bits) | 84.726% | 97.434% |
Can we just say 4.8 effective bits, as this is what you set in the command above?
…NX export (NVIDIA#656)

## What does this PR do?

**Type of change:** New Feature

**Overview:**
- Enable ONNX export for auto quantized models
- Update documentation and changelog

## Usage

```
python torch_quant_to_onnx.py --quantize_mode=auto \
    --onnx_save_path=./vit_base_patch16_224.nvfp4_fp8.onnx \
    --calibration_data_size 64 \
    --auto_quantization_formats NVFP4_AWQ_LITE_CFG FP8_DEFAULT_CFG \
    --batch_size 128
```

## Testing

```
python evaluate.py --onnx_path=vit_base_patch16_224.nvfp4_fp8.onnx \
    --model_name=vit_base_patch16_224 \
    --results_path=./results.txt \
    --batch_size 128
```

Accuracy results:

```
The top1 accuracy of the model is 84.15%
The top5 accuracy of the model is 97.396%
```

Reference accuracy for FP16:

```
The top1 accuracy of the model is 85.102%
The top5 accuracy of the model is 97.526%
```

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: Yes
- **Did you update [Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes

Signed-off-by: ajrasane <[email protected]>
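The internals of `evaluate.py` aren't shown here, but the top-1/top-5 numbers above follow the standard definition: a sample counts as a top-k hit if its true label is among the k highest-scoring classes. A minimal, library-free sketch (function name and toy data are illustrative, not from the repo):

```python
def topk_accuracy(logits, labels, k):
    # A sample is a top-k hit if its true label is among the k highest scores.
    hits = 0
    for scores, label in zip(logits, labels):
        ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)

# Toy batch with 3 classes (the real evaluation uses 1000 ImageNet classes).
logits = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
labels = [1, 2, 0]
print(topk_accuracy(logits, labels, 1))  # ~0.33 (1 of 3 correct)
print(topk_accuracy(logits, labels, 2))  # ~0.67 (2 of 3 hit within top-2)
```

Top-5 is always at least as high as top-1, which matches the table: the quantized model loses under 1 point of top-1 accuracy while top-5 barely moves.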