Conversation

@shengliangxu
Contributor

What does this PR do?

Refactor and clean up hf_ptq.py

This script contains several separate pieces of logic whose code is entangled, making it really hard to add new features.

Refactor the script so that these concerns are separated:

  1. Sparsity: all logic goes to sparsity_main. TODO: we may move this logic out to a separate script.

  2. Quantization: all logic goes to quantize_main.

    2.1 plain quantization with a single quantization format

    2.2 auto quantization
In the quantization pipeline, split the work into these stages:

  1. model loading
  2. calibration dataset loading
  3. pre-quantize processing
  4. actual quantization
  5. post-quantize processing
  6. quantized model export
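The six stages above can be sketched as a linear pipeline. Again, this is a hypothetical outline: all function names here are illustrative placeholders standing in for the refactored hf_ptq.py helpers, and the bodies are trivial stubs.

```python
# Hypothetical six-stage pipeline; names and bodies are placeholders.

def load_model(ckpt):            return {"name": ckpt}
def load_calib_dataset(size):    return list(range(size))
def pre_quantize(model):         model["pre"] = True;  return model
def quantize(model, dataset):    model["quantized"] = True; return model
def post_quantize(model):        model["post"] = True; return model
def export_model(model, path):   return f"{path}/{model['name']}"

def quantize_pipeline(ckpt, export_path, calib_size=16):
    model = load_model(ckpt)                  # 1. model loading
    data = load_calib_dataset(calib_size)     # 2. calibration dataset loading
    model = pre_quantize(model)               # 3. pre-quantize processing
    model = quantize(model, data)             # 4. actual quantization
    model = post_quantize(model)              # 5. post-quantize processing
    return export_model(model, export_path)   # 6. quantized model export
```

Keeping the stages as separate functions means each one can be swapped or extended (for example, a different calibration loader) without touching the others.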

Testing

Tested plain quantization:

python examples/llm_ptq/hf_ptq.py \
    --pyt_ckpt_path=Qwen/Qwen3-8B \
    --export_path=qwen3-8B_fp8 \
    --qformat=fp8 \
    --kv_cache_qformat=fp8 \
    --calib_size=16 \
    --batch_size=0 \
    --trust_remote_code \
    --export_fmt=hf

Tested auto quantization:

python examples/llm_ptq/hf_ptq.py \
    --qformat=nvfp4,fp8 \
    --auto_quantize_score_size 128 \
    --auto_quantize_bits 5.0 \
    --auto_quantize_checkpoint Qwen3-8B-auto-quantize-checkpoint \
    --pyt_ckpt_path=Qwen/Qwen3-8B \
    --export_path=qwen3-8B_auto_quantize \
    --kv_cache_qformat=fp8 \
    --calib_size=16 \
    --batch_size=0 \
    --trust_remote_code \
    --export_fmt=hf

@copy-pr-bot

copy-pr-bot bot commented Dec 8, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@shengliangxu shengliangxu force-pushed the shengliangx/hf_ptq_refactor_cleanup branch from 832fb13 to 070ae87 Compare December 8, 2025 21:01
@shengliangxu shengliangxu force-pushed the shengliangx/hf_ptq_refactor_cleanup branch from 070ae87 to a89625b Compare December 8, 2025 22:05
@shengliangxu shengliangxu marked this pull request as ready for review December 8, 2025 22:11
@shengliangxu shengliangxu requested review from a team as code owners December 8, 2025 22:11
@codecov

codecov bot commented Dec 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.45%. Comparing base (f265f8d) to head (e15d632).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #665   +/-   ##
=======================================
  Coverage   74.45%   74.45%           
=======================================
  Files         183      183           
  Lines       18412    18412           
=======================================
  Hits        13709    13709           
  Misses       4703     4703           

@shengliangxu shengliangxu self-assigned this Dec 11, 2025