Support KIMI K2 Thinking int4 checkpoint PTQ #669
base: main
Conversation
Signed-off-by: Chenjie Luo <[email protected]>
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Edwardf0t1 left a comment
Is the generated ckpt identical to the nvfp4 ckpt @jingyu-ml previously generated?
```python
pass

try:
    from compressed_tensors.linear.compressed_linear import CompressedLinear
```
Should we add compressed-tensors as an optional dependency?
@kevalmorabia97 @realAsma what do you think?
If a user is quantizing a model with CompressedLinear, wouldn't they already have compressed-tensors pre-installed? What benefit do we have by having it added as an optional dependency?
compressed-tensors' main dependencies are torch and transformers, so it should be pretty lightweight to add as a dependency; fine if you want to add it. But if it's not commonly used by customers, perhaps we can skip it.
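For reference, a minimal sketch of how the import could stay optional so compressed-tensors is only needed when a model actually contains CompressedLinear layers. The flag and helper names below are illustrative, not the actual ModelOpt code:

```python
# Sketch only: guard the compressed-tensors import so the package stays optional.
# HAS_COMPRESSED_TENSORS and is_compressed_linear are hypothetical names.
try:
    from compressed_tensors.linear.compressed_linear import CompressedLinear

    HAS_COMPRESSED_TENSORS = True
except ImportError:
    CompressedLinear = None
    HAS_COMPRESSED_TENSORS = False


def is_compressed_linear(module) -> bool:
    """Return True only if compressed-tensors is installed and the module is a CompressedLinear."""
    return HAS_COMPRESSED_TENSORS and isinstance(module, CompressedLinear)
```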
Can we move this to a separate file modelopt/torch/quantization/plugins/compressed_tensor.py?
If a user is quantizing a model with CompressedLinear, wouldn't they already have compressed-tensors pre-installed?
This is a good point. +1
Are we planning to have any unit tests for compressed tensor integration?
Not right now.
Can we move this to a seperate file
modelopt/torch/quantization/plugins/compressed_tensor.py?
How strongly do you feel about it? Right now I feel this still falls under the HF plugins, since it's part of the HF invocation path.
Signed-off-by: Chenjie Luo <[email protected]>
Signed-off-by: Chenjie Luo <[email protected]>
@cjluo-nv Did we run any deployment and accuracy tests for the ckpt generated with this flow to make sure it's correct? Asking because there's a customer who wants to generate the ckpt themselves. In addition, I heard from @jingyu-ml that we need to modify modeling_deepseek.py to enable our PTQ flow.
What does this PR do?
Type of change: New feature
Overview:
Support KIMI K2 Thinking PTQ from the original int4 checkpoint.
Tested with transformers 4.48
The model weights are dequantized on the fly to save GPU memory (see the sketch below for the general idea)
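A rough sketch of what on-the-fly dequantization of a packed int4 weight could look like. The packing layout (two nibbles per byte, per-group scales) and all names are assumptions for illustration, not the actual implementation in this PR:

```python
import torch


def dequantize_int4_weight(
    packed: torch.Tensor,   # uint8, shape [out_features, in_features // 2]
    scales: torch.Tensor,   # shape [out_features, in_features // group_size]
    group_size: int = 128,
) -> torch.Tensor:
    """Unpack two int4 values per byte and rescale to bf16, one layer at a time."""
    low = (packed & 0x0F).to(torch.int8) - 8              # lower nibble -> [-8, 7]
    high = (packed >> 4).to(torch.int8) - 8                # upper nibble -> [-8, 7]
    ints = torch.stack((low, high), dim=-1).flatten(-2)   # [out_features, in_features]
    grouped = ints.to(torch.bfloat16).reshape(-1, group_size)
    weight = grouped * scales.reshape(-1, 1).to(torch.bfloat16)
    return weight.reshape(packed.shape[0], -1)
```

Dequantizing each layer right before it is used keeps only one full-precision weight resident on the GPU at a time, which is where the memory saving mentioned above would come from (assuming per-layer dequantization).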
Usage
```bash
scripts/huggingface_example.sh --model <model_path> --quant nvfp4_mlp_only --trust_remote_code
```