Conversation

@cjluo-nv (Collaborator) commented Dec 9, 2025

What does this PR do?

Type of change: new feature

Overview:

Support KIMI K2 Thinking PTQ from the original int4 checkpoint.
Tested with transformers 4.48.

The model weights are dequantized on the fly to save GPU memory.
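
A rough sketch of what on-the-fly dequantization can look like; this is an illustration only, not the PR's implementation, and `unpack_int4_weight`, `OnTheFlyDequantLinear`, and the packing layout are all assumptions:

```python
import torch
import torch.nn.functional as F


def unpack_int4_weight(packed: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: unpack two 4-bit values per uint8 byte and rescale.

    Assumes `packed` is [out_features, in_features // 2] uint8 and `scale`
    broadcasts against the unpacked [out_features, in_features] weight.
    """
    lo = (packed & 0x0F).to(torch.int8) - 8           # low nibble -> [-8, 7]
    hi = ((packed >> 4) & 0x0F).to(torch.int8) - 8    # high nibble -> [-8, 7]
    vals = torch.stack((lo, hi), dim=-1).flatten(-2)  # interleave to full width
    return vals.to(scale.dtype) * scale


class OnTheFlyDequantLinear(torch.nn.Module):
    """Holds only the packed int4 weight; the full-precision copy exists just
    for the duration of each forward call, so peak GPU memory stays low."""

    def __init__(self, packed_weight, scale, bias=None):
        super().__init__()
        self.register_buffer("packed_weight", packed_weight)
        self.register_buffer("scale", scale)
        self.bias = bias

    def forward(self, x):
        w = unpack_int4_weight(self.packed_weight, self.scale)  # transient copy
        return F.linear(x, w, self.bias)  # copy is freed after the call returns
```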

Usage

scripts/huggingface_example.sh --model --quant nvfp4_mlp_only --trust_remote_code
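
For reference, a hypothetical full invocation; the model path is illustrative and not taken from the PR:

```sh
# Model path below is an assumption for illustration; flags come from the PR description.
scripts/huggingface_example.sh \
    --model moonshotai/Kimi-K2-Thinking \
    --quant nvfp4_mlp_only \
    --trust_remote_code
```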

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

copy-pr-bot (bot) commented Dec 9, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@Edwardf0t1 (Contributor) left a comment

Is the generated ckpt identical to @jingyu-ml's previously generated nvfp4 ckpt?

```python
pass

try:
    from compressed_tensors.linear.compressed_linear import CompressedLinear
```
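
For context, the guarded import presumably continues along these lines (a sketch; `HAS_COMPRESSED_TENSORS` is an assumed name and the except branch may differ in the PR):

```python
try:
    from compressed_tensors.linear.compressed_linear import CompressedLinear

    HAS_COMPRESSED_TENSORS = True
except ImportError:  # compressed-tensors not installed: skip this integration
    CompressedLinear = None
    HAS_COMPRESSED_TENSORS = False
```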
Contributor:

Should we add compressed-tensors as an optional dependency?

@cjluo-nv (Author):

@kevalmorabia97 @realAsma what do you think?

Collaborator:

If a user is quantizing a model with CompressedLinear, wouldn't they already have compressed-tensors pre-installed? What benefit do we get from adding it as an optional dependency?

Collaborator:

compressed-tensors' main dependencies are torch and transformers, so it should be pretty lightweight to add as a dependency; fine if you want to add it. But if it's not commonly used by customers, perhaps we can skip it.
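
For illustration, wiring it up as an optional extra could look like this (a hypothetical setup.py sketch; the distribution name and the repo's actual packaging layout are assumptions):

```python
# Hypothetical setup.py fragment: compressed-tensors as an opt-in extra.
from setuptools import setup

setup(
    name="nvidia-modelopt",  # assumed distribution name, for illustration
    extras_require={
        # Installed via: pip install "nvidia-modelopt[compressed-tensors]"
        "compressed-tensors": ["compressed-tensors"],
    },
)
```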

Contributor:

Can we move this to a separate file modelopt/torch/quantization/plugins/compressed_tensor.py?

Contributor:

> If a user is quantizing a model with CompressedLinear, wouldn't they already have compressed-tensors pre-installed?

This is a good point. +1

Are we planning to have any unit tests for the compressed-tensors integration?

@cjluo-nv (Author):

Not right now.

@cjluo-nv (Author):

> Can we move this to a separate file modelopt/torch/quantization/plugins/compressed_tensor.py?

How strongly do you feel about it? Right now I feel this still falls under the HF plugins, as it's part of the HF invocation.

Signed-off-by: Chenjie Luo <[email protected]>
@cjluo-nv cjluo-nv marked this pull request as ready for review December 9, 2025 17:29
@cjluo-nv cjluo-nv requested review from a team as code owners December 9, 2025 17:29
@cjluo-nv cjluo-nv requested a review from meenchen December 9, 2025 17:29
Signed-off-by: Chenjie Luo <[email protected]>
@cjluo-nv cjluo-nv requested a review from meenchen December 10, 2025 00:12
@Edwardf0t1 (Contributor) commented Dec 11, 2025

@cjluo-nv Did we run any deployment and accuracy tests for the ckpt generated with this flow to make sure it's correct? Asking because there's a customer who wants to generate the ckpt themselves.

In addition, I heard from @jingyu-ml that we need to modify modeling_deepseek.py to enable our PTQ flow.
