Support KIMI K2 Thinking int4 checkpoint PTQ #669
base: main
Conversation
Signed-off-by: Chenjie Luo <[email protected]>
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Edwardf0t1 left a comment
Is the generated ckpt identical to the nvfp4 ckpt @jingyu-ml previously generated?
```python
pass

try:
    from compressed_tensors.linear.compressed_linear import CompressedLinear
```
Should we add compressed-tensors as an optional dependency?
@kevalmorabia97 @realAsma what do you think?
If a user is quantizing a model with CompressedLinear, wouldn't they already have compressed-tensors pre-installed? What benefit do we have by having it added as an optional dependency?
compressed-tensors' main dependencies are torch and transformers, so it should be pretty lightweight to add as a dependency; fine if you want to add it. But if it's not commonly used by customers, perhaps we can skip it.
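For reference, a minimal sketch of how the import could stay optional so compressed-tensors is only needed when a model actually contains CompressedLinear layers. The flag and helper names below are illustrative, not the actual ModelOpt code:

```python
# Sketch only: guard the compressed-tensors import so the package stays optional.
# HAS_COMPRESSED_TENSORS and is_compressed_linear are hypothetical names.
try:
    from compressed_tensors.linear.compressed_linear import CompressedLinear

    HAS_COMPRESSED_TENSORS = True
except ImportError:
    CompressedLinear = None
    HAS_COMPRESSED_TENSORS = False


def is_compressed_linear(module) -> bool:
    """Return True only if compressed-tensors is installed and the module is a CompressedLinear."""
    return HAS_COMPRESSED_TENSORS and isinstance(module, CompressedLinear)
```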
Can we move this to a separate file modelopt/torch/quantization/plugins/compressed_tensor.py?
If a user is quantizing a model with CompressedLinear, wouldn't they already have compressed-tensors pre-installed?
This is a good point. +1
Are we planning to have any unit tests for compressed tensor integration?
Not right now.
Can we move this to a seperate file
modelopt/torch/quantization/plugins/compressed_tensor.py?
How strongly do you feel about it? Right now I feel this still falls under the HF plugins, since it's part of the HF invocation path.
Signed-off-by: Chenjie Luo <[email protected]>
Signed-off-by: Chenjie Luo <[email protected]>
@cjluo-nv Did we run any deployment and accuracy tests for the ckpt generated with this flow to make sure it's correct? Asking because there's a customer who wants to generate the ckpt themselves. In addition, I heard from @jingyu-ml that we need to modify modeling_deepseek.py to enable our PTQ flow.
What does this PR do?
Type of change: New feature
Overview:
Support KIMI K2 Thinking PTQ from the original int4 checkpoint.
Tested with transformers 4.48
The model weights are dequantized on the fly to save GPU memory (see the sketch below for the general idea)
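A rough sketch of what on-the-fly dequantization of a packed int4 weight could look like. The packing layout (two nibbles per byte, per-group scales) and all names are assumptions for illustration, not the actual implementation in this PR:

```python
import torch


def dequantize_int4_weight(
    packed: torch.Tensor,   # uint8, shape [out_features, in_features // 2]
    scales: torch.Tensor,   # shape [out_features, in_features // group_size]
    group_size: int = 128,
) -> torch.Tensor:
    """Unpack two int4 values per byte and rescale to bf16, one layer at a time."""
    low = (packed & 0x0F).to(torch.int8) - 8              # lower nibble -> [-8, 7]
    high = (packed >> 4).to(torch.int8) - 8                # upper nibble -> [-8, 7]
    ints = torch.stack((low, high), dim=-1).flatten(-2)   # [out_features, in_features]
    grouped = ints.to(torch.bfloat16).reshape(-1, group_size)
    weight = grouped * scales.reshape(-1, 1).to(torch.bfloat16)
    return weight.reshape(packed.shape[0], -1)
```

Dequantizing each layer right before it is used keeps only one full-precision weight resident on the GPU at a time, which is where the memory saving mentioned above would come from (assuming per-layer dequantization).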
Usage
```bash
scripts/huggingface_example.sh --model <model_path> --quant nvfp4_mlp_only --trust_remote_code
```