@soodoshll

What does this PR do?

Type of change: new feature

Overview: The Hugging Face transformers library implements the Qwen3VL MoE layer as a monolithic module rather than assembling it from Linear layers, so modelopt's quantizer cannot currently recognize it. This PR converts HF's qwen3vl_moe MoE layers into qwen3_moe MoE layers, which consist of a set of Linear layers.
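
For readers unfamiliar with this kind of conversion, here is a minimal sketch of the idea, not the code in this PR. It assumes the fused expert weights are stored as a `gate_up_proj` tensor of shape [experts, hidden, 2 * intermediate] and a `down_proj` tensor of shape [experts, intermediate, hidden], with the gate half first. The class and function names (`FusedExperts`, `LinearExpert`, `unfuse_experts`) are hypothetical and exist only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusedExperts(nn.Module):
    """Hypothetical stand-in for a monolithic MoE expert block with fused weights."""

    def __init__(self, num_experts, hidden, inter):
        super().__init__()
        # Assumed layout: [experts, hidden, 2 * inter] and [experts, inter, hidden].
        self.gate_up_proj = nn.Parameter(torch.randn(num_experts, hidden, 2 * inter) * 0.1)
        self.down_proj = nn.Parameter(torch.randn(num_experts, inter, hidden) * 0.1)

    def forward(self, x, expert_idx):
        # Assumed ordering: gate half first, up half second.
        gate, up = (x @ self.gate_up_proj[expert_idx]).chunk(2, dim=-1)
        return (F.silu(gate) * up) @ self.down_proj[expert_idx]


class LinearExpert(nn.Module):
    """One expert built only from nn.Linear layers, which a quantizer can match."""

    def __init__(self, hidden, inter):
        super().__init__()
        self.gate_proj = nn.Linear(hidden, inter, bias=False)
        self.up_proj = nn.Linear(hidden, inter, bias=False)
        self.down_proj = nn.Linear(inter, hidden, bias=False)

    def forward(self, x):
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


def unfuse_experts(fused):
    """Copy each expert's slice of the fused tensors into separate Linear layers."""
    num_experts, hidden, two_inter = fused.gate_up_proj.shape
    inter = two_inter // 2
    experts = nn.ModuleList()
    with torch.no_grad():
        for e in range(num_experts):
            expert = LinearExpert(hidden, inter)
            gate_w, up_w = fused.gate_up_proj[e].chunk(2, dim=-1)  # each [hidden, inter]
            # nn.Linear stores weight as [out_features, in_features], hence the transpose.
            expert.gate_proj.weight.copy_(gate_w.t())
            expert.up_proj.weight.copy_(up_w.t())
            expert.down_proj.weight.copy_(fused.down_proj[e].t())
            experts.append(expert)
    return experts
```

Calling `unfuse_experts(FusedExperts(num_experts=3, hidden=8, inter=4))` returns a `ModuleList` of three experts whose outputs should match the fused module for the same expert index, up to floating-point tolerance. Routing is left out of the sketch on purpose.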

Testing

Tested with

python hf_ptq.py --pyt_ckpt_path=Qwen/Qwen3-VL-30B-A3B-Instruct --qformat=nvfp4 --dataset wikipedia

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

@soodoshll soodoshll requested a review from a team as a code owner December 8, 2025 19:32
@soodoshll soodoshll requested a review from ajrasane December 8, 2025 19:32
@copy-pr-bot

copy-pr-bot bot commented Dec 8, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@shengliangxu shengliangxu self-requested a review December 8, 2025 19:33
@soodoshll soodoshll requested review from a team as code owners December 8, 2025 19:58
@soodoshll soodoshll requested a review from ynankani December 8, 2025 19:58
@codecov

codecov bot commented Dec 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.46%. Comparing base (5a4242f) to head (e0d2121).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #666      +/-   ##
==========================================
- Coverage   74.52%   74.46%   -0.06%     
==========================================
  Files         183      183              
  Lines       18400    18410      +10     
==========================================
- Hits        13712    13709       -3     
- Misses       4688     4701      +13     


Qwen3VLMoeTextSparseMoeBlock,
)

if not isinstance(self.mlp, Qwen3VLMoeTextSparseMoeBlock):
Contributor

I'm wondering whether we should add support for Qwen3VLMoeTextSparseMoeBlock directly; using a replacement module feels a bit fragile.

Author

I've refactored this part.
