fix(models): fix MoE weight dangling reference and Qwen3-Omni model adapter compatibility #334
Merged
…sues
- Fix MoE expert weight dangling reference: clone() after chunk() to avoid invalid memory access when gate_up_proj/down_proj are deleted
- Add missing block_name attribute to Qwen3-Omni (required by GPTQ flow)
- Remove incompatible self.model.use_cache = False in model_forward()
ali-88123
approved these changes
Jun 9, 2026
Summary
Fix critical runtime issues in MoE expert weight handling and Qwen3-Omni model adapter that caused GPTQ quantization failures.
Changes
1. Fix MoE expert weight dangling reference (`angelslim/models/llm/qwen.py`)
After `chunk()` splits `gate_up_proj` into `gate_proj` and `up_proj`, the resulting tensors are views sharing the same underlying storage. When `del self.gate_up_proj` is executed subsequently, the storage is freed, leaving `gate_proj` and `up_proj` as dangling references pointing to invalid memory.
Fix: Call `.clone()` on all chunked/sliced weight tensors immediately after assignment to ensure each expert owns an independent copy before the source tensors are deleted.

2. Add missing `block_name` attribute to Qwen3-Omni (`angelslim/models/omni/qwen3_omni.py`)
The GPTQ quantization flow requires `self.block_name` to locate transformer blocks. Qwen3-Omni only defined `thinker_block_name` / `talker_block_name` but was missing the base `block_name` attribute, causing `AttributeError` during calibration.
Fix: Add `self.block_name = "thinker.model.layers"` to the constructor.

3. Remove incompatible `self.model.use_cache = False` (`angelslim/models/omni/qwen3_omni.py`)
The Qwen3-Omni model object does not expose a `use_cache` attribute at the top level (it is configured per-component). Setting it unconditionally raised `AttributeError`.
Fix: Remove the incompatible assignment from `model_forward()`.

Files Changed
- `angelslim/models/llm/qwen.py`: clone expert weights after chunk to prevent dangling references
- `angelslim/models/omni/qwen3_omni.py`: add `block_name` attr; remove invalid `use_cache` assignment
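
For reviewers unfamiliar with the view semantics behind change 1, here is a minimal sketch of the difference between a chunked view and a cloned copy. The tensor shape, the split dimension, and the variable names are assumptions for illustration and are not taken from `qwen.py`. It only demonstrates the view-versus-copy distinction and does not attempt to reproduce the failure reported above.

```python
import torch

# Hypothetical fused expert weight: rows 0-3 stand in for gate_proj,
# rows 4-7 for up_proj. The real layout in qwen.py may differ.
gate_up_proj = torch.arange(24, dtype=torch.float32).reshape(8, 3)

# chunk() returns views: no data is copied, both halves alias the source.
gate_view, up_view = gate_up_proj.chunk(2, dim=0)
shares_storage = gate_view.data_ptr() == gate_up_proj.data_ptr()

# clone() allocates fresh storage, so each half owns its own data.
gate_proj, up_proj = (t.clone() for t in gate_up_proj.chunk(2, dim=0))
owns_storage = gate_proj.data_ptr() != gate_up_proj.data_ptr()

# An in-place write to the source reaches the view but not the clone.
gate_up_proj.zero_()
view_followed_source = bool((gate_view == 0).all())
clone_unchanged = bool(gate_proj.sum() == sum(range(12)))
```

The clone costs one extra copy per chunk while the source tensor is still alive; in exchange, the cloned halves no longer depend on what later happens to the source.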
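
Change 2 can be pictured with a small stand-in. The sketch below assumes the GPTQ flow resolves `block_name` as a dotted attribute path on the model; the PR does not show that lookup, so `resolve_blocks` and the `SimpleNamespace` objects are hypothetical helpers rather than AngelSlim code.

```python
from functools import reduce
from types import SimpleNamespace


def resolve_blocks(model, block_name):
    """Walk a dotted attribute path such as 'thinker.model.layers'."""
    return reduce(getattr, block_name.split("."), model)


# Stand-in objects shaped like the path named in the PR; not the real model.
layers = ["block0", "block1"]
fake_model = SimpleNamespace(
    thinker=SimpleNamespace(model=SimpleNamespace(layers=layers))
)

# Before the fix: only the component-specific name exists on the adapter.
adapter = SimpleNamespace(
    model=fake_model, thinker_block_name="thinker.model.layers"
)
try:
    adapter.block_name
    missing_before_fix = False
except AttributeError:
    missing_before_fix = True

# After the fix: the base attribute is defined alongside the specific one.
adapter.block_name = "thinker.model.layers"
blocks = resolve_blocks(adapter.model, adapter.block_name)
```

Under this assumption, any code path that reads `block_name` generically finds the thinker layers without needing to know about the Omni-specific attribute names.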