fix(models): fix MoE weight dangling reference and Qwen3-Omni model adapter compatibility by sunnyxiaohu · Pull Request #334 · Tencent/AngelSlim

sunnyxiaohu · 2026-06-08T08:00:52Z

Summary

Fix critical runtime issues in MoE expert weight handling and in the Qwen3-Omni model adapter that caused GPTQ quantization failures.

Changes

1. Fix MoE expert weight dangling reference (`angelslim/models/llm/qwen.py`)

After `chunk()` splits `gate_up_proj` into `gate_proj` and `up_proj`, the resulting tensors are views that share the same underlying storage. When `del self.gate_up_proj` is executed afterwards, the storage is freed, leaving `gate_proj` and `up_proj` as dangling references pointing to invalid memory.

Fix: Call `.clone()` on all chunked/sliced weight tensors immediately after assignment, so that each expert owns an independent copy before the source tensors are deleted.
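The view-versus-copy distinction behind this fix can be illustrated with a small standalone sketch. It is not the code from `qwen.py`: the tensor shape and the variable names other than `gate_up_proj`, `gate_proj`, and `up_proj` are assumptions made for illustration. The sketch shows only that `chunk()` results alias the source tensor and that `.clone()` breaks that aliasing. It does not reproduce the failure described above; in stock PyTorch a view normally holds a reference to its base storage, so the exact failure mode likely depends on how the surrounding code releases the source tensors.

```python
import torch

# Hypothetical fused expert weight: rows 0-3 stand in for gate_proj,
# rows 4-7 for up_proj. The shape is chosen only for illustration.
gate_up_proj = torch.arange(24, dtype=torch.float32).reshape(8, 3)
expected_gate = gate_up_proj[:4].clone()  # reference copy for the check below

# chunk() returns views: no data is copied, both halves point into
# the memory owned by gate_up_proj.
gate_view, up_view = gate_up_proj.chunk(2, dim=0)
views_alias_source = gate_view.data_ptr() == gate_up_proj.data_ptr()

# clone() allocates fresh memory, so each result owns its own copy.
gate_proj = gate_view.clone()
up_proj = up_view.clone()
clone_is_independent = gate_proj.data_ptr() != gate_up_proj.data_ptr()

# An in-place write to the source shows through the view but not the clone.
gate_up_proj.zero_()
view_changed = bool((gate_view == 0).all())
clone_unchanged = torch.equal(gate_proj, expected_gate)

# The clones remain usable regardless of what happens to the source name.
del gate_up_proj
```

Cloning costs one extra copy of each expert weight at split time, in exchange for the split tensors no longer depending on the lifetime or later mutation of the fused tensor.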

2. Add missing `block_name` attribute to Qwen3-Omni (`angelslim/models/omni/qwen3_omni.py`)

The GPTQ quantization flow requires `self.block_name` to locate transformer blocks. Qwen3-Omni defined only `thinker_block_name` / `talker_block_name` and lacked the base `block_name` attribute, causing an `AttributeError` during calibration.

Fix: Add `self.block_name = "thinker.model.layers"` to the constructor.
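A minimal sketch of why a missing `block_name` surfaces as an `AttributeError`, assuming the quantization flow resolves the dotted path against the model object. The helper `resolve_blocks`, the function `locate_blocks`, and both adapter classes are hypothetical stand-ins, not AngelSlim APIs; only the attribute name `block_name` and its value come from the PR.

```python
from functools import reduce
from types import SimpleNamespace


def resolve_blocks(root, dotted_path):
    """Follow a dotted attribute path such as 'thinker.model.layers'."""
    return reduce(getattr, dotted_path.split("."), root)


class AdapterWithoutBlockName:
    """Hypothetical adapter that, like the pre-fix code, lacks block_name."""

    def __init__(self, model):
        self.model = model


class AdapterWithBlockName(AdapterWithoutBlockName):
    """Hypothetical adapter with the attribute the fix adds."""

    def __init__(self, model):
        super().__init__(model)
        self.block_name = "thinker.model.layers"


def locate_blocks(adapter):
    # Assumed shape of a generic consumer: it reads adapter.block_name
    # without knowing which model family it is dealing with.
    return resolve_blocks(adapter.model, adapter.block_name)


# Toy object tree standing in for the real model hierarchy.
toy_model = SimpleNamespace(
    thinker=SimpleNamespace(model=SimpleNamespace(layers=["block0", "block1"]))
)

blocks = locate_blocks(AdapterWithBlockName(toy_model))

try:
    locate_blocks(AdapterWithoutBlockName(toy_model))
    missing_raises = False
except AttributeError:
    missing_raises = True
```

Under this assumption, defining the base attribute on the adapter lets model-agnostic code keep reading a single name instead of special-casing the thinker and talker components.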

3. Remove incompatible `self.model.use_cache = False` (`angelslim/models/omni/qwen3_omni.py`)

The Qwen3-Omni model object does not expose a `use_cache` attribute at the top level (it is configured per component). Setting it unconditionally raised an `AttributeError`.

Fix: Remove the incompatible assignment from `model_forward()`.

Files Changed

angelslim/models/llm/qwen.py — clone expert weights after chunk to prevent dangling references
angelslim/models/omni/qwen3_omni.py — add block_name attr; remove invalid use_cache assignment

…sues
- Fix MoE expert weight dangling reference: `clone()` after `chunk()` to avoid invalid memory access when `gate_up_proj`/`down_proj` are deleted
- Add missing `block_name` attribute to Qwen3-Omni (required by GPTQ flow)
- Remove incompatible `self.model.use_cache = False` in `model_forward()`

ali-88123 approved these changes Jun 9, 2026

ali-88123 merged commit 13ccafd into Tencent:main Jun 9, 2026
5 checks passed

ali-88123 merged 1 commit into Tencent:main from sunnyxiaohu:fix/model-adapter-compat
