fix(spin): fix tied embedding corruption in silent_run() and unconditional untie/fuse in run() by sunnyxiaohu · Pull Request #319 · Tencent/AngelSlim

sunnyxiaohu · 2026-05-28T03:19:46Z

Summary

Fix the SpinQuant rotation transform, which corrupts the embedding lookup table when tie_word_embeddings=True. This affects all models with tied embeddings (e.g., Qwen3, LLaMA-3).

Problems

silent_run() corrupts tied embeddings: silent_run() calls _apply_fused_ln() without first untying the word embeddings. When lm_head.weight and embed_tokens.weight share the same underlying tensor (tied), fuse_ln_linear() modifies both at once, corrupting the embedding lookup table.
run() gates untie/fuse inside the if "R1" branch: _untie_word_embeddings() and _apply_fused_ln() are incorrectly placed inside the `if "R1" in self.spin_config.rotation` conditional block. Models that use only the R2 rotation skip these critical steps, which leads to either tied embedding corruption or missing norm fusion.
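The silent_run() corruption can be reproduced in miniature. This sketch uses NumPy arrays as a stand-in for the shared `torch.nn.Parameter`, and `fuse_ln_linear_inplace` is a hypothetical simplification of what `fuse_ln_linear()` does to `lm_head`, not the AngelSlim implementation. The shapes and the `gamma` values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy embedding table: vocab of 4 tokens, hidden size 3 (hypothetical shapes).
embed_weight = rng.standard_normal((4, 3))
# Tied embeddings: lm_head refers to the SAME array, not a copy.
lm_head_weight = embed_weight

# Hypothetical final-norm scale (one entry per hidden dimension).
gamma = np.array([0.5, 2.0, 1.5])
original_embed = embed_weight.copy()  # snapshot for comparison


def fuse_ln_linear_inplace(weight, gamma):
    """Simplified stand-in for folding a norm scale into a Linear weight.

    A Linear layer computes x @ weight.T with weight of shape (out, in), so the
    norm scale multiplies the input (column) dimension. The update is in place.
    """
    weight *= gamma


fuse_ln_linear_inplace(lm_head_weight, gamma)

# The embedding table was rescaled as a side effect of fusing lm_head.
print(np.allclose(embed_weight, original_embed))  # False
```

Every token's embedding row is now multiplied by `gamma`, which is the "garbage token embeddings" symptom in miniature.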

Changes

File	Fix
`spin.py`	Move `_untie_word_embeddings()` to the top of both `run()` and `silent_run()`, unconditionally before any fuse/rotation operations
`spin.py`	Move `_apply_fused_ln()` out of the `if "R1"` branch to execute unconditionally
`spin.py`	Remove redundant `_untie_word_embeddings()` call inside `_apply_fused_ln()`
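A minimal sketch of the control flow before and after this change, using a hypothetical `SpinSketch` class whose methods only record their call order. The method names `_untie_word_embeddings` and `_apply_fused_ln` mirror the PR, but their bodies, the `_rotate` helper, the `run_before`/`run_after` split and the loop over rotations are illustrative assumptions, not the actual spin.py code.

```python
# Hypothetical stand-in for the SpinQuant pass in spin.py: every method only
# records its name, so the call order can be inspected.
class SpinSketch:
    def __init__(self, rotation):
        self.rotation = rotation  # e.g. ["R1", "R2"] or ["R2"]
        self.calls = []

    def _untie_word_embeddings(self):
        self.calls.append("untie")

    def _apply_fused_ln(self):
        self.calls.append("fuse_ln")

    def _rotate(self, name):  # hypothetical helper, not in the PR
        self.calls.append(name)

    def run_before(self):
        # Old flow as described in the PR: untie + fuse only inside the R1 branch.
        if "R1" in self.rotation:
            self._untie_word_embeddings()
            self._apply_fused_ln()
            self._rotate("R1")
        if "R2" in self.rotation:
            self._rotate("R2")
        return self.calls

    def run_after(self):
        # New flow proposed in the PR: untie first, then fuse, unconditionally.
        self._untie_word_embeddings()
        self._apply_fused_ln()
        for name in ("R1", "R2"):
            if name in self.rotation:
                self._rotate(name)
        return self.calls


print(SpinSketch(["R2"]).run_before())  # ['R2']
print(SpinSketch(["R2"]).run_after())   # ['untie', 'fuse_ln', 'R2']
```

With an R2-only configuration the old flow neither unties nor fuses, while the new flow always unties before fusing and does both before any rotation.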

Root Cause

When tie_word_embeddings=True, lm_head.weight is a reference to (not a copy of) embed_tokens.weight, so any in-place modification to one also affects the other. The fused LayerNorm operation scales lm_head.weight in place, which simultaneously corrupts embed_tokens.weight and produces garbage token embeddings during inference.
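A matching sketch of the fix: detect the shared storage and give lm_head its own copy before any fusion. NumPy again stands in for PyTorch; in a torch model the rough equivalent would be comparing `lm_head.weight.data_ptr()` with `embed_tokens.weight.data_ptr()` and then assigning `torch.nn.Parameter(lm_head.weight.detach().clone())`. Whether `_untie_word_embeddings()` does exactly this is an assumption, since the PR does not show its body.

```python
import numpy as np

rng = np.random.default_rng(0)
embed_weight = rng.standard_normal((4, 3))  # toy embedding table
lm_head_weight = embed_weight               # tied: same buffer
print(np.shares_memory(lm_head_weight, embed_weight))  # True

# Untie: give lm_head its own copy of the data before any in-place edit.
lm_head_weight = lm_head_weight.copy()
print(np.shares_memory(lm_head_weight, embed_weight))  # False

original_embed = embed_weight.copy()
gamma = np.array([0.5, 2.0, 1.5])  # hypothetical norm scale
lm_head_weight *= gamma            # the fusion now touches only the copy

print(np.array_equal(embed_weight, original_embed))  # True
```

The ordering matters: copying after the in-place scaling would only duplicate already-corrupted values, which is why the untie step has to run first in both run() and silent_run().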

…ional untie/fuse in run()

- Move _untie_word_embeddings() to the top of silent_run(), unconditionally (it was missing entirely)
- Move _untie_word_embeddings() and _apply_fused_ln() outside the 'if R1' condition in run()
- Remove the duplicate _untie_word_embeddings() call from _apply_fused_ln()

This fixes embedding lookup table corruption when tie_word_embeddings=True (affects Qwen3 and all models with tied embeddings).

gavingavin99 · 2026-05-28T03:53:37Z

In SpinQuant, only the R1 rotation requires the fused LayerNorm operation, which is why untie_embedding has to be executed for it. Our verification has shown that fusing LayerNorm can degrade model performance to some extent (because low-precision weights are used), so untie_embedding and the LayerNorm fusion are placed only in the R1 branch.
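The link between R1 and LayerNorm fusion can be checked numerically. SpinQuant's R1 rotates the residual stream, which passes through RMSNorm before each Linear. The NumPy sketch below, with toy shapes and a random orthogonal `Q` standing in for the learned R1 matrix, verifies three facts: folding the norm scale `gamma` into the following Linear is exact; a scale-free RMSNorm commutes with `Q`, so the rotation can be absorbed into the Linear weight; and leaving `gamma` inside the norm breaks that equivalence.

```python
import numpy as np


def rms_norm(x, gamma=None, eps=1e-6):
    """RMSNorm over the last dimension; gamma is the optional learned scale."""
    y = x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)
    return y if gamma is None else y * gamma


rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal((2, d))         # residual-stream activations
gamma = rng.uniform(0.5, 1.5, size=d)   # learned norm scale
W = rng.standard_normal((5, d))         # following Linear weight (out, in)

# Random orthogonal matrix standing in for the R1 rotation.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

# 1) Fold gamma into the Linear: (norm(x) * gamma) @ W.T == norm(x) @ (W * gamma).T
W_fused = W * gamma
ref = rms_norm(x, gamma) @ W.T
assert np.allclose(rms_norm(x) @ W_fused.T, ref)

# 2) A scale-free RMSNorm commutes with Q, so rotating the residual stream
#    (x -> x @ Q) is undone by rotating the Linear input (W_fused -> W_fused @ Q).
rotated = rms_norm(x @ Q) @ (W_fused @ Q).T
assert np.allclose(rotated, ref)

# 3) With gamma still inside the norm, the same rotation breaks equivalence.
broken = rms_norm(x @ Q, gamma) @ (W @ Q).T
print(np.allclose(broken, ref))  # False
```

Step 1 is also where the precision concern enters: in a real model `W * gamma` is stored back in a low-precision dtype, so the folded weight is rounded.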

sunnyxiaohu · 2026-06-09T04:02:57Z

@gavingavin99 That's true, but the precision loss doesn't come only from the fused LayerNorm; the other rotation matrix multiplications also contribute. For the Qwen3-Omni model, we tried SpinQuant (R1, R2, R4) + GPTQ, and compared with GPTQ alone, performance actually degraded significantly. Do you have any suggestions?

gavingavin99 · 2026-06-10T01:52:10Z

@sunnyxiaohu Could you share some relevant test results? If you are using rotation alone, it is recommended to disable the R1 rotation.

sunnyxiaohu wants to merge 1 commit into Tencent:main from sunnyxiaohu:fix/spinquant-tied-embedding

sunnyxiaohu commented May 28, 2026

Uh oh!

gavingavin99 commented May 28, 2026

Uh oh!

sunnyxiaohu commented Jun 9, 2026

Uh oh!

gavingavin99 commented Jun 10, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

sunnyxiaohu commented May 28, 2026

Summary

Problems

Changes

Root Cause

Uh oh!

gavingavin99 commented May 28, 2026

Uh oh!

sunnyxiaohu commented Jun 9, 2026

Uh oh!

gavingavin99 commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

gavingavin99 commented Jun 10, 2026 •

edited

Loading