fix(gptq): fix Hessian computation, variable-length sequence support, and layer output type handling by sunnyxiaohu · Pull Request #318 · Tencent/AngelSlim

sunnyxiaohu · 2026-05-27T13:28:51Z

Summary

Fix critical bugs in the GPTQ quantization pipeline that cause incorrect quantization results, especially for MoE models and variable-length calibration data.

Problems

Hessian matrix corruption for MoE experts — add_batch() uses parameterless squeeze() which collapses [1, dim] to [dim] when an expert receives only 1 routed token, causing nsamples to be incorrectly accumulated as feature_dim instead of 1.
Variable-length sequence incompatibility — Catcher pre-allocates a fixed-size tensor [nsamples, seq_length, hidden_size], requiring all samples to have identical seq_len. Shorter sequences get zero-padded (introducing Hessian noise) and longer sequences are silently truncated.
Layer output type mismatch — Unconditional layer(...)[0] assumes tuple output, but some decoder layers return a plain tensor. [0] then incorrectly indexes the batch dimension.
ignore_layers exact match fails for MoE — Nested module names like mlp.experts.0.gate_proj cannot match the ignore pattern gate_proj with exact equality.
_make_quant AttributeError on non-standard Linear — Modules like TopKRouter lack in_features/out_features/bias attributes, causing crashes during weight replacement.
g_idx generation uses slow Python list comprehension — Replaced with vectorized tensor operations.
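The first problem above is easiest to see at the level of shapes. The sketch below is a minimal, tensor-free model in which shapes are plain tuples. The names `squeeze_all`, `flatten_tokens`, `nsamples_buggy`, and `nsamples_fixed` are hypothetical and are not AngelSlim functions. It assumes the sample count is read from the leading dimension after the input is reshaped, which matches the description above but is not copied from `add_batch()`.

```python
def squeeze_all(shape):
    """Mimic a parameterless squeeze(): drop every size-1 dimension."""
    return tuple(d for d in shape if d != 1)


def flatten_tokens(shape):
    """Mimic reshape(-1, shape[-1]): keep the feature axis, merge the rest."""
    tokens = 1
    for d in shape[:-1]:
        tokens *= d
    return (tokens, shape[-1])


def nsamples_buggy(shape):
    # Assumed buggy path: squeeze first, then read the leading dimension.
    return squeeze_all(shape)[0]


def nsamples_fixed(shape):
    # Assumed fixed path: always flatten to [tokens, feature_dim].
    return flatten_tokens(shape)[0]


feature_dim = 2048
for shape in [(16, feature_dim), (1, feature_dim)]:
    print(shape, nsamples_buggy(shape), nsamples_fixed(shape))
# (16, 2048) 16 16
# (1, 2048) 2048 1
```

With 16 routed tokens both paths agree. With a single routed token, the squeezed shape loses its token axis and the leading dimension becomes the feature dimension, so the count jumps from 1 to 2048 in this toy setting.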

Changes

| File | Fix |
| --- | --- |
| `gptq_module.py` | Remove dangerous `squeeze()`, fix `add_batch()` reshape logic, vectorize `g_idx` |
| `catcher.py` | Rewrite to dynamic list storage with per-sample kwargs and `max_seq_length` VRAM guard |
| `gptq.py` | Add `_extract_hidden_states()` helper, per-sample forward loop, substring `ignore_layers` matching |
| `helper_layer.py` | Use `getattr(linear, "bias", None)` for non-standard Linear modules |
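The `catcher.py` and per-sample forward loop changes can be illustrated with a small stand-in. This is a sketch under stated assumptions, not the PR's implementation: `ListCatcher`, `run_per_sample`, and `toy_layer` are hypothetical names, nested Python lists stand in for `[seq_len, hidden_size]` tensors, and the `max_seq_length` guard is left out because the PR text does not describe its exact behavior.

```python
class ListCatcher:
    """Hypothetical catcher that stores each calibration sample separately."""

    def __init__(self):
        self.inputs = []        # one entry per sample, each with its own seq_len
        self.layer_kwargs = []  # kwargs captured alongside each sample

    def capture(self, hidden_states, **kwargs):
        # No pre-allocated [nsamples, seq_length, hidden_size] buffer,
        # so nothing is padded or truncated here.
        self.inputs.append(hidden_states)
        self.layer_kwargs.append(dict(kwargs))


def run_per_sample(layer, catcher):
    """Forward every stored sample on its own, with its own kwargs."""
    return [
        layer(hidden, **kw)
        for hidden, kw in zip(catcher.inputs, catcher.layer_kwargs)
    ]


def toy_layer(hidden_states, scale=1.0):
    # Stand-in for a decoder layer: scales every feature of every token.
    return [[scale * x for x in token] for token in hidden_states]


catcher = ListCatcher()
catcher.capture([[1.0, 2.0]] * 3, scale=2.0)  # sample with 3 tokens
catcher.capture([[0.5, 0.5]] * 5, scale=1.0)  # sample with 5 tokens
outputs = run_per_sample(toy_layer, catcher)
print([len(o) for o in outputs])  # [3, 5]
```

Because each sample keeps its own length and kwargs, the shorter sample contributes no padded rows and the longer one loses no tokens.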

Testing

Verified on Qwen3-30B-A3B (MoE, variable expert routing)
Verified on standard dense models (Qwen3-4B)
No regression on existing quantization quality metrics

yghstill · 2026-06-01T08:28:28Z

@sunnyxiaohu
Please run pre-commit code formatting:

pip3 install pre-commit black isort flake8
cd AngelSlim
pre-commit install

… and output type handling

- Fix add_batch() squeeze() bug that corrupts Hessian for MoE experts with single-token routing
- Rewrite Catcher to use dynamic list storage, supporting variable-length sequences
- Fix layer output type handling: use _extract_hidden_states() instead of unconditional [0]
- Fix ignore_layers matching: use substring match for nested MoE module names
- Fix _make_quant: support non-standard Linear modules lacking in_features/bias attributes
- Fix g_idx generation: use vectorized tensor ops instead of Python list comprehension
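Two of the fixes listed in this commit message are small enough to sketch directly. The code below is illustrative only: `extract_hidden_states` and `is_ignored` are hypothetical stand-ins for the PR's `_extract_hidden_states()` helper and its `ignore_layers` check, and `FakeTensor` replaces a real tensor so the example runs on the standard library alone.

```python
class FakeTensor:
    """Stand-in for a tensor; only identity matters in this sketch."""


def extract_hidden_states(output):
    # Some decoder layers return (hidden_states, ...), others a bare tensor.
    # Indexing [0] on a bare tensor would slice the batch dimension instead.
    if isinstance(output, (tuple, list)):
        return output[0]
    return output


def is_ignored(module_name, ignore_layers):
    # Substring match, so a short pattern can hit a nested MoE module name.
    return any(pattern in module_name for pattern in ignore_layers)


h = FakeTensor()
print(extract_hidden_states((h, None)) is h)  # True
print(extract_hidden_states(h) is h)          # True

name = "mlp.experts.0.gate_proj"
print(name == "gate_proj")                # False: exact match misses it
print(is_ignored(name, ["gate_proj"]))    # True: substring match finds it
```

One consequence of substring matching is that short patterns match more broadly: in this sketch a pattern such as `gate` would also match `mlp.experts.0.gate_proj`, so ignore patterns may need to be more specific.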

sunnyxiaohu · 2026-06-08T07:48:33Z

@sunnyxiaohu Please run pre-commit code formatting:
pip3 install pre-commit black isort flake8
cd AngelSlim
pre-commit install

fixed

yghstill previously approved these changes Jun 1, 2026

sunnyxiaohu dismissed yghstill’s stale review via 5d482fd June 8, 2026 07:43

sunnyxiaohu force-pushed the fix/gptq-core-bugs branch from 3e66200 to 5d482fd on June 8, 2026 07:43

sunnyxiaohu closed this Jun 8, 2026

sunnyxiaohu reopened this Jun 8, 2026

yghstill approved these changes Jun 8, 2026

linchuanxie merged commit 86479db into Tencent:main Jun 10, 2026
5 checks passed

linchuanxie merged 1 commit into Tencent:main from sunnyxiaohu:fix/gptq-core-bugs
