padding-free / packed-sequence support for Qwen3.5 #186
Merged
meichangsu1 merged 7 commits into modelscope:main on May 7, 2026
Conversation
The `is_packed` flag was ambiguous and only inferred from position IDs. Now `padding_free` is explicitly passed as input, making the intent clearer and enabling early validation of attention backend compatibility.
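As a rough illustration of the idea (the helper name `validate_padding_free` and the backend set are assumptions, not this repository's API), passing `padding_free` explicitly lets the caller fail fast instead of inferring packing from position IDs deep inside the forward pass:

```python
# Hypothetical helper: fail fast when padding-free inputs meet an incompatible backend.
SUPPORTED_PADDING_FREE_BACKENDS = {'flash_attention_2', 'flash_attention_3'}  # assumed set

def validate_padding_free(padding_free: bool, attn_implementation: str) -> None:
    if padding_free and attn_implementation not in SUPPORTED_PADDING_FREE_BACKENDS:
        raise ValueError(
            f'padding_free=True requires one of {sorted(SUPPORTED_PADDING_FREE_BACKENDS)}, '
            f'but attn_implementation={attn_implementation!r} was configured.')
```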
Simplify the logic for returning logits in the `forward` and `forward_only` methods by removing the redundant `_outputs` copy and the `logits` variable. The new logic modifies `outputs` directly and creates a single copy for the return value, reducing code complexity and the potential for bugs.
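A minimal sketch of that simplified return path, assuming a thin wrapper around a Hugging Face-style model output; the in-place postprocessing step shown here is illustrative, not the PR's actual code:

```python
import copy

def forward_only(model, inputs):
    outputs = model(**inputs)
    # Modify logits in place on the original output object ...
    outputs.logits = outputs.logits.float()  # illustrative postprocessing
    # ... and hand back a single shallow copy, instead of maintaining a separate
    # `_outputs` copy plus a standalone `logits` variable that must stay in sync.
    return copy.copy(outputs)
```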
Contributor
Code Review
This pull request introduces support for padding-free and packed sequence inputs for Qwen 3.5 models, specifically targeting GatedDeltaNet and linear attention within a sequence parallel context. Key changes include a new patching mechanism for Qwen 3.5, refactored attention logic to handle variable sequence lengths without padding, and fallback implementations for linear attention kernels when specialized libraries are missing. Feedback highlights a regression in how sequence boundaries are determined in the attention strategy and identifies inconsistencies in the return types and activation handling within the new torch-based fallback for causal convolution.
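For context, a torch-based causal-convolution fallback along these lines (the function name, tensor shapes, and activation handling are assumptions, not the PR's code) could look like:

```python
import torch
import torch.nn.functional as F

def causal_conv1d_torch(x: torch.Tensor, weight: torch.Tensor, bias=None, activation=None):
    """x: (batch, dim, seqlen); weight: (dim, kernel_size) depthwise filter."""
    dim, kernel_size = weight.shape
    x = F.pad(x, (kernel_size - 1, 0))                    # left-pad so the convolution stays causal
    out = F.conv1d(x, weight.unsqueeze(1), bias=bias, groups=dim)
    return F.silu(out) if activation == 'silu' else out   # explicit, consistent activation handling
```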
…helpers and renaming function: Remove the `_get_real_position_ids` and `_is_packed_position_ids` helper functions that are no longer used. Inline the availability check into `_get_flash_linear_attention_kernels` instead of keeping a separate function. Rename `_run_with_gdn_conv_and_delta_rule_cu_seqlens` to `_patch_gdn_kernels_for_cu_seqlens` for clarity.
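An inlined availability check might look roughly like the sketch below; the import follows the `flash-linear-attention` package layout, while the returned structure is an assumption rather than the actual diff:

```python
def _get_flash_linear_attention_kernels():
    # Inlined availability check: try the optional dependency and signal absence with None.
    try:
        from fla.ops.gated_delta_rule import chunk_gated_delta_rule  # flash-linear-attention
    except ImportError:
        return None  # callers switch to the torch fallback path
    return {'chunk_gated_delta_rule': chunk_gated_delta_rule}
```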
… and rely on explicit position_ids: The automatic derivation of cu_seq_lens_q from position_ids in `_update_packed_varlen_metadata` was removed to simplify the codebase and avoid potential inconsistencies. Packed-sequence metadata must now be provided explicitly via valid position_ids or other means, with clearer error messages when it is missing.
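For reference, the kind of derivation that was removed looks roughly like the following (a sketch, not the deleted code): each position where `position_ids` resets to zero marks the start of a new packed sequence, and the cumulative boundaries form `cu_seq_lens`. Passing the metadata explicitly avoids relying on this inference, which is what the commit message means by avoiding potential inconsistencies.

```python
import torch

def cu_seq_lens_from_position_ids(position_ids: torch.Tensor) -> torch.Tensor:
    """position_ids: shape (total_tokens,) for a packed, padding-free batch."""
    starts = torch.nonzero(position_ids == 0, as_tuple=False).flatten()
    total = torch.tensor([position_ids.numel()], device=position_ids.device)
    return torch.cat([starts, total]).to(torch.int32)

# Two packed sequences of lengths 3 and 2:
# cu_seq_lens_from_position_ids(torch.tensor([0, 1, 2, 0, 1])) -> tensor([0, 3, 5], dtype=torch.int32)
```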
tpx818 reviewed May 7, 2026
The patch class and attribute were renamed from Qwen35-specific names to generic GatedDeltaNet names to reflect that the padding-free optimization is not limited to Qwen3.5 models.
ceb8983 to 4daf906
tastelikefeet approved these changes May 7, 2026
PR type
PR information
This PR focuses on three sequence-parallel / padding-free fixes for Qwen3.5 in the Transformers backend.
Main changes:
- Provide packed varlen metadata (`cu_seq_lens_q`, `cu_seq_lens_k`, `max_length_q`, `max_length_k`) for packed Qwen3.5 inputs.
- Patch the GatedDeltaNet / linear-attention kernels to use `cu_seqlens` when padding-free is enabled.
- Fix `gather_loss_tensors` to remove sequence-parallel padding before loss computation, keeping `logps` and `labels` aligned.
- Fall back to a torch implementation when `flash-linear-attention` is unavailable.

Experiment results