Fix cross-WG race condition in gdn_conv_fused conv_state shift #356

Open: qiuxin2012 wants to merge 2 commits into intel:main from qiuxin2012:fixgdn

@qiuxin2012

When N*HV > 32, the work-groups (WGs) span multiple scheduling waves. The hv==0 WG's Phase 3 conv_state shift could then complete before later-scheduled WGs read the original conv_state in Phase 1, corrupting their conv1d results and producing a wrong ssm_state.

Split into two paths:

  • N*HV <= 32: single kernel with inline shift (all WGs co-scheduled)
  • N*HV > 32: separate shift kernel after main kernel completes
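A toy Python model can make the ordering problem concrete. It is a sketch under stated assumptions, not the kernel: each sequence has a single conv channel shared by all HV heads, WGs are dispatched hv-major in waves of at most 32 with each wave finishing before the next starts, and the names `run`, `conv_out` and `WAVE_SIZE` are hypothetical. The inline path shifts conv_state at the end of the wave containing the hv==0 WG; the separate path shifts only after every WG has read it.

```python
# Toy model of the conv_state race (hypothetical; not the real kernel).
# Assumptions, not taken from the PR: one conv channel per sequence shared by
# all HV heads, hv-major dispatch order, and waves of at most WAVE_SIZE WGs
# where each wave fully finishes before the next one starts.

WAVE_SIZE = 32  # the PR's N*HV <= 32 threshold, modeled here as a wave size


def conv_out(window, x, weights):
    """One causal conv1d step: dot(weights, window + [x])."""
    taps = window + [x]
    return sum(w * t for w, t in zip(weights, taps))


def run(n_seqs, hv, conv_state, x, weights, separate_shift):
    """Return ({(n, h): conv output}, final conv_state)."""
    state = [list(s) for s in conv_state]  # copy per-sequence windows
    order = [(n, h) for h in range(hv) for n in range(n_seqs)]  # hv-major
    waves = [order[i:i + WAVE_SIZE] for i in range(0, len(order), WAVE_SIZE)]
    out = {}
    cached = {}  # window read in Phase 1, kept "in registers" for the shift
    for wave in waves:
        # Phase 1: every WG in this wave reads conv_state and computes conv1d.
        for n, h in wave:
            window = list(state[n])
            out[(n, h)] = conv_out(window, x[n], weights)
            if h == 0:
                cached[n] = window
        # Phase 3 (inline path): the hv == 0 WG shifts its sequence's window
        # using the cached copy, before later waves have read conv_state.
        if not separate_shift:
            for n, h in wave:
                if h == 0:
                    state[n] = cached[n][1:] + [x[n]]
    if separate_shift:
        # Separate shift kernel: runs only after all WGs finished Phase 1.
        for n in range(n_seqs):
            state[n] = state[n][1:] + [x[n]]
    return out, state


if __name__ == "__main__":
    init = [[1, 2, 3] for _ in range(4)]
    # N=4, HV=16 -> 64 WGs, two waves under the assumed wave size.
    bad, _ = run(4, 16, init, [10] * 4, [1, 1, 1, 1], separate_shift=False)
    good, _ = run(4, 16, init, [10] * 4, [1, 1, 1, 1], separate_shift=True)
    print(bad[(0, 0)], bad[(0, 8)])    # 16 25: head 8 saw the shifted window
    print(good[(0, 0)], good[(0, 8)])  # 16 16
```

With N*HV = 32 (for example N=2, HV=16) the model has a single wave, so the inline shift is safe; at N*HV = 64 heads 8 to 15 read the already-shifted window, which is the corruption the separate shift kernel avoids.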

qiuxin2012 and others added 2 commits April 14, 2026 18:15
When N*HV > 32, WGs span multiple scheduling waves. The hv==0 WG's
Phase 3 conv_state shift could complete before later-scheduled WGs
read the original conv_state in Phase 1, corrupting their conv1d
results and producing wrong ssm_state.

Split into two paths:
- N*HV <= 32: single kernel with inline shift (all WGs co-scheduled)
- N*HV > 32: separate shift kernel after main kernel completes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Three fixes for the sequential-layout GDN kernel:

1. Cross-WG race on conv_state: Phase 3 shifted conv_state without an
   hv==0 guard and re-read it from memory, so it could pick up data
   another WG had already shifted. It now shifts from register-cached
   values, inline when N*HV <= 32 and in a separate kernel otherwise.

2. Hardcoded conv_state dim 2048: only correct for HV=8. Replaced
   with dynamic dim = 2*H*K + HV*V.

3. Added double_v mode (HV>8): v-threads handle 128 elements each,
   enabling HV=16 support (e.g. num_v_heads_global=64 with TP=4).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
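The dimension formula in fix 2 is plain arithmetic, so a small check shows why the constant 2048 only held for HV=8. The helper name `conv_state_dim` and the values H=4, K=V=128 are assumptions, picked because they reproduce 2048 at HV=8; reading 2*H*K as the q and k channels and HV*V as the v channels is likewise an interpretation rather than something the PR states.

```python
# Hypothetical helper for the conv_state dimension used in fix 2.
# H=4, K=128, V=128 are assumed example values, chosen so that HV=8
# reproduces the previously hardcoded 2048.

LEGACY_DIM = 2048  # the hardcoded value the commit replaced


def conv_state_dim(h, k, hv, v):
    """dim = 2*H*K + HV*V (assumed: q and k channels, then v channels)."""
    return 2 * h * k + hv * v


for hv in (8, 16):
    dim = conv_state_dim(h=4, k=128, hv=hv, v=128)
    print(hv, dim, dim == LEGACY_DIM)
# 8 2048 True
# 16 3072 False
```

Under these assumed values, HV=16 (for example num_v_heads_global=64 with TP=4) needs 3072 channels, so a hardcoded 2048 would give the wrong conv_state layout.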
