fix(pd): transfer MiniMax-M3 sparse indexer-key cache in disaggregation by Jasen2201 · Pull Request #1368 · ROCm/ATOM

Jasen2201 · 2026-06-26T07:39:10Z

MiniMax-M3 sparse attention reuses the unified KV cache and kv_scale for K/V, so the fp8 per-token scales already travel with the KV blocks. It also keeps one extra per-token buffer, runner.sparse_attention_index_cache, which holds the indexer keys used for top-k block selection at decode time. get_kv_transfer_tensors() never registered that buffer, so under PD disaggregation the decode node ran top-k against a zero/stale index for the prefilled tokens and attended to the wrong KV blocks. The bug is masked for short prompts (the init+local+topk window already covers every block, so selection is moot) but corrupts output once the context exceeds that window.
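
To make the masking effect concrete, here is a minimal sketch of init+local+top-k block selection. It is not ATOM's implementation: `select_blocks`, its parameters, and the scoring values are hypothetical, and it assumes the window consists of the first `n_init` blocks, the last `n_local` blocks, and the `top_k` highest-scoring remaining blocks. An all-zero score list stands in for an index cache that was never transferred.

```python
def select_blocks(scores, n_init, n_local, top_k):
    """Return the sorted block ids a decode step would attend to.

    Hypothetical model: always keep the first n_init and last n_local
    blocks, then add the top_k best-scoring blocks among the rest.
    """
    n = len(scores)
    fixed = set(range(min(n_init, n))) | set(range(max(0, n - n_local), n))
    rest = [b for b in range(n) if b not in fixed]
    # Stable sort: equal scores keep ascending block-id order.
    ranked = sorted(rest, key=lambda b: -scores[b])
    return sorted(fixed | set(ranked[:top_k]))


# Short context: 4 blocks fit inside the 1 + 1 + 2 window, so the
# scores cannot change the result.
short_real = select_blocks([0.5, 0.1, 0.9, 0.2], n_init=1, n_local=1, top_k=2)
short_zero = select_blocks([0.0] * 4, n_init=1, n_local=1, top_k=2)

# Long context: 8 blocks exceed the window, so zeroed scores pick
# different blocks than the real ones.
real_scores = [0.0, 0.1, 0.2, 0.1, 0.3, 0.9, 0.8, 0.0]
long_real = select_blocks(real_scores, n_init=1, n_local=1, top_k=2)
long_zero = select_blocks([0.0] * 8, n_init=1, n_local=1, top_k=2)
```

In this toy model the short-context selections are identical, while the long-context selections diverge, which mirrors why a short-prompt benchmark would not surface the missing transfer.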

Register the indexer-key cache as block-indexed transfer regions (one per sparse layer, same physical-block striding as the KV cache), guarded by getattr so non-sparse models and bf16 paths are unaffected.
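
The getattr-guarded registration described above can be sketched as follows. This is an illustration under stated assumptions, not the merged diff: `TransferRegion`, `index_cache_regions`, and the `base_addr`, `block_bytes`, and `num_blocks` fields are hypothetical stand-ins, and the buffer is assumed to be a per-layer sequence. Only the attribute name `sparse_attention_index_cache` comes from the PR.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class TransferRegion:
    # Hypothetical descriptor for one block-indexed transfer region.
    name: str
    base_addr: int
    block_stride_bytes: int
    num_blocks: int


def index_cache_regions(runner):
    """Build one transfer region per sparse layer, or none at all."""
    # getattr with a default keeps runners without the buffer
    # (non-sparse models) on the existing path.
    caches = getattr(runner, "sparse_attention_index_cache", None)
    if caches is None:
        return []
    return [
        TransferRegion(
            name=f"sparse_index_layer_{layer_id}",
            base_addr=buf.base_addr,
            # Assumed to be strided by physical block, like the KV cache.
            block_stride_bytes=buf.block_bytes,
            num_blocks=buf.num_blocks,
        )
        for layer_id, buf in enumerate(caches)
    ]


# A runner without the attribute contributes no extra regions.
dense_runner = SimpleNamespace()
dense_regions = index_cache_regions(dense_runner)

# A runner with two sparse layers contributes two regions.
sparse_runner = SimpleNamespace(
    sparse_attention_index_cache=[
        SimpleNamespace(base_addr=0x1000, block_bytes=256, num_blocks=8),
        SimpleNamespace(base_addr=0x2000, block_bytes=256, num_blocks=8),
    ]
)
sparse_regions = index_cache_regions(sparse_runner)
```

Returning an empty list for runners that lack the attribute means the caller can extend its existing region list unconditionally, without branching on model type.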

Tested (latest image, 1P+1D TP4, fp8 KV via Triton attention): GSM8K 5-shot = 0.9401, i.e. no regression in M3 fp8 PD. Short-prompt GSM8K does not exercise the long-context top-k path that the buffer affects; that path is covered by review, not by this run.

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Copilot

Pull request overview

Fixes PD disaggregation correctness for MiniMax-M3 sparse attention by ensuring the per-token sparse indexer-key cache is included in the KV RDMA transfer set. This prevents decode workers from running top‑k block selection against stale/zero index-cache data for prefilled tokens, which can mis-select KV blocks once context grows beyond the init/local/top‑k coverage window.

Changes:

Register runner.sparse_attention_index_cache as additional block-indexed transfer regions in get_kv_transfer_tensors().
Guard the new transfer registration via getattr(...) so non-sparse models (and runners without the buffer) are unaffected.

Copilot AI review requested due to automatic review settings June 26, 2026 07:39

Copilot started reviewing on behalf of Jasen2201 June 26, 2026 07:40

Copilot AI reviewed Jun 26, 2026

zufayu requested a review from ZhangLirong-amd June 26, 2026 13:58

valarLip approved these changes Jun 29, 2026

valarLip merged commit 7486551 into main Jun 29, 2026
28 of 34 checks passed

valarLip deleted the fix/m3-pd-sparse-index-cache-transfer branch June 29, 2026 03:04