[ATOM SGL] MTP Spec decode by ZhiweiYan-96 · Pull Request #1361 · ROCm/ATOM

ZhiweiYan-96 · 2026-06-26T03:17:54Z

Motivation

Enable DeepSeek-V4 MTP speculative decoding in the ATOM SGLang plugin path.

Technical Details

Export DeepSeek-V4 target hidden states through the SGLang wrapper so the MTP draft model can consume the pre-hc_head hidden states.
Add a DeepseekV4ForCausalLMNextN SGLang wrapper that creates the DeepSeek-V4 MTP draft model through ATOM.
Share the target model’s embedding and LM head weights with the MTP draft model.
Extend the DeepSeek-V4 SGLang KV-cache bridge to support both target layers and MTP draft layers.
Route SGLang’s DeepSeek-V4 speculative draft attention setup through the ATOM plugin path.
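The weight sharing between target and draft can be pictured with a minimal sketch. The class and function names (`Weight`, `TargetLM`, `DraftLM`, `share_embed_and_head`) are hypothetical stand-ins, not the ATOM or SGLang API. The sketch assumes only that "sharing" means the draft holds references to the target's weight objects rather than copies. In PyTorch terms this would roughly correspond to assigning the same parameter or module object to both models.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Weight:
    """Stand-in for a parameter tensor; object identity is what matters here."""
    name: str
    data: List[float] = field(default_factory=list)


@dataclass
class TargetLM:
    embed_tokens: Weight
    lm_head: Weight


@dataclass
class DraftLM:
    # Hypothetical MTP draft model that starts without its own embedding/head.
    embed_tokens: Optional[Weight] = None
    lm_head: Optional[Weight] = None


def share_embed_and_head(target: TargetLM, draft: DraftLM) -> None:
    """Point the draft at the target's embedding and LM head (no copies)."""
    draft.embed_tokens = target.embed_tokens
    draft.lm_head = target.lm_head


target = TargetLM(Weight("embed", [0.1, 0.2]), Weight("head", [0.3, 0.4]))
draft = DraftLM()
share_embed_and_head(target, draft)

# An in-place change to the target's head is visible through the draft,
# because both names refer to the same object.
target.lm_head.data[0] = 9.0
print(draft.lm_head is target.lm_head, draft.lm_head.data[0])
```

Because nothing is copied, the draft adds no extra memory for these two large matrices and can never drift out of sync with the target.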

Test Result

DeepSeek-V4-Pro

Acceptance rate (MTP=3):
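The PR does not say how its acceptance rate is defined. A common way to compute MTP metrics under greedy verification is sketched below, assuming three draft tokens per verify step (MTP=3). Because both conventions are in use, the sketch reports both the per-token acceptance rate and the mean accept length. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def accepted_prefix(draft: Sequence[int], target: Sequence[int]) -> int:
    """Count leading draft tokens that match the target's greedy choice
    at the same position; verification stops at the first mismatch."""
    n = 0
    for d, t in zip(draft, target):
        if d != t:
            break
        n += 1
    return n


def acceptance_stats(
    steps: List[Tuple[Sequence[int], Sequence[int]]]
) -> Tuple[float, float]:
    """Return (acceptance_rate, mean_accept_length) over verify steps.

    acceptance_rate    = accepted draft tokens / proposed draft tokens
    mean_accept_length = tokens emitted per verify step; the target always
                         contributes one token of its own, hence +1 per step.
    """
    proposed = accepted = 0
    for draft, target in steps:
        proposed += len(draft)
        accepted += accepted_prefix(draft, target)
    return accepted / proposed, (accepted + len(steps)) / len(steps)


# Three verify steps with MTP=3: 2, 3 and 0 draft tokens accepted.
steps = [([5, 7, 9], [5, 7, 2]), ([1, 2, 3], [1, 2, 3]), ([4, 4, 4], [8, 4, 4])]
rate, mean_len = acceptance_stats(steps)
print(f"{rate:.3f} {mean_len:.3f}")  # 0.556 2.667
```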

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Copilot

Pull request overview

Enable DeepSeek-V4 MTP speculative decoding in the ATOM SGLang plugin path by adding an SGLang-compatible NextN draft wrapper and extending the DeepSeek-V4 proxy KV-cache/metadata bridge to support speculative + CUDA-graph workflows.

Changes:

Add DeepseekV4ForCausalLMNextN SGLang wrapper backed by ATOM’s DeepseekV4MTP, including shared embedding/LM-head weight tying for draft runs.
Extend DeepSeek-V4 SGLang bridge/backends to publish and reuse ATOM V4 graph/attention metadata across decode/prefill/graph capture paths, including MTP layer support.
Patch SGLang speculative factories / CUDA-graph eligibility to route DeepSeek-V4 draft backends through ATOM plugin mode.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.


| File | Description |
| --- | --- |
| atom/plugin/sglang/runtime/forward_context.py | Adds V4 graph-metadata capture slicing and more robust metadata discovery paths. |
| atom/plugin/sglang/models/deepseek_v4_nextn_wrapper.py | New SGLang entry wrapper that instantiates ATOM `DeepseekV4MTP` and shares embed/head weights for NextN draft. |
| atom/plugin/sglang/models/base_model_wrapper.py | Routes DeepSeek-V4 logits through `LogitsProcessor` consistently; exposes embed/head for weight sharing. |
| atom/plugin/sglang/deepseek_v4_bridge.py | Extends proxy KV binding to MTP blocks and updates decode-graph metadata construction/indexer metadata. |
| atom/plugin/sglang/attention_backend/deepseek_v4_backend.py | Publishes ATOM metadata into `ForwardBatch`, supports multi-step draft backend expectations, and supports old/new graph hooks. |
| atom/plugin/register.py | Monkeypatches SGLang speculative backend factories and CUDA-graph behavior to force DeepSeek-V4 through ATOM shims. |
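`register.py` is described as monkeypatching SGLang's speculative backend factories. The general pattern is to keep a reference to the original factory and install a wrapper that diverts only the DeepSeek-V4 case. A minimal sketch under assumptions: `fake_sglang`, `create_draft_backend`, `patch_factory` and dispatching on an architecture string are all hypothetical stand-ins for whatever SGLang objects and selection criteria the PR actually uses.

```python
import functools
import types
from typing import Callable, Dict


def patch_factory(module: object, attr: str,
                  overrides: Dict[str, Callable[..., object]]) -> Callable[..., object]:
    """Wrap module.<attr> so listed architectures get an override backend.

    Returns the original factory so the patch can be undone later.
    """
    original = getattr(module, attr)

    @functools.wraps(original)
    def patched(model_arch: str, *args, **kwargs):
        override = overrides.get(model_arch)
        if override is not None:
            # Divert this architecture to the plugin's shim.
            return override(*args, **kwargs)
        # Every other architecture keeps the stock behaviour.
        return original(model_arch, *args, **kwargs)

    setattr(module, attr, patched)
    return original


# Hypothetical stand-in for an SGLang module exposing a backend factory.
fake_sglang = types.SimpleNamespace(
    create_draft_backend=lambda model_arch: f"sglang:{model_arch}"
)
original = patch_factory(
    fake_sglang,
    "create_draft_backend",
    {"DeepseekV4ForCausalLMNextN": lambda: "atom-shim"},
)

print(fake_sglang.create_draft_backend("DeepseekV4ForCausalLMNextN"))  # atom-shim
print(fake_sglang.create_draft_backend("LlamaForCausalLM"))  # sglang:LlamaForCausalLM
```

Returning the original keeps the patch reversible and lets the wrapper fall through for every model the plugin does not handle.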


```diff
+            pos_np = positions[:total].detach().cpu().numpy().astype(np.int32)
+            repeats = max(1, total // max(1, bs))
+            batch_np = np.repeat(np.arange(bs, dtype=np.int64), repeats)[:total]
+        else:
```
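The hunk above builds a per-token batch index by repeating each request id `total // bs` times, which assumes every request contributes the same number of tokens. The pure-Python sketch below (hypothetical function names, NumPy replaced by list comprehensions) reproduces that logic and makes its edge case visible. When `total` exceeds `bs` and is not a multiple of it, the result is shorter than `total`. The alternative assumes per-request token counts are available, which the excerpt does not show.

```python
from typing import List, Sequence


def batch_index_uniform(total: int, bs: int) -> List[int]:
    """Same arithmetic as the hunk: every request is assumed to own
    total // bs tokens (at least one), truncated to `total` entries."""
    repeats = max(1, total // max(1, bs))
    return [b for b in range(bs) for _ in range(repeats)][:total]


def batch_index_from_lens(lens: Sequence[int]) -> List[int]:
    """Hypothetical variant driven by per-request token counts,
    so every token gets an index whatever the total is."""
    return [b for b, n in enumerate(lens) for _ in range(n)]


print(batch_index_uniform(6, 2))        # [0, 0, 0, 1, 1, 1]
print(len(batch_index_uniform(7, 2)))   # 6, one index short of total=7
print(batch_index_from_lens([3, 4]))    # [0, 0, 0, 1, 1, 1, 1]
```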


```diff
 from atom.plugin.sglang.runtime.context import bind_current_forward_batch

+logger = logging.getLogger("atom.plugin.sglang.runtime.forward_context")
```
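The hunk adds a module-level logger named after the module path. With the standard `logging` module, a dotted name places the logger in a hierarchy, so its verbosity can be controlled from any ancestor. A minimal sketch: the parent name `atom.plugin` and the message text are illustrative choices, not taken from the PR.

```python
import logging

# Module-level logger named after the module path, as in the hunk above.
logger = logging.getLogger("atom.plugin.sglang.runtime.forward_context")

# A logger with no explicit level inherits the level of its nearest
# configured ancestor, so configuring "atom.plugin" is enough.
logging.getLogger("atom.plugin").setLevel(logging.DEBUG)

print(logging.getLevelName(logger.getEffectiveLevel()))  # DEBUG

# Printed only once a handler accepting DEBUG is attached,
# e.g. after logging.basicConfig(level=logging.DEBUG).
logger.debug("example debug message")
```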


ZhiweiYan-96 added 3 commits June 25, 2026 22:43

[ATOM SGL] MTP Spec decode · 8659b84

remove debug print · 4ceeda7

mtp layer wrapper · 84f2e0d

ZhiweiYan-96 force-pushed the zhiwei/v4_mtp branch from 22b74b5 to 84f2e0d June 26, 2026 06:17

format · a6ecff4

ZhiweiYan-96 marked this pull request as ready for review June 26, 2026 07:02

Copilot AI review requested due to automatic review settings June 26, 2026 07:02

Copilot started reviewing on behalf of ZhiweiYan-96 June 26, 2026 07:02

Copilot AI reviewed Jun 26, 2026

zufayu requested a review from valarLip June 26, 2026 13:59


ZhiweiYan-96 wants to merge 4 commits into main from zhiwei/v4_mtp


2 participants
