feat: GLM-5.2 DSpark draft training configs + deploy plan by FlamingoPg · Pull Request #133 · lightseekorg/TorchSpec

FlamingoPg · 2026-06-29T10:45:20Z

Summary

Adds the artifacts to train a DSpark speculative-decoding draft against GLM-5.2 (744B/40B MoE, glm_moe_dsa) on TorchSpec's online (Mooncake-streamed) path.

torchspec/config/glm52_dspark_draft_config.json — DSpark draft sized to GLM-5.2 (hidden 6144, vocab 154880, 78-layer target, aux layers [1,20,38,56,75]); backbone dims aligned with the existing GLM-5.2 DFlash draft config.
configs/sglang_glm52_dspark_8card_colocate.yaml — single-node 8×B200, FP8, colocate (inference TP=8 + FSDP training share the 8 GPUs).
configs/sglang_glm52_dspark_2node.yaml — 16×B200 / 2-node fallback.
torchspec/data/template.py — register the glm chat template.
docs/deploy_glm52_dspark_8card.md — end-to-end deploy plan with a Go/No-Go gate.

Key findings

GLM-5.2 (GlmMoeDsaForCausalLM) inherits EAGLE3 aux-hidden capture from DeepseekV2ForCausalLM → no GLM-specific sglang patch required.
B200 = sm100 satisfies the glm_moe_dsa DSA-kernel gating (the community ada_dsa port is only needed on sm89/L20).
8-card single node is feasible via colocate (shared placement group, placement_group.py:525-543). Open risk = the FP8-inference-KV vs co-located-training memory budget; tune inference.sglang.mem_fraction_static (start 0.5).
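The open risk above comes down to simple per-GPU arithmetic, which the following minimal sketch makes explicit. It assumes that every weight is stored at one byte per parameter (FP8), that weights shard evenly across the TP group, and that `mem_fraction_static` bounds weights plus the KV-cache pool. The 180 GB usable-memory figure for a B200 is also an assumption, and `static_budget_gb` is a hypothetical helper, not part of TorchSpec or SGLang.

```python
def static_budget_gb(total_params_b, bytes_per_param, tp_size, gpu_mem_gb, mem_fraction_static):
    """Rough per-GPU budget in GB (decimal).

    Returns (weights_per_gpu, static_pool, left_for_kv, left_for_training).
    Ignores activations, CUDA graphs, and any non-FP8 tensors.
    """
    # Billions of parameters times bytes per parameter gives GB directly.
    weights_per_gpu = total_params_b * bytes_per_param / tp_size
    # Portion of the GPU that the inference engine may claim statically.
    static_pool = gpu_mem_gb * mem_fraction_static
    left_for_kv = static_pool - weights_per_gpu
    left_for_training = gpu_mem_gb - static_pool
    return weights_per_gpu, static_pool, left_for_kv, left_for_training


# Assumed inputs: 744B parameters, FP8, TP=8, 180 GB usable per GPU, fraction 0.5.
weights, pool, kv, train = static_budget_gb(744, 1.0, 8, 180, 0.5)
print(f"weights/GPU={weights} GB, static pool={pool} GB, KV={kv} GB, training={train} GB")
```

Under these assumed inputs the weight shard (93 GB) is larger than the 90 GB static pool, so the 0.5 starting point may need to move up; the real figures have to come from the Step A smoke run.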

⚠️ NOT YET VERIFIED: do not merge before the following are done

Resolve mask_token_id (currently placeholder 154820 = pad) via tokenizer.convert_tokens_to_ids('[MASK]') on the GLM-5.2 checkpoint.
Confirm the glm chat_template (headers / end token / thinking) against the checkpoint's tokenizer_config.json.
Confirm embed/lm_head/norm state-dict keys match the GLM-5.2 weights.
Run the Step A smoke test (serve GLM-5.2 FP8 with TP=8 on B200 and confirm aux + last hidden-state capture), then a small dry-run.

Test plan

See docs/deploy_glm52_dspark_8card.md, Steps A–D (smoke → dry-run → train → eval acceptance vs the built-in MTP).

🤖 Generated with Claude Code

Add support artifacts for training a DSpark speculative-decoding draft against GLM-5.2 (744B/40B MoE, glm_moe_dsa) on the TorchSpec online path.

- torchspec/config/glm52_dspark_draft_config.json: DSpark draft config sized to GLM-5.2 (hidden 6144, vocab 154880, 78-layer target, aux layers [1,20,38,56,75]); backbone dims aligned with the existing GLM-5.2 DFlash draft config.
- configs/sglang_glm52_dspark_8card_colocate.yaml: single-node 8xB200, FP8, colocate (inference TP=8 + FSDP training share the 8 GPUs).
- configs/sglang_glm52_dspark_2node.yaml: 16xB200 / 2-node fallback.
- torchspec/data/template.py: register the `glm` chat template.
- docs/deploy_glm52_dspark_8card.md: end-to-end deploy plan with Go/No-Go.

NOT YET VERIFIED: before merge/training, confirm on a GLM-5.2 checkpoint: mask_token_id (placeholder 154820), the glm chat_template, and the embed/lm_head/norm state-dict keys; then run the Step A smoke + dry-run on B200.

GLM-5.2 inherits EAGLE3 aux-hidden capture from DeepseekV2ForCausalLM, so no GLM-specific sglang patch is required.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 37a8641260

chatgpt-codex-connector · 2026-06-29T10:49:26Z

+  sglang:
+    tp_size: 8
+    mem_fraction_static: 0.5           # <-- LOW: leave ~half of each B200 for training
+    dtype: auto                        # FP8 follows the checkpoint weights


Move dtype into sglang.extra_args

The structured config schema validates inference.sglang against SGLangConfig, which does not define a top-level dtype field; engine-only kwargs are forwarded via inference.sglang.extra_args. As written, this GLM config, and the same dtype entry in configs/sglang_glm52_dspark_2node.yaml, will be rejected during load_config before training starts instead of passing the dtype to SGLang.
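The suggested fix can be sketched as a small config transformation. The field set below is a hypothetical stand-in for whatever `SGLangConfig` actually declares, and treating `extra_args` as a mapping of engine kwargs is an assumption based on the comment above.

```python
from copy import deepcopy

# Hypothetical stand-in for the fields SGLangConfig declares; the real set lives in the repo.
KNOWN_SGLANG_FIELDS = {"tp_size", "mem_fraction_static", "extra_args"}


def move_engine_kwargs(sglang_cfg, known_fields=KNOWN_SGLANG_FIELDS):
    """Return a copy in which keys outside the schema are moved under extra_args."""
    cfg = deepcopy(sglang_cfg)
    extra = dict(cfg.get("extra_args") or {})
    for key in list(cfg):
        if key not in known_fields:
            extra[key] = cfg.pop(key)
    if extra:
        cfg["extra_args"] = extra
    return cfg


before = {"tp_size": 8, "mem_fraction_static": 0.5, "dtype": "auto"}
print(move_engine_kwargs(before))
```

The same change would apply to the `dtype` entry in configs/sglang_glm52_dspark_2node.yaml.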

chatgpt-codex-connector · 2026-06-29T10:49:26Z

+    "target_hidden_size": 6144,
+    "target_num_hidden_layers": 78,
+    "target_layer_ids": [1, 20, 38, 56, 75],
+    "mask_token_id": 154820,


Replace the placeholder mask token

This draft config is consumed directly by DSpark, where mask_token_id is inserted for the masked draft positions. The newly added docs/comments identify 154820 as the pad placeholder, so running the checked-in config as-is trains on PAD embeddings instead of GLM's actual mask token and silently corrupts the DSpark objective; resolve the tokenizer id in the config before shipping it.
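A pre-flight lint along these lines could catch the placeholder before training starts. It is a sketch under stated assumptions: `check_draft_config` is a hypothetical helper, layer ids are assumed to be 0-indexed, and the vocabulary size and pad id are passed in by the caller rather than read from the config.

```python
def check_draft_config(cfg, vocab_size, pad_token_id):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    mask_id = cfg["mask_token_id"]
    if not 0 <= mask_id < vocab_size:
        problems.append(f"mask_token_id {mask_id} is outside the vocabulary")
    if mask_id == pad_token_id:
        problems.append(f"mask_token_id {mask_id} equals the pad token id")

    n_layers = cfg["target_num_hidden_layers"]
    layer_ids = cfg["target_layer_ids"]
    if any(not 0 <= i < n_layers for i in layer_ids):
        problems.append("target_layer_ids contains an index outside the target")
    if layer_ids != sorted(set(layer_ids)):
        problems.append("target_layer_ids is not strictly increasing")
    return problems


# Values from the hunk above; vocab size and pad id as stated in the PR description.
cfg = {
    "target_hidden_size": 6144,
    "target_num_hidden_layers": 78,
    "target_layer_ids": [1, 20, 38, 56, 75],
    "mask_token_id": 154820,
}
for problem in check_draft_config(cfg, vocab_size=154880, pad_token_id=154820):
    print(problem)
```

Run against the checked-in values, the only finding is the pad-valued mask token.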

chatgpt-codex-connector · 2026-06-29T10:49:26Z

+# Use the colocate config but start inference only: temporarily set debug.debug_inference_only to true and run a short step
+RAY_ADDRESS=local CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
+  python -m torchspec.train_entry --config configs/sglang_glm52_dspark_8card_colocate.yaml \
+  --opts debug.debug_inference_only=true


Drop unsupported --opts from the smoke command

This repo's parse_config forwards unknown CLI tokens directly to OmegaConf dotlist overrides, so overrides are accepted as bare debug.debug_inference_only=true, not behind an --opts flag. With the documented command, --opts becomes an unknown top-level config key and the Step A smoke exits during config parsing before it can test SGLang.
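One way to keep documented commands in the accepted shape is to build them from validated overrides. This sketch relies on the behaviour described above (bare `key=value` tokens after `--config`); `build_train_command` is a hypothetical helper, not something the repo provides.

```python
import shlex


def build_train_command(config_path, overrides):
    """Build the train_entry argv, rejecting anything that is not a bare key=value token."""
    for item in overrides:
        key, sep, _value = item.partition("=")
        if item.startswith("-") or not sep or not key:
            raise ValueError(f"not a bare key=value override: {item!r}")
    return ["python", "-m", "torchspec.train_entry", "--config", config_path, *overrides]


cmd = build_train_command(
    "configs/sglang_glm52_dspark_8card_colocate.yaml",
    ["debug.debug_inference_only=true"],
)
print(shlex.join(cmd))
```

Passing `--opts` as an override raises a `ValueError` here instead of surfacing later as a config parsing failure.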

chatgpt-codex-connector · 2026-06-29T10:49:26Z

+
+```bash
+RAY_ADDRESS=local CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
+  ./examples/qwen3-8b-single-node/run.sh configs/sglang_glm52_dspark_8card_colocate.yaml


Invoke train_entry instead of the Qwen wrapper

For the 8-card dry-run/formal-training path, this wrapper is not neutral: examples/qwen3-8b-single-node/run.sh unconditionally overrides the supplied config to training_num_gpus_per_node=2, inference_num_gpus=2, and inference.sglang.tp_size=2. That contradicts the surrounding GLM plan's TP=8/8-GPU colocate requirement, so following this command launches the 744B FP8 target with the wrong TP/GPU layout or fails to fit.
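A launcher-side check for this kind of silent override might look like the sketch below. The key names are copied as written in the comment above and matched on their last segment only, since their full config paths are not shown; the expected value of 8 for all three is an assumption drawn from the colocate plan, and `conflicting_overrides` is a hypothetical helper.

```python
# Expected leaf values for the 8-GPU colocate layout (assumed from the PR description).
EXPECTED_LAYOUT = {
    "training_num_gpus_per_node": "8",
    "inference_num_gpus": "8",
    "tp_size": "8",
}


def conflicting_overrides(overrides, expected=EXPECTED_LAYOUT):
    """Return the key=value overrides that contradict the expected layout."""
    conflicts = []
    for item in overrides:
        key, _, value = item.partition("=")
        leaf = key.rsplit(".", 1)[-1]  # compare on the last path segment only
        if leaf in expected and value != expected[leaf]:
            conflicts.append(item)
    return conflicts


# Overrides the wrapper is reported to apply unconditionally.
wrapper_overrides = [
    "training_num_gpus_per_node=2",
    "inference_num_gpus=2",
    "inference.sglang.tp_size=2",
]
print(conflicting_overrides(wrapper_overrides))
```

All three wrapper overrides are flagged, which matches the recommendation to call train_entry directly.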

FlamingoPg wants to merge 1 commit into main from proj/glm52-torchspec-training