feat: GLM-5.2 DSpark draft training configs + deploy plan #133
FlamingoPg wants to merge 1 commit into
Add support artifacts for training a DSpark speculative-decoding draft against GLM-5.2 (744B/40B MoE, glm_moe_dsa) on the TorchSpec online path.

- torchspec/config/glm52_dspark_draft_config.json: DSpark draft config sized to GLM-5.2 (hidden 6144, vocab 154880, 78-layer target, aux layers [1,20,38,56,75]); backbone dims aligned with the existing GLM-5.2 DFlash draft config.
- configs/sglang_glm52_dspark_8card_colocate.yaml: single-node 8xB200, FP8, colocate (inference TP=8 + FSDP training share the 8 GPUs).
- configs/sglang_glm52_dspark_2node.yaml: 16xB200 / 2-node fallback.
- torchspec/data/template.py: register the `glm` chat template.
- docs/deploy_glm52_dspark_8card.md: end-to-end deploy plan with Go/No-Go.

NOT YET VERIFIED: before merge/training, confirm on a GLM-5.2 checkpoint the mask_token_id (placeholder 154820), the glm chat_template, and the embed/lm_head/norm state-dict keys; then run the Step A smoke + dry-run on B200.

GLM-5.2 inherits EAGLE3 aux-hidden capture from DeepseekV2ForCausalLM, so no GLM-specific sglang patch is required.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 37a8641260
    sglang:
      tp_size: 8
      mem_fraction_static: 0.5  # <-- LOW: leave ~half of each B200 for training
      dtype: auto  # FP8 follows the checkpoint weights
Move dtype into sglang.extra_args
The structured config schema validates inference.sglang against SGLangConfig, which does not define a top-level dtype field; engine-only kwargs are forwarded via inference.sglang.extra_args. As written, this GLM config, and the same dtype entry in configs/sglang_glm52_dspark_2node.yaml, will be rejected during load_config before training starts instead of passing the dtype to SGLang.
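To illustrate the failure mode, here is a minimal sketch of a strict schema loader. `SGLangConfigSketch`, its defaults, and `load_sglang_section` are hypothetical stand-ins, not the repo's actual `SGLangConfig` or `load_config`; the only details taken from this PR are the field names and the `extra_args` passthrough.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict


@dataclass
class SGLangConfigSketch:
    """Hypothetical stand-in for the real SGLangConfig schema."""
    tp_size: int = 1
    mem_fraction_static: float = 0.8  # illustrative default, not the repo's
    # Engine-only kwargs that the schema does not model explicitly.
    extra_args: Dict[str, Any] = field(default_factory=dict)


def load_sglang_section(raw: Dict[str, Any]) -> SGLangConfigSketch:
    """Reject keys the schema does not declare, as a strict loader would."""
    known = {f.name for f in fields(SGLangConfigSketch)}
    unknown = sorted(set(raw) - known)
    if unknown:
        raise KeyError(f"unknown sglang keys: {unknown}")
    return SGLangConfigSketch(**raw)


# As written in the PR: top-level dtype is not a schema field.
bad = {"tp_size": 8, "mem_fraction_static": 0.5, "dtype": "auto"}
# Suggested shape: dtype travels inside extra_args.
good = {"tp_size": 8, "mem_fraction_static": 0.5, "extra_args": {"dtype": "auto"}}

try:
    load_sglang_section(bad)
    rejected = False
except KeyError:
    rejected = True

cfg = load_sglang_section(good)
```

Under these assumptions the first mapping is rejected at load time, while the second loads and carries `dtype` through `extra_args`.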
    "target_hidden_size": 6144,
    "target_num_hidden_layers": 78,
    "target_layer_ids": [1, 20, 38, 56, 75],
    "mask_token_id": 154820,
Replace the placeholder mask token
This draft config is consumed directly by DSpark, where mask_token_id is inserted for the masked draft positions. The newly added docs/comments identify 154820 as the pad placeholder, so running the checked-in config as-is trains on PAD embeddings instead of GLM's actual mask token and silently corrupts the DSpark objective; resolve the tokenizer id in the config before shipping it.
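One possible guard, sketched under assumptions: `resolve_mask_token_id` is a hypothetical helper, the `'[MASK]'` token string comes from the PR description and is itself unverified for GLM-5.2, and `StubTokenizer` with its ids exists only to make the example runnable. With a real checkpoint, the tokenizer would be a Hugging Face tokenizer, which exposes `convert_tokens_to_ids`, `pad_token_id`, and `unk_token_id`.

```python
class StubTokenizer:
    """Tiny stand-in for illustration only; not the real GLM-5.2 tokenizer."""

    def __init__(self, vocab, pad_token_id, unk_token_id):
        self._vocab = vocab
        self.pad_token_id = pad_token_id
        self.unk_token_id = unk_token_id

    def convert_tokens_to_ids(self, token):
        # Unknown tokens fall back to the unk id.
        return self._vocab.get(token, self.unk_token_id)


def resolve_mask_token_id(tokenizer, mask_token="[MASK]"):
    """Return the id of mask_token, refusing ids that alias unk or pad."""
    token_id = tokenizer.convert_tokens_to_ids(mask_token)
    if token_id is None or token_id == tokenizer.unk_token_id:
        raise ValueError(f"{mask_token!r} is not in the tokenizer vocabulary")
    if token_id == tokenizer.pad_token_id:
        raise ValueError(f"{mask_token!r} resolves to the pad id {token_id}")
    return token_id


# Illustrative ids only; the real values must come from the checkpoint.
ok_tok = StubTokenizer({"<pad>": 0, "[MASK]": 7}, pad_token_id=0, unk_token_id=1)
mask_id = resolve_mask_token_id(ok_tok)

# A tokenizer where the candidate mask token aliases pad is rejected.
aliased = StubTokenizer({"[MASK]": 0}, pad_token_id=0, unk_token_id=1)
try:
    resolve_mask_token_id(aliased)
    aliased_rejected = False
except ValueError:
    aliased_rejected = True
```

Running a check like this before writing `mask_token_id` into the draft config would surface a pad alias as an error instead of a silent training defect.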
    # Use the colocate config but launch inference only: temporarily set debug.debug_inference_only to true and run a short step
    RAY_ADDRESS=local CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
    python -m torchspec.train_entry --config configs/sglang_glm52_dspark_8card_colocate.yaml \
    --opts debug.debug_inference_only=true
Drop unsupported --opts from the smoke command
This repo's parse_config forwards unknown CLI tokens directly to OmegaConf dotlist overrides, so overrides are accepted as bare debug.debug_inference_only=true, not behind an --opts flag. With the documented command, --opts becomes an unknown top-level config key and the Step A smoke exits during config parsing before it can test SGLang.
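A simplified sketch of dotlist-style parsing shows why the stray flag matters. `parse_dotlist` is a hypothetical stdlib stand-in, not OmegaConf's parser or the repo's `parse_config`; it only assumes that each token is treated as `key=value` and that a token without `=` becomes a key with no value.

```python
def _coerce(raw):
    """Map 'true'/'false' to booleans; leave everything else as a string."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    return raw


def parse_dotlist(tokens):
    """Turn ['a.b=1'] style tokens into a nested dict (simplified sketch)."""
    out = {}
    for tok in tokens:
        key, sep, raw = tok.partition("=")
        # A token with no '=' still becomes a key, with value None.
        value = _coerce(raw) if sep else None
        node = out
        parts = key.split(".")
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return out


# Bare override: lands under debug.debug_inference_only as intended.
bare = parse_dotlist(["debug.debug_inference_only=true"])

# With the extra flag: '--opts' itself shows up as a top-level key.
with_flag = parse_dotlist(["--opts", "debug.debug_inference_only=true"])
```

In this sketch the second call yields a top-level `--opts` entry, which a strict schema would then reject as an unknown key.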
```bash
RAY_ADDRESS=local CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
./examples/qwen3-8b-single-node/run.sh configs/sglang_glm52_dspark_8card_colocate.yaml
```
Invoke train_entry instead of the Qwen wrapper
For the 8-card dry-run/formal-training path, this wrapper is not neutral: examples/qwen3-8b-single-node/run.sh unconditionally overrides the supplied config to training_num_gpus_per_node=2, inference_num_gpus=2, and inference.sglang.tp_size=2. That contradicts the surrounding GLM plan's TP=8/8-GPU colocate requirement, so following this command launches the 744B FP8 target with the wrong TP/GPU layout or fails to fit.
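A pre-flight check along these lines could catch the mismatch before launch. This is only a sketch: `find_layout_conflicts` is a hypothetical helper, and the flat dotted key names are copied from the override names quoted above, so their real nesting in the config may differ.

```python
LAYOUT_KEYS = (
    "training_num_gpus_per_node",
    "inference_num_gpus",
    "inference.sglang.tp_size",
)


def find_layout_conflicts(overrides, expected_gpus=8):
    """Return the layout overrides whose value differs from expected_gpus."""
    return {
        key: overrides[key]
        for key in LAYOUT_KEYS
        if key in overrides and overrides[key] != expected_gpus
    }


# Values the review attributes to the Qwen wrapper script.
wrapper_overrides = {
    "training_num_gpus_per_node": 2,
    "inference_num_gpus": 2,
    "inference.sglang.tp_size": 2,
}
conflicts = find_layout_conflicts(wrapper_overrides)

# An 8-GPU layout produces no conflicts.
clean = find_layout_conflicts({"inference.sglang.tp_size": 8})
```

Failing fast on a non-empty result would keep the 744B FP8 target from starting with a 2-GPU layout.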
Summary
Adds the artifacts to train a DSpark speculative-decoding draft against GLM-5.2 (744B/40B MoE, `glm_moe_dsa`) on TorchSpec's online (Mooncake-streamed) path.
- `torchspec/config/glm52_dspark_draft_config.json`: DSpark draft sized to GLM-5.2 (hidden 6144, vocab 154880, 78-layer target, aux layers `[1,20,38,56,75]`); backbone dims aligned with the existing GLM-5.2 DFlash draft config.
- `configs/sglang_glm52_dspark_8card_colocate.yaml`: single-node 8×B200, FP8, colocate (inference TP=8 + FSDP training share the 8 GPUs).
- `configs/sglang_glm52_dspark_2node.yaml`: 16×B200 / 2-node fallback.
- `torchspec/data/template.py`: register the `glm` chat template.
- `docs/deploy_glm52_dspark_8card.md`: end-to-end deploy plan with a Go/No-Go gate.
Key findings
- GLM-5.2 (`GlmMoeDsaForCausalLM`) inherits EAGLE3 aux-hidden capture from `DeepseekV2ForCausalLM`, so no GLM-specific sglang patch is required.
- `glm_moe_dsa` DSA-kernel gating (the community ada_dsa port is only needed on sm89/L20).
- `colocate` (shared placement group, `placement_group.py:525-543`). Open risk: the FP8-inference-KV vs co-located-training memory budget; tune `inference.sglang.mem_fraction_static` (start at 0.5).
Not yet verified (confirm on a GLM-5.2 checkpoint before merge/training)
- `mask_token_id` (currently placeholder 154820 = pad), via `tokenizer.convert_tokens_to_ids('[MASK]')` on the GLM-5.2 checkpoint.
- The `glm` chat_template (headers / end token / thinking), against the checkpoint's `tokenizer_config.json`.
- That the `embed`/`lm_head`/`norm` state-dict keys match the GLM-5.2 weights.
Test plan
See `docs/deploy_glm52_dspark_8card.md`, Steps A–D (smoke → dry-run → train → eval acceptance vs the built-in MTP).

🤖 Generated with Claude Code