
feat: GLM-5.2 DSpark draft training configs + deploy plan #133

Open
FlamingoPg wants to merge 1 commit into main from proj/glm52-torchspec-training


@FlamingoPg (Collaborator)

Summary

Adds the artifacts to train a DSpark speculative-decoding draft against GLM-5.2 (744B/40B MoE, glm_moe_dsa) on TorchSpec's online (Mooncake-streamed) path.

  • torchspec/config/glm52_dspark_draft_config.json — DSpark draft sized to GLM-5.2 (hidden 6144, vocab 154880, 78-layer target, aux layers [1,20,38,56,75]); backbone dims aligned with the existing GLM-5.2 DFlash draft config.
  • configs/sglang_glm52_dspark_8card_colocate.yaml — single-node 8×B200, FP8, colocate (inference TP=8 + FSDP training share the 8 GPUs).
  • configs/sglang_glm52_dspark_2node.yaml — 16×B200 / 2-node fallback.
  • torchspec/data/template.py — register the glm chat template.
  • docs/deploy_glm52_dspark_8card.md — end-to-end deploy plan with a Go/No-Go gate.

Key findings

  • GLM-5.2 (GlmMoeDsaForCausalLM) inherits EAGLE3 aux-hidden capture from DeepseekV2ForCausalLM → no GLM-specific sglang patch required.
  • B200 = sm100 satisfies the glm_moe_dsa DSA-kernel gating (the community ada_dsa port is only needed on sm89/L20).
  • 8-card single node is feasible via colocate (shared placement group, placement_group.py:525-543). Open risk = the FP8-inference-KV vs co-located-training memory budget; tune inference.sglang.mem_fraction_static (start 0.5).

⚠️ NOT YET VERIFIED — do not merge before

  • Resolve mask_token_id (currently placeholder 154820 = pad) via tokenizer.convert_tokens_to_ids('[MASK]') on the GLM-5.2 checkpoint.
  • Confirm the glm chat_template (headers / end token / thinking) against the checkpoint's tokenizer_config.json.
  • Confirm embed/lm_head/norm state-dict keys match the GLM-5.2 weights.
  • Step A smoke (serve GLM-5.2 FP8 TP=8 on B200, confirm aux+last hidden capture) + a small dry-run.
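The first checklist item can be sketched as a small guard around `convert_tokens_to_ids`. This is a minimal sketch, not code from the PR: the helper name `resolve_mask_token_id` is hypothetical, and it assumes only that the tokenizer exposes `convert_tokens_to_ids` plus optional `unk_token_id` / `pad_token_id` attributes, as Hugging Face tokenizers do. Whether GLM-5.2 actually defines a `[MASK]` token is exactly what the check is meant to find out.

```python
def resolve_mask_token_id(tokenizer, mask_token="[MASK]"):
    """Return the id of `mask_token`, refusing unk/pad fallbacks.

    `tokenizer` is any object with `convert_tokens_to_ids`; the
    `unk_token_id` and `pad_token_id` attributes are optional.
    """
    token_id = tokenizer.convert_tokens_to_ids(mask_token)
    unk_id = getattr(tokenizer, "unk_token_id", None)
    pad_id = getattr(tokenizer, "pad_token_id", None)

    # An unknown token typically maps to the unk id (or None).
    if token_id is None or token_id == unk_id:
        raise ValueError(f"{mask_token!r} is not in the vocabulary")
    # The placeholder in the draft config is the pad id; reject that case too.
    if token_id == pad_id:
        raise ValueError(f"{mask_token!r} resolves to the pad id {pad_id}")
    return token_id
```

With a real checkpoint, the tokenizer would presumably come from `transformers.AutoTokenizer.from_pretrained(...)` pointed at the GLM-5.2 weights; the returned id then replaces the placeholder in `glm52_dspark_draft_config.json`.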

Test plan

See docs/deploy_glm52_dspark_8card.md, Steps A–D (smoke → dry-run → train → eval acceptance vs the built-in MTP).

🤖 Generated with Claude Code

Add support artifacts for training a DSpark speculative-decoding draft
against GLM-5.2 (744B/40B MoE, glm_moe_dsa) on the TorchSpec online path.

- torchspec/config/glm52_dspark_draft_config.json: DSpark draft config
  sized to GLM-5.2 (hidden 6144, vocab 154880, 78-layer target, aux layers
  [1,20,38,56,75]); backbone dims aligned with the existing GLM-5.2 DFlash
  draft config.
- configs/sglang_glm52_dspark_8card_colocate.yaml: single-node 8xB200, FP8,
  colocate (inference TP=8 + FSDP training share the 8 GPUs).
- configs/sglang_glm52_dspark_2node.yaml: 16xB200 / 2-node fallback.
- torchspec/data/template.py: register the `glm` chat template.
- docs/deploy_glm52_dspark_8card.md: end-to-end deploy plan with Go/No-Go.

NOT YET VERIFIED — before merge/training, confirm on a GLM-5.2 checkpoint:
mask_token_id (placeholder 154820), the glm chat_template, and the
embed/lm_head/norm state-dict keys; then run the Step A smoke + dry-run on
B200. GLM-5.2 inherits EAGLE3 aux-hidden capture from DeepseekV2ForCausalLM,
so no GLM-specific sglang patch is required.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 37a8641260


sglang:
  tp_size: 8
  mem_fraction_static: 0.5  # <-- LOW: leave ~half of each B200 for training
  dtype: auto  # FP8 follows the checkpoint weights


P1: Move dtype into sglang.extra_args

The structured config schema validates inference.sglang against SGLangConfig, which does not define a top-level dtype field; engine-only kwargs are forwarded via inference.sglang.extra_args. As written, this GLM config, and the same dtype entry in configs/sglang_glm52_dspark_2node.yaml, will be rejected during load_config before training starts instead of passing the dtype to SGLang.
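The shape of the suggested fix can be illustrated with plain dictionaries. This is a sketch under stated assumptions: `move_to_extra_args` is a hypothetical helper, and `SCHEMA_FIELDS` is an assumed subset of the fields `SGLangConfig` defines (the real schema lives in the repo and may differ).

```python
def move_to_extra_args(sglang_cfg, schema_fields):
    """Return a copy of `sglang_cfg` with non-schema keys moved under `extra_args`."""
    fixed = {}
    extra = dict(sglang_cfg.get("extra_args") or {})
    for key, value in sglang_cfg.items():
        if key == "extra_args":
            continue  # merged separately below
        if key in schema_fields:
            fixed[key] = value
        else:
            extra[key] = value  # engine-only kwarg, forwarded to SGLang
    if extra:
        fixed["extra_args"] = extra
    return fixed


# Assumed subset of SGLangConfig fields, for illustration only.
SCHEMA_FIELDS = {"tp_size", "mem_fraction_static"}

original = {"tp_size": 8, "mem_fraction_static": 0.5, "dtype": "auto"}
fixed = move_to_extra_args(original, SCHEMA_FIELDS)
print(fixed)
```

In the YAML files this corresponds to nesting `dtype: auto` under `inference.sglang.extra_args` rather than directly under `inference.sglang`.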


"target_hidden_size": 6144,
"target_num_hidden_layers": 78,
"target_layer_ids": [1, 20, 38, 56, 75],
"mask_token_id": 154820,


P1: Replace the placeholder mask token

This draft config is consumed directly by DSpark, where mask_token_id is inserted for the masked draft positions. The newly added docs/comments identify 154820 as the pad placeholder, so running the checked-in config as-is trains on PAD embeddings instead of GLM's actual mask token and silently corrupts the DSpark objective; resolve the tokenizer id in the config before shipping it.


# Use the colocate config but start inference only: temporarily set debug.debug_inference_only to true and run a short step
RAY_ADDRESS=local CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python -m torchspec.train_entry --config configs/sglang_glm52_dspark_8card_colocate.yaml \
--opts debug.debug_inference_only=true


P2: Drop unsupported --opts from the smoke command

This repo's parse_config forwards unknown CLI tokens directly to OmegaConf dotlist overrides, so overrides are accepted as bare debug.debug_inference_only=true, not behind an --opts flag. With the documented command, --opts becomes an unknown top-level config key and the Step A smoke exits during config parsing before it can test SGLang.
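To make the failure mode concrete, here is a simplified stand-in for dotlist handling. It is not the repo's `parse_config` and not OmegaConf itself; `split_overrides` is a hypothetical pre-check that separates well-formed `key=value` overrides from flag-like tokens that would otherwise end up as stray config keys.

```python
def split_overrides(argv):
    """Split CLI tokens into `key=value` overrides and suspicious tokens."""
    overrides, rejected = {}, []
    for token in argv:
        key, sep, value = token.partition("=")
        # A token with a leading dash or without "=" is not a valid override.
        if token.startswith("-") or not sep:
            rejected.append(token)
        else:
            overrides[key] = value
    return overrides, rejected


# The documented smoke command passes both of these after --config.
overrides, rejected = split_overrides(["--opts", "debug.debug_inference_only=true"])
print(overrides)
print(rejected)
```

Following the review comment, the corrected command simply passes `debug.debug_inference_only=true` as a bare trailing argument.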



```bash
RAY_ADDRESS=local CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
./examples/qwen3-8b-single-node/run.sh configs/sglang_glm52_dspark_8card_colocate.yaml
```


P2: Invoke train_entry instead of the Qwen wrapper

For the 8-card dry-run/formal-training path, this wrapper is not neutral: examples/qwen3-8b-single-node/run.sh unconditionally overrides the supplied config to training_num_gpus_per_node=2, inference_num_gpus=2, and inference.sglang.tp_size=2. That contradicts the surrounding GLM plan's TP=8/8-GPU colocate requirement, so following this command launches the 744B FP8 target with the wrong TP/GPU layout or fails to fit.
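A small pre-launch check can surface this kind of silent override. The sketch below is illustrative only: `layout_mismatches` is a hypothetical helper, the dotted keys are the override names quoted in the comment above, and the value 8 for each key in `REQUIRED_LAYOUT` is an assumption drawn from the plan's TP=8 / 8-GPU colocate description rather than read from the config file.

```python
# Assumed target layout for the 8-card colocate plan (not read from the repo).
REQUIRED_LAYOUT = {
    "training_num_gpus_per_node": 8,
    "inference_num_gpus": 8,
    "inference.sglang.tp_size": 8,
}


def layout_mismatches(base, overrides, required):
    """Return {key: (effective, required)} for every key that ends up wrong.

    Later overrides win over the base config, mirroring how a wrapper's
    overrides replace the values from the supplied YAML.
    """
    effective = {**base, **overrides}
    return {
        key: (effective.get(key), want)
        for key, want in required.items()
        if effective.get(key) != want
    }


# Overrides as described for examples/qwen3-8b-single-node/run.sh.
wrapper_overrides = {
    "training_num_gpus_per_node": 2,
    "inference_num_gpus": 2,
    "inference.sglang.tp_size": 2,
}
problems = layout_mismatches(REQUIRED_LAYOUT, wrapper_overrides, REQUIRED_LAYOUT)
print(problems)
```

Launching through `python -m torchspec.train_entry` with no wrapper overrides leaves the mapping empty, which is the state the dry-run needs.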

