Skip to content

Add MiniCPM-MoE-8x2B contrib model port#89

Draft
dhwanw wants to merge 3 commits intomainfrom
contrib/minicpm-moe-8x2b
Draft

Add MiniCPM-MoE-8x2B contrib model port#89
dhwanw wants to merge 3 commits intomainfrom
contrib/minicpm-moe-8x2b

Conversation

@dhwanw
Copy link

@dhwanw dhwanw commented Mar 18, 2026

Description

Adds NeuronX port of OpenBMB MiniCPM-MoE-8x2B to the contrib models collection.

Model Information

Field Value
Model openbmb/MiniCPM-MoE-8x2b
Architecture MiniCPM Mixture of Experts (8 experts, top-2 routing)
Parameters ~14B total, ~2B active per token
TP Degree 2
Precision BF16

Checklist

  • Model compiles successfully on Neuron
  • Token matching validated (39.38% greedy, 92.19% teacher-forced)
  • Performance profiled (2.5 tok/s)
  • README with architecture details, usage, validation results
  • Integration tests included

Folder Structure

contrib/models/minicpm-moe-8x2b/
├── README.md
├── src/
│   ├── __init__.py
│   └── modeling_minicpm_moe_neuronx.py
└── test/
    └── integration/
        └── test_model.py

Testing

  • Token Match (greedy): 39.38% (10 prompts, 32 tokens each)
  • Token Match (teacher-forced): 92.19%
  • Throughput: 2.5 tok/s (TP=2, BS=1, seq_len=2048)

Note: Teacher-forced accuracy is 92.19%. MiniCPM-MoE uses embedding/residual scaling which amplifies bf16 rounding differences through the MoE routing path.

Compatibility

  • Neuron SDK: 2.22+
  • Instance: trn1.32xlarge

🤖 Generated with Claude Code

dhwanw and others added 3 commits March 17, 2026 04:54
Port of openbmb/MiniCPM-MoE-8x2b (8 experts, top-2 routing, ~14B total
params). Uses NXDI base classes (NeuronAttentionBase, NeuronBaseModel,
NeuronBaseForCausalLM, initialize_moe_module). Key MiniCPM-specific
features: embedding scaling (scale_emb=12), residual depth scaling
(scale_depth=1.4), and GLU MLP with softmax router.

Validated on trn1.32xlarge with tp_degree=2, seq_len=2048, bf16.
Compilation passes. Inference-only validation passes with coherent
greedy generation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
10 prompts, 32 tokens each. Greedy 39.38%, teacher-forced 92.19%.
Below 95% threshold - likely due to embedding/residual scaling amplifying
bf16 rounding through MoE routing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant