feat(non_record): add SP8192 BPE Mamba3 SSM hybrid 16MB non-record submission#2155

Open
divagr18 wants to merge 2 commits into
openai:mainfrom
divagr18:MAMBA3_EVERY4_IMPLEMENTATION

Conversation


@divagr18 divagr18 commented May 4, 2026

feat(non_record): add SP8192 BPE Mamba3 SSM hybrid 16MB non-record submission

  • Introduce non-record 16MB submission centered on SP8192 BPE with Mamba3 SSM hybrid
  • Replace every 4th transformer attention block with Mamba3 state-space model to reduce parameters
  • Configure with 9 layers, 8 heads, 4 KV heads, model dim 448, SSM every 4 layers
  • Use sentencepiece tokenizer with 8192 vocab size and BPE model
  • Employ Muon + Adam optimizer with SWA and GPTQ int8 quantization + zstd compression
  • Provide detailed README with configuration, metrics, dataset, build, and run instructions
  • Include required files: training script, log, submission metadata, dependencies, tokenizer vocab
  • Note Mamba3 CUDA extension usage for efficient state-space model implementation
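The "replace every 4th transformer attention block" pattern described above can be sketched as a simple layer-selection rule. This is a minimal illustration, not the submission's actual code; the 1-based "every 4th layer" convention and the `layer_kinds` helper name are assumptions:

```python
# Hedged sketch of the hybrid layout: out of 9 layers, every 4th block
# (1-based) is a Mamba3 SSM block and the rest keep attention.
# The indexing convention is an assumption, not taken from the submission.
def layer_kinds(n_layers: int = 9, ssm_every: int = 4) -> list[str]:
    """Return 'ssm' for every ssm_every-th layer (1-based), else 'attn'."""
    return [
        "ssm" if (i + 1) % ssm_every == 0 else "attn"
        for i in range(n_layers)
    ]

# With 9 layers and ssm_every=4, layers 4 and 8 become SSM blocks,
# so 2 of the 9 attention blocks are replaced.
print(layer_kinds())
```

Under this convention, 2 of the 9 blocks carry SSM state-space mixing, which is where the parameter reduction mentioned in the description would come from (an SSM block replaces the attention projections in those positions).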
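The stated attention config (model dim 448, 8 heads, 4 KV heads) implies a grouped-query attention layout. A quick arithmetic sketch of the shapes it implies, assuming the usual head_dim = d_model / n_heads split (the projection layout is an assumption; only the dimensions come from the description):

```python
# Dimensions stated in the submission notes; everything derived below
# assumes a standard GQA projection layout (an assumption on my part).
d_model, n_heads, n_kv_heads = 448, 8, 4

head_dim = d_model // n_heads        # 56 per head
group_size = n_heads // n_kv_heads   # 2 query heads share each KV head

q_params = d_model * n_heads * head_dim           # Q projection weights
kv_params = 2 * d_model * n_kv_heads * head_dim   # K and V projections combined

print(head_dim, group_size, q_params, kv_params)
```

Halving the KV heads halves the K/V projection weights relative to full multi-head attention, which fits the 16MB size budget the submission targets.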

Divyansh Agrawal added 2 commits May 4, 2026 12:44
…ssion

