Feat/minimax m2.5 support by xs1997zju · Pull Request #1929 · THUDM/slime

xs1997zju · 2026-05-21T03:32:34Z

Summary

Add full integration for MiniMax-M2.5 (256 experts, top-8 routing), including:

- Model spec plugin (slime_plugins/models/minimax_m2.py): Custom SelfAttention with full-dimension QK Norm (RMSNorm over all heads concatenated, with TP gather/scatter)
- mbridge weight bridge (slime_plugins/mbridge/minimax_m2.py): HF ↔ Megatron weight mapping extending Qwen2MoEBridge
- Megatron-to-HF converter (slime/backends/megatron_utils/megatron_to_hf/minimax_m2.py): Reverse conversion for saving trained checkpoints back to HF format
- Shell scripts (scripts/): Model architecture args, RL training launch script, and 3-script HF ↔ Megatron weight conversion pipeline
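The full-dimension QK Norm in the first item differs from per-head QK Norm only in which values share one RMS statistic. Below is a minimal `awk`-in-shell sketch of that difference, assuming a toy query of 2 heads × 2 dims, a learned norm weight of 1 and eps of 1e-6 (illustrative values, not MiniMax-M2.5's). The `rmsnorm` helper is hypothetical and is not the plugin's code: a group size equal to the head size gives per-head RMSNorm, while one group spanning all heads gives the full-dimension variant.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: RMSNorm where the RMS statistic is computed per group.
#   group size == head_dim          -> per-head QK Norm
#   group size == num_heads*head_dim -> full-dimension QK Norm
# Norm weight is taken as 1 and eps as 1e-6 (illustrative assumptions).
rmsnorm() {
  # $1: space-separated values, $2: group size
  awk -v xs="$1" -v g="$2" 'BEGIN {
    n = split(xs, x, " ")
    for (start = 1; start <= n; start += g) {
      ss = 0
      for (i = start; i < start + g; i++) ss += x[i] * x[i]
      r = sqrt(ss / g + 1e-6)          # root mean square of this group
      for (i = start; i < start + g; i++) printf "%.4f ", x[i] / r
    }
    print ""
  }'
}

q="1 1 4 4"        # head 0 = (1, 1), head 1 = (4, 4)
rmsnorm "$q" 2     # per-head: each head normalized on its own
rmsnorm "$q" 4     # full-dimension: one RMS over both heads

# Under tensor parallelism each rank holds only some heads, so the
# full-dimension statistic needs every rank's values. Concatenating the two
# "rank" shards here stands in for the TP gather before normalizing.
rank0="1 1"; rank1="4 4"
rmsnorm "$rank0 $rank1" 4
```

With per-head grouping both heads come out at unit scale; with a single full-dimension group the larger head keeps its 4× relative magnitude, which is why a rank cannot normalize its local heads in isolation.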

Key architecture differences from standard Qwen2MoE

| Feature | MiniMax-M2.5 | Qwen2MoE |
|---|---|---|
| MoE prefix | `block_sparse_moe` (w1/w2/w3) | `mlp` |
| QK Norm | Full-dimension (all heads concatenated) | Per-head |
| Router | Sigmoid + `e_score_correction_bias` | Softmax |
| RoPE | Partial (50%) | Full |
| Experts | 256 per layer × 62 layers | Varies |
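The `block_sparse_moe` (w1/w2/w3) row is essentially a parameter-renaming problem. Below is a minimal sketch, assuming Mixtral-style expert naming (w1 = gate projection, w3 = up projection, w2 = down projection), of rewriting MiniMax-M2.5 HF parameter names into the Qwen2MoE-style `mlp.experts.N.{gate,up,down}_proj` layout. The `map_hf_name` function is hypothetical; the PR's actual mapping lives in `slime_plugins/mbridge/minimax_m2.py`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: rename MiniMax-M2.5 HF MoE parameter names to the
# Qwen2MoE-style layout. Assumes Mixtral-style expert naming:
#   w1 -> gate_proj, w3 -> up_proj, w2 -> down_proj
# Non-MoE names (e.g. self_attn.*) pass through unchanged.
map_hf_name() {
  printf '%s\n' "$1" | sed -E \
    -e 's/\.block_sparse_moe\./.mlp./' \
    -e 's/(\.experts\.[0-9]+)\.w1\./\1.gate_proj./' \
    -e 's/(\.experts\.[0-9]+)\.w3\./\1.up_proj./' \
    -e 's/(\.experts\.[0-9]+)\.w2\./\1.down_proj./'
}

for name in \
  model.layers.0.block_sparse_moe.gate.weight \
  model.layers.0.block_sparse_moe.experts.255.w1.weight \
  model.layers.0.block_sparse_moe.experts.255.w2.weight \
  model.layers.0.block_sparse_moe.experts.255.w3.weight; do
  printf '%s -> %s\n' "$name" "$(map_hf_name "$name")"
done
```

Doing the prefix rename first and the per-expert renames afterwards keeps each `sed` expression independent of the others.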

Add full integration for MiniMax-M2.5, a 229B MoE model with 256 experts and top-8 routing. This includes:

- Model spec plugin with custom SelfAttention for full-dimension QK Norm (RMSNorm over all heads concatenated, with TP gather/scatter)
- mbridge weight bridge (HF <-> Megatron conversion via Qwen2MoEBridge)
- Megatron-to-HF converter for saving trained checkpoints
- Shell scripts: model args, RL training launch, HF<->Megatron weight conversion (3-script pipeline)

Key architecture differences from standard Qwen2MoE:

- block_sparse_moe prefix with w1/w2/w3 expert naming
- Full-dimension QK Norm (q_norm/k_norm, not per-head)
- Sigmoid router with e_score_correction_bias
- Partial RoPE (rotary_percent=0.5)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
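The sigmoid router with `e_score_correction_bias` can be sketched as follows, assuming the DeepSeek-V3-style convention in which the bias is added to the sigmoid scores only to choose the top-k experts, while the gate weights come from the unbiased sigmoid scores renormalized over the chosen experts. Toy sizes (8 experts, top-2) stand in for 256 experts and top-8; the `route` helper and all numbers are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of sigmoid routing with a selection-only bias.
# Assumption: bias affects WHICH experts are picked, not their gate weights.
route() {
  # $1: space-separated router logits, $2: space-separated biases, $3: top-k
  # Prints "<0-based expert id> <normalized gate weight>" per chosen expert.
  awk -v logits="$1" -v biases="$2" -v k="$3" 'BEGIN {
    n = split(logits, z, " "); split(biases, b, " ")
    for (i = 1; i <= n; i++) {
      s[i] = 1 / (1 + exp(-z[i]))   # sigmoid score (not softmax)
      c[i] = s[i] + b[i]            # biased score, used only for selection
      used[i] = 0
    }
    total = 0
    for (j = 1; j <= k; j++) {      # greedy top-k on the biased scores
      best = 0
      for (i = 1; i <= n; i++)
        if (!used[i] && (best == 0 || c[i] > c[best])) best = i
      used[best] = 1; pick[j] = best; total += s[best]
    }
    for (j = 1; j <= k; j++)        # renormalize unbiased scores over picks
      printf "%d %.4f\n", pick[j] - 1, s[pick[j]] / total
  }'
}

logits="2.0 1.0 0.5 0.0 -0.5 -1.0 -1.5 -2.0"
route "$logits" "0 0 0 0 0 0 0 0" 2     # no bias: highest raw scores win
route "$logits" "0 0 0 0 0 0 1.0 0" 2   # bias pulls expert 6 into the top-2
```

In the second call the bias moves expert 6 into the selected set even though its raw score is low, yet its gate weight stays small because the weights ignore the bias. This separation is what lets a correction bias rebalance expert load without directly rescaling expert outputs.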

zhuzilin · 2026-05-25T07:07:29Z

Thank you for the PR! Just a quick check — could you provide some evidence that the implementation is working correctly, such as W&B screenshots, logs, or validation outputs?

xs1997zju · 2026-05-25T07:13:23Z

> Thank you for the PR! Just a quick check — could you provide some evidence that the implementation is working correctly, such as W&B screenshots, logs, or validation outputs?

@zhuzilin

AIME eval results, trained on dapo-math-17k with the following parameters:

```bash
ROLLOUT_ARGS=(
   --prompt-data dapo-math-17k.jsonl
   --input-key prompt
   --label-key label
   --apply-chat-template
   --rollout-shuffle
   --rm-type deepscaler
   --num-rollout 3000
   --rollout-batch-size 32
   --n-samples-per-prompt 8
   --rollout-max-response-len 16384
   --rollout-temperature 1
   --global-batch-size 256
)

EVAL_ARGS=(
   --eval-interval 20
   --eval-prompt-data aime-2024.jsonl
   --n-samples-per-eval-prompt 4
   --eval-max-response-len 16384
   --eval-top-p 1
)
```


zhangxinsen and others added 3 commits May 21, 2026 11:25

fix: update actor-num-nodes to 16 for MiniMax-M2.5 training script

f3d74b3

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Merge branch 'main' into feat/minimax-m2.5-support

2448731

zhuzilin approved these changes May 25, 2026


style: fix black formatting for pre-commit compliance

d38c37e

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

xs1997zju requested a review from zhuzilin May 26, 2026 02:51

xs1997zju wants to merge 4 commits into THUDM:main from xs1997zju:feat/minimax-m2.5-support
