fix: topkgating major bug #7986

Merged
delock merged 3 commits into deepspeedai:master from excepshenal:dshen-fix-topkgating
Apr 30, 2026

Conversation

@excepshenal
Contributor

@excepshenal excepshenal commented Apr 28, 2026

topk_masked_gates was previously used across the tokens dimension to determine which tokens had the highest importance for each expert. However, it was built from raw logits rather than softmax probabilities, so the per-expert token ranking was incorrect.
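A toy illustration of why this matters (assumed `[tokens, experts]` shape; the numbers are made up, not from the PR): each row of logits is normalized independently by softmax, so logit values are not comparable across tokens, and ranking tokens for an expert by logits can give the opposite order from ranking by probabilities.

```python
import torch
import torch.nn.functional as F

# Two tokens, two experts. Token A has a larger raw logit for expert 0
# than token B, but after per-token softmax, expert 0 dominates token B
# while barely mattering to token A.
logits = torch.tensor([[5.0, 9.0],   # token A
                       [2.0, 0.0]])  # token B
probs = F.softmax(logits, dim=1)

print(logits[:, 0])  # by logits, token A ranks first for expert 0
print(probs[:, 0])   # by probabilities, token B ranks first
```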

This was causing print statements like

            if dist.get_rank() == 0:
                print(f"Mask mean: {mask.float().mean()}")
                print(f"Capacity mask mean: {capacity_mask.mean()}")
            mask = torch.logical_and(mask, capacity_mask)
            if dist.get_rank() == 0:
                print(f"Mask (after AND) mean: {mask.float().mean()}")

to often yield values like

Mask mean: 0.0625
Capacity mask mean: 0.0625
Mask (after AND) mean: 0.005908316932618618

and in turn the average number of routed experts per token was as low as 0.001.
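The fix makes the per-expert token ranking use softmax probabilities before the capacity mask is applied. A minimal sketch of the corrected flow, with illustrative names and shapes (`[tokens, experts]`), not the exact `deepspeed/moe/sharded_moe.py` implementation:

```python
import torch
import torch.nn.functional as F

def capacity_mask_from_logits(logits, k, capacity):
    # Probabilities, not raw logits -- the core of the fix.
    gates = F.softmax(logits, dim=1)

    # Top-k experts per token.
    _, expert_idx = torch.topk(gates, k=k, dim=1)
    mask = torch.zeros_like(gates).scatter_(1, expert_idx, 1.0).bool()

    # Rank tokens per expert by masked *probabilities*; raw logits are
    # not comparable across tokens because each row is normalized
    # separately by softmax.
    topk_masked_gates = torch.where(mask, gates, torch.zeros_like(gates))
    _, token_idx = torch.topk(topk_masked_gates, k=capacity, dim=0)
    capacity_mask = torch.zeros_like(gates).scatter_(0, token_idx, 1.0).bool()

    # Drop slots that were only selected to pad an under-full expert.
    return torch.logical_and(mask, capacity_mask)
```

With a consistent ranking, the capacity mask mostly overlaps the top-k mask, so the post-AND mean stays close to the pre-AND mean instead of collapsing by an order of magnitude.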

@excepshenal excepshenal requested a review from tohtana as a code owner April 28, 2026 00:42

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 513c3bc413


Comment thread deepspeed/moe/sharded_moe.py
@excepshenal excepshenal force-pushed the dshen-fix-topkgating branch from 513c3bc to 84aa0e2 on April 28, 2026 02:33
Comment thread deepspeed/moe/sharded_moe.py
@delock
Collaborator

delock commented Apr 29, 2026

Hi @excepshenal, can you fix the DCO error by signing off your commits? Thanks!

Signed-off-by: Daniel Shen <dandanshen2002@gmail.com>
Signed-off-by: Daniel Shen <dandanshen2002@gmail.com>
@excepshenal excepshenal force-pushed the dshen-fix-topkgating branch from 715bbb6 to d334266 on April 29, 2026 18:23
@delock delock enabled auto-merge (squash) April 30, 2026 01:29
@delock delock merged commit 853c938 into deepspeedai:master Apr 30, 2026
1 check passed