
Conversation

@bledden (Contributor) commented on Dec 20, 2025

Summary

The get_lr() function in hyperparam_utils.py was missing support for Moonshot Kimi models, causing it to fail with an AssertionError when users tried to get learning rate recommendations for moonshotai/Kimi-K2-Thinking.

This PR adds the missing configuration:

  • Added hidden size lookup for Kimi-K2 models (hidden_size=7168)
  • Added learning rate scaling exponent for Kimi-K2 (0.0775, matching other MoE architectures)

Uses specific model name matching (moonshotai/Kimi-K2 and kimi-k2) to avoid conflicts if Moonshot releases other model families in the future.
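As a rough sketch of what these additions amount to (the actual structure of hyperparam_utils.py isn't shown here, so the dict names are assumptions; the Kimi-K2 values come from this PR's description):

```python
# Hypothetical sketch of the lookups this PR extends; hyperparam_utils.py
# may organize them differently. Kimi-K2 values are from the PR text.
HIDDEN_SIZES = {
    "meta-llama/Llama-3.3-70B-Instruct": 8192,
    "moonshotai/Kimi-K2-Thinking": 7168,  # added by this PR
}

LR_EXPONENTS = {
    "llama": 0.781,
    "qwen": 0.0775,
    "kimi-k2": 0.0775,  # reuses Qwen's MoE exponent, pending a follow-up eval
}
```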

Test plan

  • Verified get_lr("moonshotai/Kimi-K2-Thinking") returns a valid learning rate instead of raising an error
  • Confirmed existing model support (llama, qwen) is unaffected

Note on exponent value

The exponent value (0.0775) is borrowed from Qwen's MoE models, since Kimi-K2 is also an MoE architecture. The existing exponents for Llama (0.781) and Qwen (0.0775) were derived empirically through hyperparameter sweeps, so a follow-up eval run would help confirm that this value is also optimal for Kimi-K2 specifically.
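The PR doesn't spell out how get_lr() applies the exponent; one common width-based heuristic (an assumption here, not the confirmed formula from hyperparam_utils.py) scales a base learning rate by a power of the hidden size:

```python
def scaled_lr(base_lr: float, hidden_size: int, exponent: float,
              base_hidden: int = 4096) -> float:
    """Hypothetical width-scaling rule: wider models get a smaller LR.

    base_hidden is an arbitrary reference width chosen for this illustration.
    """
    return base_lr * (base_hidden / hidden_size) ** exponent

# With the Kimi-K2 values from this PR: hidden_size=7168, exponent=0.0775.
kimi_lr = scaled_lr(1e-4, hidden_size=7168, exponent=0.0775)
```

Under a form like this, a small exponent such as 0.0775 makes the learning rate only mildly sensitive to width, while Llama's 0.781 would make it far more sensitive, which is why an eval sweep is the right way to confirm the borrowed value.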

Related to thinking-machines-lab/tinker-feedback#49

"meta-llama/Llama-3.3-70B-Instruct": 8192,
}[model_name]

if "moonshotai" in model_name:
Collaborator:

can you make it check for the full name, in case we add another moonshot model later?

Contributor Author:

@joschu ah nice catch. Updated this and the DeepSeek PR
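For illustration, the full-name check the review asked for might look like this (the function name is an assumption; only the matched prefixes come from the PR description):

```python
def is_kimi_k2(model_name: str) -> bool:
    # Match the full family prefix rather than just "moonshotai", so a
    # hypothetical future Moonshot family would not pick up Kimi-K2's config.
    name = model_name.lower()
    return name.startswith("moonshotai/kimi-k2") or name.startswith("kimi-k2")
```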

@bledden force-pushed the fix/moonshot-lr-configs branch from c7119f3 to b84afb0 on December 20, 2025 at 10:54
The get_lr() function was raising an AssertionError for moonshotai models
because they weren't included in the model name checks.

This adds:
- Hidden size lookup for Kimi-K2 models (hidden_size=7168)
- Learning rate scaling exponent for Kimi (0.0775, same as other MoE models)

Related to thinking-machines-lab/tinker-feedback#49
@bledden force-pushed the fix/moonshot-lr-configs branch from b84afb0 to fc8b62d on December 20, 2025 at 11:15