
Conversation

@bledden (Contributor) commented on Dec 20, 2025

Summary

The get_lr() function in hyperparam_utils.py was missing support for Moonshot Kimi models, causing it to fail with an AssertionError when users tried to get learning rate recommendations for moonshotai/Kimi-K2-Thinking.

This PR adds the missing configuration:

  • Added hidden size lookup for Kimi-K2 models (hidden_size=7168)
  • Added learning rate scaling exponent for Kimi-K2 (0.0775, matching other MoE architectures)

Uses specific model name matching (moonshotai/Kimi-K2 and kimi-k2) to avoid conflicts if Moonshot releases other model families in the future.
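As a rough sketch of what these additions amount to (the actual structure of hyperparam_utils.py isn't shown here, so the dict names are assumptions; the Kimi-K2 values come from this PR's description):

```python
# Hypothetical sketch of the lookups this PR extends; hyperparam_utils.py
# may organize them differently. Kimi-K2 values are from the PR text.
HIDDEN_SIZES = {
    "meta-llama/Llama-3.3-70B-Instruct": 8192,
    "moonshotai/Kimi-K2-Thinking": 7168,  # added by this PR
}

LR_EXPONENTS = {
    "llama": 0.781,
    "qwen": 0.0775,
    "kimi-k2": 0.0775,  # reuses Qwen's MoE exponent, pending a follow-up eval
}
```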

Test plan

  • Verified get_lr("moonshotai/Kimi-K2-Thinking") returns a valid learning rate instead of raising an error
  • Confirmed existing model support (llama, qwen) is unaffected

Note on exponent value

The exponent value (0.0775) is borrowed from Qwen's MoE models, since Kimi-K2 is also an MoE architecture. The existing exponents for Llama (0.781) and Qwen (0.0775) were derived empirically through hyperparameter sweeps, so a follow-up eval run would help confirm that this value is also optimal for Kimi-K2 specifically.
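The PR doesn't spell out how get_lr() applies the exponent; one common width-based heuristic (an assumption here, not the confirmed formula from hyperparam_utils.py) scales a base learning rate by a power of the hidden size:

```python
def scaled_lr(base_lr: float, hidden_size: int, exponent: float,
              base_hidden: int = 4096) -> float:
    """Hypothetical width-scaling rule: wider models get a smaller LR.

    base_hidden is an arbitrary reference width chosen for this illustration.
    """
    return base_lr * (base_hidden / hidden_size) ** exponent

# With the Kimi-K2 values from this PR: hidden_size=7168, exponent=0.0775.
kimi_lr = scaled_lr(1e-4, hidden_size=7168, exponent=0.0775)
```

Under a form like this, a small exponent such as 0.0775 makes the learning rate only mildly sensitive to width, while Llama's 0.781 would make it far more sensitive, which is why an eval sweep is the right way to confirm the borrowed value.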

Related to thinking-machines-lab/tinker-feedback#49

"meta-llama/Llama-3.3-70B-Instruct": 8192,
}[model_name]

if "moonshotai" in model_name:
Collaborator:

can you make it check for the full name, in case we add another moonshot model later?

Contributor Author:

@joschu ah nice catch. Updated this and the DeepSeek PR
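For illustration, the full-name check the review asked for might look like this (the function name is an assumption; only the matched prefixes come from the PR description):

```python
def is_kimi_k2(model_name: str) -> bool:
    # Match the full family prefix rather than just "moonshotai", so a
    # hypothetical future Moonshot family would not pick up Kimi-K2's config.
    name = model_name.lower()
    return name.startswith("moonshotai/kimi-k2") or name.startswith("kimi-k2")
```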

@bledden force-pushed the fix/moonshot-lr-configs branch from c7119f3 to b84afb0 on December 20, 2025 at 10:54
The get_lr() function was raising an AssertionError for moonshotai models
because they weren't included in the model name checks.

This adds:
- Hidden size lookup for Kimi-K2 models (hidden_size=7168)
- Learning rate scaling exponent for Kimi (0.0775, same as other MoE models)

Related to thinking-machines-lab/tinker-feedback#49
@bledden force-pushed the fix/moonshot-lr-configs branch from b84afb0 to fc8b62d on December 20, 2025 at 11:15