
Conversation


@bledden bledden commented Dec 20, 2025

Summary

The get_lr() function in hyperparam_utils.py was missing support for the gpt-oss model family, causing it to fail with an AssertionError when users tried to get learning rate recommendations for openai/gpt-oss-20b or openai/gpt-oss-120b.

This PR adds the missing configuration (see the sketch after this list):

  • Added a hidden-size lookup for the gpt-oss models (both variants use hidden_size=2880)
  • Added a learning-rate scaling exponent for gpt-oss (0.0775, matching Qwen, since both use an MoE architecture)
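For concreteness, here is a minimal sketch of the shape of the change. The dictionary names and surrounding structure are illustrative assumptions, not the actual `hyperparam_utils.py` internals:

```python
# Illustrative sketch only: dict names and structure are assumptions,
# not the real hyperparam_utils.py internals.

# Both gpt-oss variants share the same hidden size.
HIDDEN_SIZE_BY_MODEL = {
    "openai/gpt-oss-20b": 2880,
    "openai/gpt-oss-120b": 2880,
}

# LR scaling exponent; 0.0775 matches the Qwen value because both
# families use an MoE architecture.
LR_EXPONENT_BY_FAMILY = {
    "gpt-oss": 0.0775,
}

def get_hidden_size(model_name: str) -> int:
    hidden_size = HIDDEN_SIZE_BY_MODEL.get(model_name)
    # Before this PR, the equivalent lookup had no gpt-oss entries,
    # which surfaced as an AssertionError from get_lr().
    assert hidden_size is not None, f"unknown model: {model_name}"
    return hidden_size
```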

Test plan

  • Verified get_lr("openai/gpt-oss-20b") returns a valid learning rate instead of raising an error
  • Verified get_lr("openai/gpt-oss-120b") works correctly
  • Confirmed existing model support (llama, qwen) is unaffected (a sample check follows)
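A quick check along the lines of this test plan might look like the following; the import path and the llama/qwen model identifiers are placeholders, not taken from the repository:

```python
# Placeholder check script; the import path and the llama/qwen model
# identifiers below are illustrative assumptions.
from hyperparam_utils import get_lr

models = [
    "openai/gpt-oss-20b",       # previously raised AssertionError
    "openai/gpt-oss-120b",      # previously raised AssertionError
    "meta-llama/Llama-3.1-8B",  # existing support, should be unchanged
    "Qwen/Qwen2.5-7B",          # existing support, should be unchanged
]

for name in models:
    lr = get_lr(name)
    assert lr > 0, f"expected a positive learning rate for {name}"
    print(f"{name}: lr = {lr:.3e}")
```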

Fixes thinking-machines-lab/tinker-feedback#49

@bledden bledden force-pushed the fix/gpt-oss-lr-configs branch from 9206be1 to 52f8c0d on December 20, 2025 at 06:14
The get_lr() function was raising an AssertionError for openai/gpt-oss-20b
and openai/gpt-oss-120b because they weren't included in the model name
checks.

This adds:
- Hidden size lookup for gpt-oss models (both use 2880)
- Learning rate scaling exponent for gpt-oss (using 0.0775, same as Qwen,
  since both are MoE architectures)

Fixes thinking-machines-lab/tinker-feedback#49

Use exact model name matching instead of partial string matching
to prevent unintended matches with future model variants (illustrated
in the sketch below).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
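The exact-matching change can be pictured roughly as below; the names shown are placeholders for whatever checks `get_lr()` actually performs, not the actual diff:

```python
# Placeholder illustration of exact vs. partial matching.

# Partial matching is fragile: a hypothetical future
# "openai/gpt-oss-20b-chat" with a different hidden size would
# silently reuse this branch.
#   if "gpt-oss" in model_name: ...

# Exact matching accepts only the two known variants:
GPT_OSS_MODELS = ("openai/gpt-oss-20b", "openai/gpt-oss-120b")

def is_gpt_oss(model_name: str) -> bool:
    return model_name in GPT_OSS_MODELS
```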

