
Conversation


@bledden bledden commented Dec 20, 2025

Summary

The get_lr() function in hyperparam_utils.py was missing support for the gpt-oss model family, causing it to fail with an AssertionError when users tried to get learning rate recommendations for openai/gpt-oss-20b or openai/gpt-oss-120b.

This PR adds the missing configuration (see the sketch after this list):

  • Added a hidden-size lookup for the gpt-oss models (both variants use hidden_size=2880)
  • Added a learning-rate scaling exponent for gpt-oss (0.0775, matching Qwen, since both use an MoE architecture)
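For concreteness, here is a minimal sketch of the shape of the change. The dictionary names and surrounding structure are illustrative assumptions, not the actual `hyperparam_utils.py` internals:

```python
# Illustrative sketch only: dict names and structure are assumptions,
# not the real hyperparam_utils.py internals.

# Both gpt-oss variants share the same hidden size.
HIDDEN_SIZE_BY_MODEL = {
    "openai/gpt-oss-20b": 2880,
    "openai/gpt-oss-120b": 2880,
}

# LR scaling exponent; 0.0775 matches the Qwen value because both
# families use an MoE architecture.
LR_EXPONENT_BY_FAMILY = {
    "gpt-oss": 0.0775,
}

def get_hidden_size(model_name: str) -> int:
    hidden_size = HIDDEN_SIZE_BY_MODEL.get(model_name)
    # Before this PR, the equivalent lookup had no gpt-oss entries,
    # which surfaced as an AssertionError from get_lr().
    assert hidden_size is not None, f"unknown model: {model_name}"
    return hidden_size
```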

Test plan

  • Verified get_lr("openai/gpt-oss-20b") returns a valid learning rate instead of raising an error
  • Verified get_lr("openai/gpt-oss-120b") works correctly
  • Confirmed existing model support (llama, qwen) is unaffected (a sample check follows)
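A quick check along the lines of this test plan might look like the following; the import path and the llama/qwen model identifiers are placeholders, not taken from the repository:

```python
# Placeholder check script; the import path and the llama/qwen model
# identifiers below are illustrative assumptions.
from hyperparam_utils import get_lr

models = [
    "openai/gpt-oss-20b",       # previously raised AssertionError
    "openai/gpt-oss-120b",      # previously raised AssertionError
    "meta-llama/Llama-3.1-8B",  # existing support, should be unchanged
    "Qwen/Qwen2.5-7B",          # existing support, should be unchanged
]

for name in models:
    lr = get_lr(name)
    assert lr > 0, f"expected a positive learning rate for {name}"
    print(f"{name}: lr = {lr:.3e}")
```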

Fixes thinking-machines-lab/tinker-feedback#49

@bledden bledden force-pushed the fix/gpt-oss-lr-configs branch from 9206be1 to 52f8c0d on December 20, 2025 at 06:14
The get_lr() function was raising an AssertionError for openai/gpt-oss-20b
and openai/gpt-oss-120b because they weren't included in the model name
checks.

This adds:
- Hidden size lookup for gpt-oss models (both use 2880)
- Learning rate scaling exponent for gpt-oss (using 0.0775, same as Qwen,
  since both are MoE architectures)

Fixes thinking-machines-lab/tinker-feedback#49

Use exact model name matching instead of partial string matching
to prevent unintended matches with future model variants (illustrated
in the sketch below).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
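The exact-matching change can be pictured roughly as below; the names shown are placeholders for whatever checks `get_lr()` actually performs, not the actual diff:

```python
# Placeholder illustration of exact vs. partial matching.

# Partial matching is fragile: a hypothetical future
# "openai/gpt-oss-20b-chat" with a different hidden size would
# silently reuse this branch.
#   if "gpt-oss" in model_name: ...

# Exact matching accepts only the two known variants:
GPT_OSS_MODELS = ("openai/gpt-oss-20b", "openai/gpt-oss-120b")

def is_gpt_oss(model_name: str) -> bool:
    return model_name in GPT_OSS_MODELS
```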

