Is your feature request related to a problem? Please describe.
GRPO inherently compresses the reward signal: with multiple rewards, they are combined into a single scalar before group normalization, losing information in the advantage estimates. Normalizing each reward separately within the group (group-wise decoupled normalization) better preserves cross-reward distinctions and enables more accurate multi-reward optimization.
Describe the solution you'd like
https://arxiv.org/abs/2601.05242
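A minimal sketch of the difference, assuming advantages are combined by averaging the per-reward normalized values (the exact combination rule and naming here are my assumptions, not necessarily the paper's):

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    """Standard GRPO: sum rewards per sample, then normalize within the group.

    rewards: (group_size, n_rewards) matrix for one prompt's sampled completions.
    """
    total = rewards.sum(axis=1)                       # (group_size,)
    return (total - total.mean()) / (total.std() + 1e-8)

def decoupled_advantages(rewards: np.ndarray) -> np.ndarray:
    """Decoupled variant (sketch): normalize each reward column separately
    within the group, then combine. Per-reward scale differences are no longer
    collapsed by a single pooled mean/std."""
    mu = rewards.mean(axis=0, keepdims=True)          # per-reward group mean
    sigma = rewards.std(axis=0, keepdims=True)        # per-reward group std
    normed = (rewards - mu) / (sigma + 1e-8)          # (group_size, n_rewards)
    return normed.mean(axis=1)                        # hypothetical combiner
```

In the summed version, a reward with a large scale dominates the pooled statistics; in the decoupled version each reward contributes on equal footing after its own normalization.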
Additional context
The authors report substantial improvements in results.
How I came to this
As part of the Google Tunix hackathon, I observed this problem with GRPO in a multi-reward optimization setting. This recent paper directly addresses the loss of information in GRPO's advantage estimates, so I would love to add it to Tunix!
Checklist