Modify SkyRL Generator to Append Router Indices in Multi-Turn by devpatelio · Pull Request #1530 · NovaSky-AI/SkyRL

devpatelio · 2026-04-17T20:30:05Z

vLLM returns new expert indices across prior turns which can often differ from the actual routing decisions that were used in earlier turns.

Turn 1:
Input: [A] w/ R3 indices A_r
Output: [B] w/ R3 indices B_r
Observation: O

Record [A_r, B_r].

Turn 2:
Input: [A, B, O] w/ R3 indices [AB_r, O_r]
Output: [C] w/ R3 indices C_r

The previous approach was to override the recorded indices, but in the case that [AB_r, O_r] != [A_r, B_r], we need to instead append O and C to the original recorded indices instead.

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.

gemini-code-assist

Code Review

This pull request implements router replay (R3) for multi-turn search tasks, transitioning from overwriting to appending rollout expert indices to maintain routing decisions across turns. It introduces a new execution script, a utility function for merging indices, and expanded test coverage for multi-turn invariants. Review feedback focuses on correcting outdated documentation and removing development TODOs in the generator logic.

I am having trouble creating individual review comments. Click here to see my feedback.

skyrl/train/generators/skyrl_gym_generator.py (991-997)

The comment at lines 991-993 is now incorrect as it states the indices are overwritten, whereas the implementation has been updated to append them. Additionally, the comment at line 995 is a redundant development note that should be removed.

                # Append the new rollout expert indices. The inference engine returns indices
                # for the entire sequence (including previous turns), so we only append
                # the new indices to maintain the original routing decisions for earlier turns.
                agent_loop_state.rollout_expert_indices = append_rollout_expert_indices(
                    agent_loop_state.rollout_expert_indices, rollout_expert_indices_for_turn)

skyrl/train/generators/skyrl_gym_generator.py (1072-1078)

The comment at lines 1072-1074 is now incorrect as it states the indices are overwritten, whereas the implementation has been updated to append them. Additionally, there is a leftover TODO at line 1076 that should be removed.

            # Append the new rollout expert indices. The inference engine returns indices
            # for the entire sequence (including previous turns), so we only append
            # the new indices to maintain the original routing decisions for earlier turns.
            agent_loop_state.rollout_expert_indices = append_rollout_expert_indices(
                agent_loop_state.rollout_expert_indices, turn_output.rollout_expert_indices)

skyrl/train/generators/skyrl_gym_generator.py (972)

This leftover TODO comment should be removed as the logic for appending is handled later in the function.

erictang000 added 2 commits April 17, 2026 07:02

append indices for multiturn

debbb4a

pushing w. modified bash script

01c58cf

devpatelio requested review from CharlieFRuan and erictang000 April 17, 2026 20:30

erictang000 added 2 commits April 17, 2026 20:33

tests

5d03af3

done

c675268

devin-ai-integration Bot reviewed Apr 17, 2026

View reviewed changes

gemini-code-assist Bot reviewed Apr 17, 2026

View reviewed changes

router

1756547

This comment was marked as resolved.

Sign in to view

linter

a3280c3

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Modify SkyRL Generator to Append Router Indices in Multi-Turn#1530

Modify SkyRL Generator to Append Router Indices in Multi-Turn#1530
devpatelio wants to merge 6 commits into
mainfrom
r3-multi

devpatelio commented Apr 17, 2026 •

edited by devin-ai-integration Bot

Loading

Uh oh!

devin-ai-integration Bot left a comment

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

This comment was marked as resolved.

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

devpatelio commented Apr 17, 2026 • edited by devin-ai-integration Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

✅ Devin Review: No Issues Found

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

skyrl/train/generators/skyrl_gym_generator.py (991-997)

skyrl/train/generators/skyrl_gym_generator.py (1072-1078)

skyrl/train/generators/skyrl_gym_generator.py (972)

Uh oh!

This comment was marked as resolved.

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

devpatelio commented Apr 17, 2026 •

edited by devin-ai-integration Bot

Loading