Skip to content

Modify SkyRL Generator to Append Router Indices in Multi-Turn#1530

Open
devpatelio wants to merge 6 commits into
mainfrom
r3-multi
Open

Modify SkyRL Generator to Append Router Indices in Multi-Turn#1530
devpatelio wants to merge 6 commits into
mainfrom
r3-multi

Conversation

@devpatelio
Copy link
Copy Markdown
Collaborator

@devpatelio devpatelio commented Apr 17, 2026

vLLM returns new expert indices across prior turns which can often differ from the actual routing decisions that were used in earlier turns.

Turn 1:
Input: [A] w/ R3 indices A_r
Output: [B] w/ R3 indices B_r
Observation: O

Record [A_r, B_r].

Turn 2:
Input: [A, B, O] w/ R3 indices [AB_r, O_r]
Output: [C] w/ R3 indices C_r

The previous approach was to override the recorded indices, but in the case that [AB_r, O_r] != [A_r, B_r], we need to instead append O and C to the original recorded indices instead.


Open with Devin

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.

Open in Devin Review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements router replay (R3) for multi-turn search tasks, transitioning from overwriting to appending rollout expert indices to maintain routing decisions across turns. It introduces a new execution script, a utility function for merging indices, and expanded test coverage for multi-turn invariants. Review feedback focuses on correcting outdated documentation and removing development TODOs in the generator logic.

I am having trouble creating individual review comments. Click here to see my feedback.

skyrl/train/generators/skyrl_gym_generator.py (991-997)

medium

The comment at lines 991-993 is now incorrect as it states the indices are overwritten, whereas the implementation has been updated to append them. Additionally, the comment at line 995 is a redundant development note that should be removed.

                # Append the new rollout expert indices. The inference engine returns indices
                # for the entire sequence (including previous turns), so we only append
                # the new indices to maintain the original routing decisions for earlier turns.
                agent_loop_state.rollout_expert_indices = append_rollout_expert_indices(
                    agent_loop_state.rollout_expert_indices, rollout_expert_indices_for_turn)

skyrl/train/generators/skyrl_gym_generator.py (1072-1078)

medium

The comment at lines 1072-1074 is now incorrect as it states the indices are overwritten, whereas the implementation has been updated to append them. Additionally, there is a leftover TODO at line 1076 that should be removed.

            # Append the new rollout expert indices. The inference engine returns indices
            # for the entire sequence (including previous turns), so we only append
            # the new indices to maintain the original routing decisions for earlier turns.
            agent_loop_state.rollout_expert_indices = append_rollout_expert_indices(
                agent_loop_state.rollout_expert_indices, turn_output.rollout_expert_indices)

skyrl/train/generators/skyrl_gym_generator.py (972)

medium

This leftover TODO comment should be removed as the logic for appending is handled later in the function.

devin-ai-integration[bot]

This comment was marked as resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants