
Conversation

@Simon-ss7
Contributor

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

Bug information

version: 3.12.0
infer_engine: vllm
model: qwen3

When I tried to follow the ToolCallScheduler demo under example/train/grpo/plugin.py to run multi-turn training, I ran into the following issues:

In colocate mode, the launch fails because self.infer_engine is None, so looking up the tokenizer through it raises an error.

In server mode, when current_turn > 1, the returned responses contain duplicated inference results. After inspecting the _add_request method in vllm_engine.py, I found that the prompt built for each step does not include the results of the previous steps.

PR information

I found that in colocate mode a tokenizer is passed in when the multi_turn_scheduler is initialized in rollout_mixin.py, and in server mode self.tokenizer is also available inside multi_turn_scheduler. Based on this, I adjusted how the tokenizer is obtained in the step method of ToolCallScheduler in plugin.py.
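
The sketch below shows the shape of the adjusted lookup. The attribute names self.tokenizer and self.infer_engine.default_template.tokenizer are the ones discussed in this PR; the class skeleton and the fallback branch are only an illustration of the idea, not the exact diff.

    class ToolCallSchedulerSketch:
        """Illustrative stand-in for ToolCallScheduler, not the real class."""

        def __init__(self, tokenizer=None, infer_engine=None):
            self.tokenizer = tokenizer        # available in both colocate and server mode
            self.infer_engine = infer_engine  # None in colocate mode

        def _get_tokenizer(self):
            # Old behavior: always reach through self.infer_engine.default_template.tokenizer,
            # which fails in colocate mode because self.infer_engine is None.
            # New behavior: prefer self.tokenizer, which rollout_mixin.py passes in when
            # building the multi_turn_scheduler; fall back to the engine only if it exists.
            if self.tokenizer is not None:
                return self.tokenizer
            return self.infer_engine.default_template.tokenizer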

I found that in server mode, the run method of MultiTurnScheduler appends {'role': 'assistant', 'content': None} at lines 352–354 as a placeholder for the inference result, but when current_turn > 1 this entry was removed again at lines 236–238. As a result, when I inspected the _add_request method in vllm_engine.py, the input_ids always corresponded to the initial prompt content.

After adjusting the run method so that the appended assistant placeholder is no longer removed before inference when current_turn > 1 (it is now removed afterwards, just before the new completion is appended), the inference content became correct.
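
A minimal, self-contained sketch of the fixed ordering follows. The helpers here (infer_async, remove_response, step) are simplified stand-ins with made-up bodies; only the ordering, keeping the dummy assistant entry through the inference call and removing it right before the new completion is appended, reflects the change in this PR.

    import asyncio

    def remove_response(messages):
        # Simplified stand-in for the real helper: drop a trailing assistant entry.
        if messages and messages[-1]['role'] == 'assistant':
            messages.pop()

    async def infer_async(messages):
        # Stand-in for the real inference call; the point is that it now sees the
        # full history, including the trailing dummy assistant placeholder.
        return f'completion generated over {len(messages)} messages'

    async def step(messages, current_turn):
        completion = await infer_async(messages)
        if current_turn > 1 and not messages[-1]['content']:
            # Removed only after inference, right before appending the new completion.
            remove_response(messages)
        messages.append({'role': 'assistant', 'content': completion})
        return messages

    async def main():
        messages = [
            {'role': 'user', 'content': 'call the tool'},
            {'role': 'assistant', 'content': None},  # dummy placeholder kept during inference
        ]
        await step(messages, current_turn=2)
        print(messages)

    asyncio.run(main())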

Experiment results

(Screenshots attached: colocate_step_error, server_repeat_error)

@gemini-code-assist
Contributor

Summary of Changes

Hello @Simon-ss7, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers crucial bug fixes for the multi-turn scheduler, specifically targeting issues that arise during training with vllm as the inference engine. It rectifies a null pointer error in colocate mode by ensuring proper tokenizer access and eliminates the problem of duplicated inference results in server mode by refining the conversation history management. These adjustments are vital for accurate and stable multi-turn inference workflows.

Highlights

  • Colocate Mode Fix: Resolved a null pointer error in colocate mode by adjusting the tokenizer acquisition logic in plugin.py. The tokenizer is now accessed via self.tokenizer instead of self.infer_engine.default_template.tokenizer, so the code no longer dereferences self.infer_engine when it is None.
  • Server Mode Inference Duplication Fix: Addressed an issue in server mode where inference results were duplicated for current_turn > 1. The logic in multi_turn.py was refined to prevent the premature removal of the {'role': 'assistant', 'content': None} entry, ensuring that subsequent inference steps correctly append to the conversation history.



@gemini-code-assist (bot) left a comment


Code Review

This pull request effectively resolves two bugs related to multi-turn inference. The first fix in plugin.py correctly sources the tokenizer, preventing a None access error in colocate mode, which is a solid improvement for robustness. The second, more subtle fix in multi_turn.py adjusts the logic for handling dummy assistant messages in multi-turn conversations. By moving the removal of the dummy message to after the inference call, it ensures the conversation history is correctly passed to the inference engine, fixing the repeated inference issue. The changes are logical and well-explained. I've added one suggestion to improve the clarity of the fix in multi_turn.py for future maintainability.

Comment on lines 245 to 247
if current_turn > 1 and not messages[-1]['content']:
    # If it's not the first turn and the last message content is dummy, remove the response
    remove_response(messages)

Severity: medium

This change correctly removes the dummy assistant message after the inference call, which is the key to fixing the repeated inference bug. To improve code clarity and future maintainability, I suggest enhancing the comment to explain why the dummy message was intentionally kept during the infer_async call. This will help other developers understand this subtle but crucial part of the multi-turn logic.

Suggested change

- if current_turn > 1 and not messages[-1]['content']:
-     # If it's not the first turn and the last message content is dummy, remove the response
-     remove_response(messages)
+ if current_turn > 1 and not messages[-1]['content']:
+     # The dummy assistant message was intentionally kept during `infer_async`
+     # to ensure correct history processing by the template. It is now removed
+     # before appending the new completion.
+     remove_response(messages)

@hjh0119
Collaborator

hjh0119 commented Jan 6, 2026

LGTM
thanks

@hjh0119 merged commit 28c2479 into modelscope:main on Jan 6, 2026
3 checks passed