[Feat] add tensorboard for RL trainer #1396
Conversation
Force-pushed from f4ab23e to b87d9bf.
Pull request overview
This PR adds comprehensive TensorBoard logging support to the RLTrainer, enabling better monitoring and visualization of RL training metrics. The changes include refactoring the main training loop for better modularity and replacing NumPy operations with PyTorch for consistency.
Key changes:
- Integrated TensorBoard writer to log training metrics, response statistics, evaluation scores, and timing information for each training step
- Refactored the monolithic `fit()` method into smaller, focused helper methods (`_initial_evaluate`, `_rollout_step`, `_train_step`, `_sync_weights_and_save`, `_evaluate_step`)
- Replaced NumPy tensor operations with PyTorch in data preparation and trajectory saving for consistency
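The refactored loop described above could look roughly like the following sketch. This is an assumption about the shape of the real code: the per-step logging pattern and helper names follow the PR description, but the `RLTrainerSketch` class, the faked metrics, and the tag names are illustrative, and `writer` is any object exposing `add_scalar(tag, value, step)` (such as `torch.utils.tensorboard.SummaryWriter`).

```python
class RLTrainerSketch:
    """Hypothetical skeleton of the refactored fit() loop.

    `writer` is duck-typed: anything with add_scalar(tag, value, step),
    e.g. torch.utils.tensorboard.SummaryWriter or a test stub.
    """

    def __init__(self, writer):
        self._writer = writer

    def _rollout_step(self, step):
        # Placeholder: the real method generates rollouts; we fake a reward stat.
        return {"reward_mean": 0.5 + 0.01 * step}

    def _train_step(self, step, rollout_info):
        # Placeholder: the real method runs optimization; we fake a loss.
        return {"loss": 1.0 / (step + 1)}

    def fit(self, steps):
        for step in range(steps):
            rollout_info = self._rollout_step(step)
            train_info = self._train_step(step, rollout_info)
            # Per-step TensorBoard logging, the pattern this PR introduces.
            for key, value in {**rollout_info, **train_info}.items():
                self._writer.add_scalar(f"train/{key}", value, step)
```

Because the writer is duck-typed, the loop structure can be unit-tested without TensorBoard installed.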
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| xtuner/v1/utils/profile.py | Extended the timer context manager to optionally log timing metrics to TensorBoard |
| xtuner/v1/train/rl_trainer.py | Added TensorboardWriter initialization, refactored fit() method into helper methods, added tensorboard logging throughout, replaced numpy with torch tensors, added debug_rollout mode and rollout_steps parameter |
| xtuner/v1/rl/base/worker.py | Modified fit() to return structured logging information (entropy, mismatch, rollout_is, training metrics) for TensorBoard |
| xtuner/v1/rl/base/controller.py | Updated fit() to return log_infos from workers instead of discarding them |
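The `profile.py` change in the table above (a timer context manager that optionally logs to TensorBoard) could be sketched as follows. The function name, signature, and `time/` tag prefix are assumptions, not the actual xtuner API; the only claim taken from the PR is that the timer gained an optional TensorBoard-logging path.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(name, logger=print, writer=None, step=None):
    """Hypothetical sketch of the extended timer context manager.

    Times the wrapped block, logs the elapsed time to the console as
    before, and, when a TensorBoard-style writer is supplied, also
    records the elapsed seconds as a scalar under "time/<name>".
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        logger(f"{name} took {elapsed:.4f}s")
        if writer is not None:
            writer.add_scalar(f"time/{name}", elapsed, step)
```

Keeping the writer optional means existing call sites that only want console timing keep working unchanged.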
Comments suppressed due to low confidence (1)
xtuner/v1/train/rl_trainer.py:632
- The data_info dictionary from _prepare_train_data contains useful training metrics (advantages, prompt_len, etc.) that are only logged to console. For consistency with other metrics being logged to tensorboard in this PR, consider also logging these to tensorboard using self._writer.add_scalars().
```python
def _log_data_info(self, rollout_idx: int, data_info: dict):
    """Formats and logs the data statistics dictionary."""
    log_lines = [f"Rollout {rollout_idx} data statistics:"]
    for key, value in data_info.items():
        if isinstance(value, float):
            log_lines.append(f"  - {key:<20}: {value:.4f}")
        else:
            log_lines.append(f"  - {key:<20}: {value}")
    self.logger.info("\n".join(log_lines))
```
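Following the reviewer's suggestion, the TensorBoard half could be sketched like this. The helper name and the `"data_info"` main tag are illustrative; `add_scalars` is the `SummaryWriter` method the review comment proposes, and it requires numeric values, so non-numeric entries are filtered out first.

```python
def log_data_info_to_tensorboard(writer, rollout_idx, data_info):
    """Hypothetical companion to _log_data_info: mirror the console
    statistics to TensorBoard via writer.add_scalars().

    Skips non-numeric entries (e.g. shape strings), which add_scalars
    cannot serialize; bool is excluded because it subclasses int.
    """
    numeric = {
        key: value
        for key, value in data_info.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }
    if numeric:
        # Groups all statistics under one main tag, keyed by rollout step.
        writer.add_scalars("data_info", numeric, rollout_idx)
```

Grouping under a single main tag keeps the advantage/length statistics on one comparison chart in the TensorBoard UI rather than scattering them across top-level tags.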
/gemini review
Force-pushed from b87d9bf to 4e15498.
Force-pushed from 9a7397a to 7abdb3c.