Fix fast-cache-load rank synchronization guard by sandyhouse · Pull Request #5398 · NVIDIA/Megatron-LM

sandyhouse · 2026-06-18T03:16:03Z

I, the PR author, have personally reviewed every line of this PR.

What does this PR do ?

Fix the --dataloader-fast-cache-load rank synchronization guard in BlendedMegatronDatasetBuilder.

The existing check used the dataset class object when testing for GPTDatasetConfig, which means the guard could fail to disable rank synchronization in some builder paths. This updates the check to inspect self.config, which is the actual dataset config instance.

Motivation

--dataloader-fast-cache-load is intended to skip the rank-0-first synchronization path when loading dataset indices from an existing cache. However, the inner _build_megatron_dataset_splits() guard checked:

isinstance(self.cls, GPTDatasetConfig)

self.cls is the dataset class, such as GPTDataset, not the config object. As a result, this condition is false even when self.config.fast_cache_load is enabled.

Some multi-prefix fast-cache cases may still work because higher-level builder paths already bypass synchronization, but the inner guard remains incorrect and can affect single-prefix paths, direct calls, or future refactors.
Changes

Use self.config instead of self.cls when checking for GPTDatasetConfig. Ensure synchronize_ranks is forced to False when --dataloader-fast-cache-load is enabled for GPT datasets.

copy-pr-bot · 2026-06-18T03:16:07Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2026-06-18T03:16:13Z

This PR has been automatically converted to draft because all PRs must start as drafts.

When you are ready for review, click Ready for Review to begin the review process. This will:

Add the oncall reviewer (optional reviewer)
Add required review teams based on your changes

See the contribution guide for more details.

Copilot

Pull request overview

This PR fixes the rank-synchronization guard for --dataloader-fast-cache-load in BlendedMegatronDatasetBuilder by checking the actual dataset config instance (self.config) instead of the dataset class (self.cls), ensuring GPT fast-cache-load correctly forces synchronize_ranks=False.

Changes:

Fix fast-cache-load guard in _build_megatron_dataset_splits() to use isinstance(self.config, GPTDatasetConfig).
Add a unit test intended to assert that fast-cache-load disables rank synchronization for GPT datasets.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
`megatron/core/datasets/blended_megatron_dataset_builder.py`	Corrects the fast-cache-load type check to apply to the config instance so the synchronization override triggers as intended.
`tests/unit_tests/data/test_builder.py`	Adds a test covering the fast-cache-load synchronization behavior (currently needs a small fix to pass a valid split matrix).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Signed-off-by: Long Li <lilong@shopee.com>

asolergi-nv · 2026-06-18T16:56:32Z

Hey! Thanks for catching this issue. Can you drop the tests? Thanks!

Copilot AI review requested due to automatic review settings June 18, 2026 03:16

sandyhouse requested review from a team as code owners June 18, 2026 03:16

svcnvidia-nemo-ci marked this pull request as draft June 18, 2026 03:16

Copilot started reviewing on behalf of sandyhouse June 18, 2026 03:16 View session

github-actions Bot added the community-request label Jun 18, 2026

Copilot AI reviewed Jun 18, 2026

View reviewed changes

Comment thread tests/unit_tests/data/test_builder.py

bug fix for dataloader-fast-cache-load

aaa09dc

Signed-off-by: Long Li <lilong@shopee.com>

sandyhouse force-pushed the bugfix/dataset-fast-load branch from aec42d3 to aaa09dc Compare June 18, 2026 03:27

sandyhouse marked this pull request as ready for review June 18, 2026 03:37

svcnvidia-nemo-ci requested a review from a team June 18, 2026 03:37

asolergi-nv self-assigned this Jun 18, 2026

svcnvidia-nemo-ci added the waiting-on-customer Waiting on the original author to respond label Jun 18, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix fast-cache-load rank synchronization guard#5398

Fix fast-cache-load rank synchronization guard#5398
sandyhouse wants to merge 1 commit into
NVIDIA:mainfrom
sandyhouse:bugfix/dataset-fast-load

sandyhouse commented Jun 18, 2026 •

edited

Loading

Uh oh!

copy-pr-bot Bot commented Jun 18, 2026

Uh oh!

github-actions Bot commented Jun 18, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

asolergi-nv commented Jun 18, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

sandyhouse commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do ?

Motivation

Uh oh!

copy-pr-bot Bot commented Jun 18, 2026

Uh oh!

github-actions Bot commented Jun 18, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

asolergi-nv commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

sandyhouse commented Jun 18, 2026 •

edited

Loading

asolergi-nv commented Jun 18, 2026 •

edited

Loading