Add Per-GPU-Pair Bandwidth Diagnostics for NCCL_TESTS_SPLIT_MASK #376
Open
kzlxd wants to merge 1 commit into
Description
Background
When using NCCL_TESTS_SPLIT_MASK in multi-node environments (e.g., 2 nodes × 8 GPUs), MPI ranks can be divided into subgroups, where each subgroup represents a cross-node GPU pair with the same local index.
This mechanism is particularly useful for analyzing communication performance between specific GPU pairs.
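The grouping described above can be illustrated with a minimal sketch. It assumes that the split color is computed as rank & mask (as the nccl-tests README describes for NCCL_TESTS_SPLIT_MASK), that the mask is 0x7, and that ranks are numbered node by node (ranks 0-7 on the first node, 8-15 on the second). The mask value, the rank layout, and the helper name split_groups are illustrative assumptions, not taken from this PR.

```python
from collections import defaultdict

NODES = 2          # assumption: matches the "2 nodes x 8 GPUs" example
GPUS_PER_NODE = 8
SPLIT_MASK = 0x7   # assumption: keeps the low 3 bits, i.e. the local GPU index


def split_groups(world_size, mask):
    """Group global ranks by color, where color = rank & mask (hypothetical helper)."""
    groups = defaultdict(list)
    for rank in range(world_size):
        groups[rank & mask].append(rank)
    return dict(groups)


# Assumes ranks are laid out node-major: rank = node * GPUS_PER_NODE + local_index.
groups = split_groups(NODES * GPUS_PER_NODE, SPLIT_MASK)
for color, ranks in sorted(groups.items()):
    print(f"local GPU {color}: ranks {ranks}")
```

Under these assumptions each of the eight groups holds exactly two ranks, one per node, sharing the same local GPU index (for example, ranks 0 and 8). A different rank placement or mask would produce different groups.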
Motivation
Currently, nccl-tests only reports aggregated performance across all ranks, which makes it difficult to:
Changes
This PR introduces an optional diagnostic feature that enables per-GPU-pair performance visibility when using NCCL_TESTS_SPLIT_MASK.
When enabled via:
NCCL_TESTS_SPLIT_VERBOSE=1
the following enhancements are provided:
nccl-tests-formatted results per GPU pair
Example Output
Usage
Compatibility
NCCL_TESTS_SPLIT_VERBOSE=1 is set
Benefits
Notes
NCCL_TESTS_SPLIT_VERBOSE is intended to be used together with NCCL_TESTS_SPLIT_MASK.