Skip to content

Add bisection test#203

Closed
x41lakazam wants to merge 7 commits into
NVIDIA:masterfrom
x41lakazam:bisection_test
Closed

Add bisection test#203
x41lakazam wants to merge 7 commits into
NVIDIA:masterfrom
x41lakazam:bisection_test

Conversation

@x41lakazam

@x41lakazam x41lakazam commented Apr 4, 2024

Copy link
Copy Markdown

In a bisection, each rank is paired with one other rank called 'peer', they both send and receive N messages, this number is bound to the agg_iters parameter.

The selection of this 'peer' rank is defined by the getPeer function:
https://github.com/x41lakazam/nccl-tests/blob/bisection_test/src/bisection.cu#L19

@sjeaugey

sjeaugey commented Apr 4, 2024

Copy link
Copy Markdown
Member

Can't you already do that running the sendrecv_perf test, and setting NCCL_TESTS_SPLIT_MASK=(nranks/2)-1?

Sure, that only works with nranks being a power of two; maybe your code is more generic.

@x41lakazam

Copy link
Copy Markdown
Author

Being able to run this test when nranks is not a power of two is actually important to us

Comment thread src/bisection.cu Outdated
@x41lakazam x41lakazam force-pushed the bisection_test branch 3 times, most recently from 52ebfc6 to 42a079f Compare April 14, 2024 06:47
Add bisection to makefile

Add bisection doc in performance.md
Add bisection to makefile

Add bisection doc in performance.md
@x41lakazam

x41lakazam commented Apr 14, 2024

Copy link
Copy Markdown
Author

@AddyLaddy @sjeaugey

The code is ready to merge from our side
Please let me know what you think about it, thanks

Comment thread src/bisection.cu Outdated
@sjeaugey

sjeaugey commented Mar 4, 2025

Copy link
Copy Markdown
Member

@x41lakazam Given the new NCCL_TESTS_SPLIT option allowing to split ranks in any way and no longer tied to powers of two, could you describe a use case where this allows to test performance in a new/better way?

@AddyLaddy

Copy link
Copy Markdown
Collaborator

Oh, I wanted the cross rail change removed. Why did you restore it?

@x41lakazam

Copy link
Copy Markdown
Author

@sjeaugey I didn't notice this change, does it permit bisection on any number of ranks ?

@sjeaugey

sjeaugey commented Mar 6, 2025

Copy link
Copy Markdown
Member

Yes, if you set NCCL_TESTS_SPLIT="MOD X" with X=N/2 then it should be a bisection test.

@x41lakazam x41lakazam closed this Jun 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants