Description
On a node with 4× AMD MI300A GPUs, the automatic detection of the setup via
MPIPreferences.use_system_binary(vendor="cray") fails.
julia> MPIPreferences.use_system_binary(vendor="cray")
ERROR: ArgumentError: Collection has multiple elements, must contain exactly 1 element
Stacktrace:
[1] _only
@ ./iterators.jl:1554 [inlined]
[2] only
@ ./iterators.jl:1545 [inlined]
[3] only_or_nothing
@ ~/.julia/packages/MPIPreferences/PLH7x/src/parse_cray_cc.jl:42 [inlined]
[4] cray_gtl
@ ~/.julia/packages/MPIPreferences/PLH7x/src/parse_cray_cc.jl:56 [inlined]
[5] analyze_cray_cc()
@ MPIPreferences.CrayParser ~/.julia/packages/MPIPreferences/PLH7x/src/parse_cray_cc.jl:84
[6] use_system_binary(...)
@ MPIPreferences ~/.julia/packages/MPIPreferences/PLH7x/src/MPIPreferences.jl:180
[7] top-level scope
@ REPL[2]:1
Manually editing LocalPreferences.toml to include:
[MPIPreferences]
_format = "1.1"
abi = "MPICH"
binary = "system"
libmpi = "libmpi_cray"
preloads = ["libmpi_gtl_hsa.so"]
cclibs = []
fixes the issue - MPI.jl then runs correctly across GPUs.
The node uses
cray-mpich/8.1.30
rocm/6.2.2
$CRAY_MPICH_ROOTDIR/gtl/lib contains:
libmpi_gtl_cuda.a libmpi_gtl_cuda.so libmpi_gtl_cuda.so.0 libmpi_gtl_cuda.so.0.1.0 libmpi_gtl_hsa.a libmpi_gtl_hsa.so libmpi_gtl_hsa.so.0 libmpi_gtl_hsa.so.0.1.0 libmpi_gtl_ze.a libmpi_gtl_ze.so libmpi_gtl_ze.so.0 libmpi_gtl_ze.so.0.1.0 pkgconfig
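For illustration, here is a minimal sketch of how this kind of error can arise. The stack trace shows `only_or_nothing` ending up in `Base.only`, but not which collection it is applied to. The sketch assumes that collection is a list of GTL candidates like the shared objects listed above. The helper `pick_gtl` is hypothetical and not part of MPIPreferences; it only shows one possible way to disambiguate by naming the accelerator explicitly.

```julia
# Shared-object names taken from the directory listing above.
gtl_dir_entries = [
    "libmpi_gtl_cuda.so", "libmpi_gtl_hsa.so", "libmpi_gtl_ze.so",
]

# Assumption: the parser collects every GTL candidate and expects exactly one.
matches = filter(startswith("libmpi_gtl_"), gtl_dir_entries)

# Base.only throws an ArgumentError when the collection has more than one
# element, which matches the error type in the report.
err = try
    only(matches)
catch e
    e
end
println(err isa ArgumentError)

# Hypothetical helper (not an MPIPreferences function): select the GTL variant
# for an explicitly named accelerator instead of requiring a unique match.
function pick_gtl(entries::Vector{String}, accelerator::AbstractString)
    wanted = "libmpi_gtl_$(accelerator).so"
    idx = findfirst(==(wanted), entries)
    return idx === nothing ? nothing : entries[idx]
end

println(pick_gtl(gtl_dir_entries, "hsa"))
```

With `"hsa"` as the accelerator, the helper returns the same library name that was set by hand in `preloads` above.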
cc @vchuravy