[QDP] Fix invalid CUDA kernel launch when num_samples exceeds grid dimension limit #968
cc @rich7420, @ryankert01
If you're using 65,535, you might be relying on a limit inherited from the Fermi architecture (pre-2012).
On any modern GPU (compute capability 3.0 or higher, which covers virtually every GPU in use today), the 1D grid limit for the x-dimension is much higher: 2^31 - 1.
That limit may or may not be reached before running out of memory, but it's still good to add a check.
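A minimal Rust sketch of the limits discussed in this comment, assuming the limit is picked from the device's compute capability major version. The helper names max_grid_dim_x and input_bytes are hypothetical, not QDP APIs; real code would more likely query the device with cudaDeviceGetAttribute(cudaDevAttrMaxGridDimX). The byte estimate illustrates the out-of-memory point.

```rust
/// Hypothetical helper: gridDim.x limit by compute capability major version.
/// CC below 3.0 (Tesla/Fermi) allows 65,535; CC 3.0 and later allow 2^31 - 1.
fn max_grid_dim_x(cc_major: u32) -> u64 {
    if cc_major >= 3 {
        (1u64 << 31) - 1
    } else {
        65_535
    }
}

/// Hypothetical helper: bytes needed for `num_samples` samples of
/// `sample_len` elements, each `elem_size` bytes wide.
/// Returns None if the product overflows u64.
fn input_bytes(num_samples: u64, sample_len: u64, elem_size: u64) -> Option<u64> {
    num_samples.checked_mul(sample_len)?.checked_mul(elem_size)
}
```

For example, input_bytes(max_grid_dim_x(3), 1, 8) gives 17,179,869,176 bytes (just under 16 GiB) with only one f64 per sample; any realistic sample length multiplies that further.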
@ryankert01 thanks for the suggestion! What if we use
76021d0 to 1f60d55
ryankert01 left a comment
Thanks for the update! A few comments:
…d sample length for float32 and float64
@ryankert01 do you want to take another look?
@viiccwen looks good, thanks for the contribution!
Purpose of PR
Fixes a bug in launch_l2_norm_batch(f64) where processing more samples than the grid limit allows would result in an invalid CUDA kernel launch. The fix adds early validation that returns an error when num_samples exceeds the CUDA 1D grid dimension limit.
Related Issues or PRs
closes #967
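The early validation described under "Purpose of PR" could look roughly like the Rust sketch below. It assumes one CUDA block per sample, so grid.x equals num_samples; the names grid_dim_for_samples, LaunchError, and MAX_GRID_DIM_X are hypothetical and not taken from the PR's diff.

```rust
use std::fmt;

/// gridDim.x limit on compute capability 3.0+ (2^31 - 1).
const MAX_GRID_DIM_X: usize = 2_147_483_647;

/// Hypothetical error type; QDP's real error type may differ.
#[derive(Debug, PartialEq)]
enum LaunchError {
    TooManySamples { num_samples: usize, limit: usize },
}

impl fmt::Display for LaunchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LaunchError::TooManySamples { num_samples, limit } => write!(
                f,
                "num_samples ({num_samples}) exceeds CUDA 1D grid limit ({limit})"
            ),
        }
    }
}

impl std::error::Error for LaunchError {}

/// Validate num_samples before launching, assuming one block per sample
/// (grid.x == num_samples). Returns the grid x-dimension on success.
fn grid_dim_for_samples(num_samples: usize) -> Result<u32, LaunchError> {
    if num_samples > MAX_GRID_DIM_X {
        return Err(LaunchError::TooManySamples {
            num_samples,
            limit: MAX_GRID_DIM_X,
        });
    }
    // Lossless cast: num_samples <= 2^31 - 1, which fits in u32.
    Ok(num_samples as u32)
}
```

Checking before the launch gives the caller a descriptive error instead of relying on the CUDA runtime to reject the launch with cudaErrorInvalidConfiguration.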
Changes Made
Breaking Changes
Checklist