Description
`batched_vec` is currently implemented on top of `batched_mul` (which calls batched gemm), with only some extra reshapes around it.
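To make the two code paths concrete, here is a minimal pure-Python sketch. It is not the library's implementation. The function names (`batched_gemm`, `batched_gemv`, `batched_vec_via_gemm`) and the batch-first nested-list layout are assumptions chosen for illustration. It only shows that reshaping each vector into a one-column matrix and calling a batched matrix-matrix product gives the same result as a direct batched matrix-vector product.

```python
def batched_gemm(A, B):
    """Batched matrix-matrix product: A[n] is (M x K), B[n] is (K x P)."""
    return [
        [[sum(a_row[k] * b_mat[k][p] for k in range(len(b_mat)))
          for p in range(len(b_mat[0]))]
         for a_row in a_mat]
        for a_mat, b_mat in zip(A, B)
    ]


def batched_gemv(A, x):
    """Direct batched matrix-vector product: A[n] is (M x K), x[n] has length K."""
    return [
        [sum(a * v for a, v in zip(a_row, vec)) for a_row in a_mat]
        for a_mat, vec in zip(A, x)
    ]


def batched_vec_via_gemm(A, x):
    """Reshape each vector to a (K x 1) column, call batched gemm, drop the unit axis."""
    columns = [[[v] for v in vec] for vec in x]          # (K,) -> (K x 1)
    product = batched_gemm(A, columns)                   # (M x 1) per batch
    return [[row[0] for row in mat] for mat in product]  # (M x 1) -> (M,)


# Two batches of 2x2 matrices and matching length-2 vectors (hypothetical data).
A = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]
x = [[1, 1], [5, 7]]

print(batched_gemv(A, x))
print(batched_vec_via_gemm(A, x))
```

Both calls compute the same values, so any difference between the two approaches on a GPU would come from which kernel gets dispatched, not from the math.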
Some basic benchmarks (on an RTX PRO 6000 Blackwell) suggest that batched gemv is sometimes 1-2% faster, converges to similar performance in the limit, and in some cases is consistently slightly slower.
Not sure if this has been considered already. The difference isn't huge, but in cases where there is a real justification for using gemv specifically, it would be nice to have the option. It might also help in the backward pass.