As a follow-up to the speed-up work in issue #715 and PR #767, create a new member function matmul_veclib() that uses ?GEMM() from the macOS Accelerate/vecLib framework.
Its performance should be benchmarked against the naive matmul() and the sped-up matmul_fast().
The new function matmul_veclib() only needs to work on Apple Silicon. Wrappers for other vendor libs will be follow-up work.
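
For illustration only, here is a minimal sketch of what such a wrapper might look like. It is not the project's implementation. The free functions `matmul_naive` and `matmul_veclib_sketch`, the use of `std::vector<double>` as storage, and the row-major layout are all assumptions made for this example, and only the double-precision member of the ?GEMM() family (`cblas_dgemm`) is shown. On non-Apple platforms the sketch falls back to the naive loop so that it still compiles.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__APPLE__)
// On macOS, link with: -framework Accelerate
#include <Accelerate/Accelerate.h>
#endif

// Naive reference (hypothetical helper): C (m x n) = A (m x k) * B (k x n).
// All matrices are stored row-major in flat vectors.
std::vector<double> matmul_naive(std::vector<double> const & a,
                                 std::vector<double> const & b,
                                 std::size_t m, std::size_t k, std::size_t n)
{
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<double> c(m * n, 0.0);
    for (std::size_t i = 0; i < m; ++i)
    {
        for (std::size_t p = 0; p < k; ++p)
        {
            for (std::size_t j = 0; j < n; ++j)
            {
                c[i * n + j] += a[i * k + p] * b[p * n + j];
            }
        }
    }
    return c;
}

// Hypothetical stand-in for the proposed matmul_veclib(): delegates to
// cblas_dgemm from Accelerate on Apple platforms, and to the naive loop
// elsewhere (assumption: a fallback is acceptable for this sketch).
std::vector<double> matmul_veclib_sketch(std::vector<double> const & a,
                                         std::vector<double> const & b,
                                         std::size_t m, std::size_t k, std::size_t n)
{
    assert(a.size() == m * k && b.size() == k * n);
#if defined(__APPLE__)
    std::vector<double> c(m * n, 0.0);
    // C = 1.0 * A * B + 0.0 * C; leading dimensions are the row lengths
    // because the data is row-major and not transposed.
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                static_cast<int>(m), static_cast<int>(n), static_cast<int>(k),
                1.0, a.data(), static_cast<int>(k),
                b.data(), static_cast<int>(n),
                0.0, c.data(), static_cast<int>(n));
    return c;
#else
    return matmul_naive(a, b, m, k, n);
#endif
}
```

A benchmark along the lines requested above could call both functions on the same inputs, check that the results agree, and time each call. A real member function would also need to dispatch to the single-precision and complex variants according to the element type.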