Create `SimpleArray::matmul_veclib()` to wrap `?GEMM()` in macOS Accelerate/vecLib

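To follow up on the speed-up work in issue #715 and PR #767, create a new member function `matmul_veclib()` that calls `?GEMM()` from macOS Accelerate/vecLib. The `?` stands for the BLAS type prefix, e.g. `SGEMM()` for `float` and `DGEMM()` for `double`.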
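As a starting point for discussion, here is a minimal sketch of the wrapping idea for the `double` case. It is not the intended `SimpleArray` implementation: `matmul_veclib_sketch()` is a hypothetical free function, and it assumes dense row-major `std::vector<double>` buffers because the internal layout of `SimpleArray` is not described here. On Apple platforms it calls `cblas_dgemm()` from Accelerate (link with `-framework Accelerate`). Elsewhere it falls back to a plain triple loop so the sketch still compiles.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__APPLE__)
#include <Accelerate/Accelerate.h> // declares cblas_dgemm()
#endif

// Hypothetical helper (not part of the existing code base).
// Computes C = A * B, where A is m x k, B is k x n, and C is m x n.
// All matrices are assumed to be dense and stored in row-major order.
std::vector<double> matmul_veclib_sketch(
    std::vector<double> const & a,
    std::vector<double> const & b,
    std::size_t m,
    std::size_t k,
    std::size_t n)
{
    assert(a.size() == m * k);
    assert(b.size() == k * n);

    std::vector<double> c(m * n, 0.0);
    if (m == 0 || n == 0 || k == 0)
    {
        return c; // nothing to multiply; avoid passing zero sizes to BLAS
    }

#if defined(__APPLE__)
    // C = alpha * A * B + beta * C with alpha = 1 and beta = 0.
    // For row-major, non-transposed operands the leading dimensions are
    // the column counts: lda = k, ldb = n, ldc = n.
    cblas_dgemm(
        CblasRowMajor, CblasNoTrans, CblasNoTrans,
        static_cast<int>(m), static_cast<int>(n), static_cast<int>(k),
        1.0,
        a.data(), static_cast<int>(k),
        b.data(), static_cast<int>(n),
        0.0,
        c.data(), static_cast<int>(n));
#else
    // Portable fallback so that the sketch builds without Accelerate.
    for (std::size_t i = 0; i < m; ++i)
    {
        for (std::size_t p = 0; p < k; ++p)
        {
            for (std::size_t j = 0; j < n; ++j)
            {
                c[i * n + j] += a[i * k + p] * b[p * n + j];
            }
        }
    }
#endif
    return c;
}
```

The real member function would also need to check the operand shapes, choose the matching `?GEMM()` variant for the element type, and handle non-contiguous arrays if `SimpleArray` supports them.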

Its performance should be benchmarked against the naive `matmul()` and the sped-up `matmul_fast()`.
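One possible way to collect the timings is a small helper that runs a callable several times and keeps the best wall-clock time. This is only a sketch: `best_seconds()` is a hypothetical name, and the project's existing profiling tools may be a better fit.

```cpp
#include <chrono>
#include <cstddef>
#include <limits>

// Hypothetical timing helper (not part of the existing code base).
// Runs func() `repeat` times and returns the shortest elapsed time in
// seconds.  Returns +infinity when repeat is 0.
template <typename F>
double best_seconds(F && func, std::size_t repeat = 5)
{
    double best = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < repeat; ++i)
    {
        auto const t0 = std::chrono::steady_clock::now();
        func();
        auto const t1 = std::chrono::steady_clock::now();
        double const elapsed = std::chrono::duration<double>(t1 - t0).count();
        if (elapsed < best)
        {
            best = elapsed;
        }
    }
    return best;
}
```

Each of `matmul()`, `matmul_fast()`, and `matmul_veclib()` could then be wrapped in a lambda and timed on the same operands over a range of matrix sizes. Taking the minimum over several runs reduces noise from other processes.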

The new function `matmul_veclib()` only needs to work on Apple Silicon. Wrappers for other vendor libraries are left as follow-up work.