Skip to content

Latest commit

 

History

History
31 lines (22 loc) · 1.34 KB

File metadata and controls

31 lines (22 loc) · 1.34 KB

Cache-oblivious matrix transpose algorithm for Computer Architecture, ITMO University (Dec 2020)

This program compares different approaches of transposing a square matrix:

  1. Naive algorithm (two loops)
  2. Cache-oblivious algorithm w/o blocks
  3. Cache-oblivious algorithm w/ 64x64 and 512x512 blocks

The 3rd approach is the first two combined, i.e. it tries to minimize the number of cache misses while handling "small" blocks as in the first approach.

These algorithms were compared by transposing the same square matrices. Each algorithm was applied 20 times to each of the matrices of size from 2500 to 50,000 with a step of 2500.

The matrices are filled randomly with numbers from 0 to 9 stored in char8_t type (which was introduced in C++20).

The benchmarks were made by compiling the sources using GCC 10.2.0 from MinGW-w64 in Release mode with -O3 optimization level. The program was run on an Intel Core i7 6850K running machine.

Benchmark

You can see that the 3rd approach turned out to be the fastest one, almost independent of the block size. Cache-oblivious algorithm without blocks (i.e. with 1x1 blocks) seems to be the least efficient.

Full statistics can be found here.