|author||Wojtek Kosior <firstname.lastname@example.org>||2019-04-26 15:07:16 +0200|
|committer||Wojtek Kosior <email@example.com>||2019-04-26 15:07:16 +0200|
1 files changed, 4 insertions, 0 deletions
@@ -115,3 +115,7 @@ Implemented in `src/blockmath.F90`. The multiplication of matrices is achieved b
+As expected, modifying the algorithm to access memory more locally improves execution speed (compare naive with better).
+Using built-in intrinsic functions and Fortran's array operations may help the compiler optimize the code better and further improve speed (compare better with better2, naive with dot, and better with matmul). This is not always effective, however: in this experiment single-precision operations were optimized noticeably better, while there was little or no improvement for higher-precision floating-point types.
+Block array multiplication is supposed to increase the temporal locality of operations, thus allowing efficient use of the L1 cache. Whether this method is really beneficial depends on factors such as the processor model. The performance gain, if there is any, is greater for bigger matrices. In this experiment the algorithm was most successful for kind 8 (double precision on x86_64) numbers.
\ No newline at end of file