From 32fc35b009df51a1690b2d7abb44f490908f6eb1 Mon Sep 17 00:00:00 2001
From: Wojtek Kosior
Date: Fri, 26 Apr 2019 15:07:16 +0200
Subject: add conclusions

---
 README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/README.md b/README.md
index ef1e38a..616c8a9 100644
--- a/README.md
+++ b/README.md
@@ -115,3 +115,7 @@ Implemented in `src/blockmath.F90`. The multiplication of matrices is achieved b
 
 ####KIND=16####
 ![plot kind=16](res/wykres16.svg)
+##Conclusions##
+As expected, modifying the algorithm to reference memory more locally improves execution speed (the difference between naive and better).
+Using built-in intrinsic functions as well as Fortran's array operations may help the compiler optimize the code better and further improve speed (the differences between better and better2, between naive and dot, and between better and matmul). It is not, however, always effective - in this experiment single precision operations were indeed optimized noticeably better, while there was little or no improvement for the higher-precision floating-point types.
+Block matrix multiplication is supposed to increase the temporal locality of operations, thus allowing efficient use of the L1 cache. Whether this method is really beneficial depends on factors like the processor model. The performance gain, if there is any, is greater for bigger matrices. In this experiment this algorithm was most successful for kind 8 (double precision on x86_64) numbers.
\ No newline at end of file
--
cgit v1.2.3
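For reference, here is a minimal, self-contained Fortran sketch of the ideas the added conclusions compare: the naive i-j-k loop, a locality-friendly j-k-i reordering, the `matmul` intrinsic, and a simple blocked multiplication. This is not the code from `src/blockmath.F90`; the subroutine names only mirror the labels used in the README's plots, and the kind, matrix size, and block size are assumptions chosen purely for illustration.

```fortran
! Illustrative sketch only; not the repository's benchmark code.
! wp, n and bs are assumed values, and the subroutine names mirror the plot labels.
program mulsketch
  implicit none
  integer, parameter :: wp = 8            ! assumed kind (double precision on x86_64)
  integer, parameter :: n = 512, bs = 64  ! assumed matrix size and block size
  real(wp), dimension(n, n) :: a, b, c

  call random_number(a)
  call random_number(b)

  call naive(a, b, c)    ! i-j-k order: poor locality in column-major storage
  call better(a, b, c)   ! j-k-i order: innermost loop is stride-1
  c = matmul(a, b)       ! intrinsic: lets the compiler pick an optimized routine
  call blocked(a, b, c)  ! tiled loops: reuse data while it is still in L1 cache

contains

  subroutine naive(a, b, c)
    real(wp), intent(in)  :: a(:, :), b(:, :)
    real(wp), intent(out) :: c(:, :)
    integer :: i, j, k
    c = 0
    do i = 1, size(c, 1)
      do j = 1, size(c, 2)
        do k = 1, size(a, 2)
          ! innermost loop strides through a row of a, which is
          ! non-contiguous in Fortran's column-major layout
          c(i, j) = c(i, j) + a(i, k) * b(k, j)
        end do
      end do
    end do
  end subroutine naive

  subroutine better(a, b, c)
    real(wp), intent(in)  :: a(:, :), b(:, :)
    real(wp), intent(out) :: c(:, :)
    integer :: i, j, k
    c = 0
    do j = 1, size(c, 2)
      do k = 1, size(a, 2)
        do i = 1, size(c, 1)
          ! with i innermost, a(i,k) and c(i,j) are accessed with stride 1
          c(i, j) = c(i, j) + a(i, k) * b(k, j)
        end do
      end do
    end do
  end subroutine better

  subroutine blocked(a, b, c)
    ! assumes n-by-n arguments; works on bs-by-bs tiles so that the tiles of
    ! a, b and c can stay in cache while they are reused
    real(wp), intent(in)  :: a(:, :), b(:, :)
    real(wp), intent(out) :: c(:, :)
    integer :: ii, jj, kk, i, j, k
    c = 0
    do jj = 1, n, bs
      do kk = 1, n, bs
        do ii = 1, n, bs
          do j = jj, min(jj + bs - 1, n)
            do k = kk, min(kk + bs - 1, n)
              do i = ii, min(ii + bs - 1, n)
                c(i, j) = c(i, j) + a(i, k) * b(k, j)
              end do
            end do
          end do
        end do
      end do
    end do
  end subroutine blocked

end program mulsketch
```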