author    Wojtek Kosior <kwojtus@protonmail.com> 2019-04-26 15:07:16 +0200
committer Wojtek Kosior <kwojtus@protonmail.com> 2019-04-26 15:07:16 +0200
commit    32fc35b009df51a1690b2d7abb44f490908f6eb1 (patch)
tree      2baf9abed8210300df30a7de4a60e273866f2c6a
parent    b89d73e8281ce7db5ea971718916ebbd1c78b9a3 (diff)
download  fortran-assignment1-32fc35b009df51a1690b2d7abb44f490908f6eb1.tar.gz
          fortran-assignment1-32fc35b009df51a1690b2d7abb44f490908f6eb1.zip
add conclusions
-rw-r--r--  README.md  4
1 file changed, 4 insertions, 0 deletions
diff --git a/README.md b/README.md
index ef1e38a..616c8a9 100644
--- a/README.md
+++ b/README.md
@@ -115,3 +115,7 @@ Implemented in `src/blockmath.F90`. The multiplication of matrices is achieved b
####KIND=16####
![plot kind=16](res/wykres16.svg)
+##Conclusions##
+As expected, modifying the algorithm to reference memory more locally improves execution speed (the difference between naive and better).
+Using built-in intrinsic functions as well as Fortran's array operations may help the compiler optimize the code better and further improve speed (the differences between better and better2, between naive and dot, and between better and matmul). This is not always effective, however: in this experiment, single-precision operations were indeed optimized noticeably better, while there was little or no improvement for the higher-precision floating-point types.
+Block matrix multiplication is supposed to increase the temporal locality of operations, thus allowing efficient use of the L1 cache. Whether this method is really beneficial depends on factors such as the processor model. The performance gain, if there is any, is greater for bigger matrices. In this experiment, this algorithm was most successful for kind 8 (double precision on x86_64) numbers. \ No newline at end of file
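The blocked scheme the conclusions refer to can be sketched as follows. This is an illustrative Python sketch, not the assignment's Fortran code from `src/blockmath.F90`; the matrix size `N`, block size `BS`, and both function names are arbitrary choices made for the example. The idea is the same: compute the product tile by tile so that each `BS`×`BS` tile of the operands is reused while it is still resident in cache.

```python
# Illustrative sketch of blocked (tiled) matrix multiplication.
# N and BS are hypothetical small values; in practice BS is chosen
# so that three BS x BS blocks fit in the L1 cache together.

N = 8    # matrix dimension (small, for illustration only)
BS = 4   # block size; must divide N in this simplified sketch

def matmul_naive(a, b):
    """Reference triple loop: C = A * B."""
    c = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            s = 0.0
            for k in range(N):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c

def matmul_blocked(a, b):
    """Same product, computed in BS x BS tiles for temporal locality.

    The three outer loops walk over tiles; the three inner loops do a
    small dense multiply whose working set fits in cache.
    """
    c = [[0.0] * N for _ in range(N)]
    for ii in range(0, N, BS):
        for kk in range(0, N, BS):
            for jj in range(0, N, BS):
                for i in range(ii, ii + BS):
                    for k in range(kk, kk + BS):
                        aik = a[i][k]
                        for j in range(jj, jj + BS):
                            c[i][j] += aik * b[k][j]
    return c

# Both versions must agree on a small test case.
a = [[float(i + j) for j in range(N)] for i in range(N)]
b = [[float(i - j) for j in range(N)] for i in range(N)]
assert matmul_naive(a, b) == matmul_blocked(a, b)
```

For a fixed output element `c[i][j]`, the blocked version performs the additions in the same `k` order as the naive loop, so the two results here agree exactly; the only difference is the order in which elements of `a` and `b` are revisited, which is what drives the cache behaviour measured in the plots.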