Cache based Optimization of Stencil Computations - An Algorithmic Approach

Shaheen, Mohammed

Date

2013-11-05

Authors

Shaheen, Mohammed

Publisher

Shaheen, Mohammed

Text.PhDThesis

Abstract

We are witnessing a fundamental paradigm shift in computer design. Memory has been and is becoming more hierarchical. Clock frequency is no longer crucial for performance. The on-chip core count is doubling rapidly. The quest for performance is growing. These facts have led to complex computer systems which place high demands on scientific computing problems to achieve high performance.

Stencil computation is a frequent and important kernel that is affected by this complexity. Its importance stems from the wide variety of scientific and engineering applications that use it. The stencil kernel is a nearest-neighbor computation with low arithmetic intensity, and thus it usually achieves only a tiny fraction of the peak performance when executed on modern computer systems. Fast on-chip memory modules were introduced as the hardware approach to alleviate the problem.

There are mainly three approaches to address the problem: cache aware, cache oblivious, and automatic loop transformation approaches. In this thesis, comprehensive cache aware and cache oblivious algorithms to optimize stencil computations on structured rectangular 2D and 3D grids are presented. Our algorithms observe the challenges for high performance in the previous approaches, devise solutions for them, and carefully balance the solution building blocks against each other.

The many-core systems put the scalability of memory access at stake, which has led to hierarchical main memory systems. This adds another locality challenge for performance. We tailor our frameworks to meet the new performance challenge on these architectures. Experiments are performed to evaluate the performance of our frameworks on synthetic as well as real world problems.
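To make the terms in the abstract concrete, the following is a minimal sketch of a nearest-neighbor stencil on a 2D rectangular grid, first as a plain sweep and then with the interior traversed in square blocks, which is one generic cache aware technique (spatial blocking). It is not taken from the thesis and does not reproduce its algorithms. The 5-point averaging stencil, the function names `jacobi_step` and `jacobi_step_tiled`, and the `tile` parameter are assumptions chosen for illustration.

```python
def jacobi_step(grid):
    """One sweep of a 5-point averaging stencil (illustrative choice).

    Each interior cell becomes the mean of its four nearest neighbors.
    Boundary cells are copied unchanged.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Three additions and one multiplication per updated cell:
            # little arithmetic relative to the memory traffic.
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out


def jacobi_step_tiled(grid, tile=32):
    """Same sweep, visiting the interior in tile x tile blocks.

    `tile` is a hypothetical tuning parameter; in a compiled language it
    would be chosen so that a block's working set fits in cache.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for ii in range(1, rows - 1, tile):
        for jj in range(1, cols - 1, tile):
            # Update one block; min() clips blocks at the grid edge.
            for i in range(ii, min(ii + tile, rows - 1)):
                for j in range(jj, min(jj + tile, cols - 1)):
                    out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                        + grid[i][j - 1] + grid[i][j + 1])
    return out
```

Both functions compute the same result; only the order in which cells are visited differs. In interpreted Python the cache effect itself is not measurable in any meaningful way, so the sketch shows the loop structure rather than a performance gain.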