Title: Auto-Tuning Complex Array Layouts for GPUs
Authors: Weber, Nicolas; Goesele, Michael
Editors: Margarita Amor and Markus Hadwiger
Date: 2014-12-16
Year: 2014
ISBN: 978-3-905674-59-0
ISSN: 1727-348X
DOI: https://doi.org/10.2312/pgv.20141085
URI: https://diglib.eg.org/handle/10.2312/pgv.20141085.057-064

Abstract:
Graphics Processing Units (GPUs) have shown rapid performance increases over the years. But with each new hardware generation, the constraints for programming them efficiently have changed. Programs have to be tuned to one specific hardware generation to reach its full potential. This is time-consuming and costly, as vendors tend to release a new generation every 18 months. It is therefore important to auto-tune GPU code to achieve GPU-specific improvements, using either static or empirical profiling to adjust parameters or to change the kernel implementation. We introduce a new approach to automatically improve memory access on GPUs. Our system generates an application-specific library which abstracts the memory access for complex arrays on the host and GPU side. This allows the code to be optimized by exchanging the memory layout without recompiling the application, as all necessary layouts are pre-compiled into the library. Our implementation speeds up real-world applications by up to an order of magnitude and even outperforms hand-tuned implementations.

Classification:
D.3.3 [Programming Technique]: Language Constructs and Features - Data types and structures
I.3.1 [Computer Graphics]: Hardware Architecture - Graphics processors
I.3.6 [Computer Graphics]: Methodology and Techniques - Graphics data structures and data types
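
The abstract describes hiding the memory layout of complex arrays behind an access interface, so that the layout can be exchanged without touching application code. The following is a minimal host-side sketch of that general idea in C++, not the authors' library or API: the names `Layout`, `flat_index`, `RecordArray`, and `sum_field` are hypothetical, and only two layouts (array-of-structures and structure-of-arrays) over records of `float` fields are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layouts for an array of n records with `fields` float fields each.
enum class Layout { AoS, SoA };

// Maps (record i, field f) to an offset in one flat buffer.
inline std::size_t flat_index(Layout layout, std::size_t n, std::size_t fields,
                              std::size_t i, std::size_t f) {
    return layout == Layout::AoS
               ? i * fields + f   // AoS: the fields of one record are adjacent
               : f * n + i;       // SoA: the same field of all records is adjacent
}

// Application code reads and writes through at() and never sees the layout,
// which is chosen at run time (assumption: all layouts are compiled in).
class RecordArray {
public:
    RecordArray(Layout layout, std::size_t n, std::size_t fields)
        : layout_(layout), n_(n), fields_(fields), data_(n * fields, 0.0f) {}

    float& at(std::size_t i, std::size_t f) {
        assert(i < n_ && f < fields_);
        return data_[flat_index(layout_, n_, fields_, i, f)];
    }
    float at(std::size_t i, std::size_t f) const {
        assert(i < n_ && f < fields_);
        return data_[flat_index(layout_, n_, fields_, i, f)];
    }
    std::size_t size() const { return n_; }
    const std::vector<float>& raw() const { return data_; }  // underlying buffer

private:
    Layout layout_;
    std::size_t n_, fields_;
    std::vector<float> data_;
};

// Stand-in for a kernel: sums one field over all records.
// It is written once and works unchanged for either layout.
inline float sum_field(const RecordArray& a, std::size_t f) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) sum += a.at(i, f);
    return sum;
}
```

Constructing `RecordArray(Layout::SoA, n, fields)` instead of `RecordArray(Layout::AoS, n, fields)` changes where each value lives in memory while `sum_field` stays the same. On a GPU, a structure-of-arrays layout typically lets neighbouring threads that read the same field touch neighbouring addresses, which is one reason the choice of layout can matter for performance.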