A Visual Approach to Investigating Shared and Global Memory Behavior of CUDA Kernels

dc.contributor.author	Rosen, Paul	en_US
dc.contributor.editor	B. Preim, P. Rheingans, and H. Theisel	en_US
dc.date.accessioned	2015-02-28T15:30:28Z
dc.date.available	2015-02-28T15:30:28Z
dc.date.issued	2013	en_US
dc.description.abstract	We present an approach to investigate the memory behavior of a parallel kernel executing on thousands of threads simultaneously within the CUDA architecture. Our top-down approach allows for quickly identifying any significant differences between the execution of the many blocks and warps. As interesting warps are identified, we allow further investigation of memory behavior by visualizing the shared memory bank conflicts and global memory coalescence, first with an overview of a single warp with many operations and, subsequently, with a detailed view of a single warp and a single operation. We demonstrate the strength of our approach in the context of a parallel matrix transpose kernel and a parallel 1D Haar Wavelet transform kernel.	en_US
dc.description.seriesinformation	Computer Graphics Forum	en_US
dc.identifier.doi	10.1111/cgf.12103	en_US
dc.identifier.issn	1467-8659	en_US
dc.identifier.uri	https://doi.org/10.1111/cgf.12103	en_US
dc.publisher	The Eurographics Association and Blackwell Publishing Ltd.	en_US
dc.subject	Hardware [B.8.2]	en_US
dc.subject	Performance and Reliability	en_US
dc.subject	Performance Analysis and Design Aids	en_US
dc.title	A Visual Approach to Investigating Shared and Global Memory Behavior of CUDA Kernels	en_US
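
The abstract refers to shared memory bank conflicts and global memory coalescence in a matrix transpose kernel. The following is a minimal Python sketch of how those two quantities can be counted for a single warp and a single operation. It is not the paper's tool or method. The constants (32 banks of 4-byte words, 32 threads per warp, 128-byte global memory segments) and all function names are assumptions chosen for illustration, and actual values depend on the GPU generation.

```python
from collections import Counter

# Assumed hardware parameters (illustrative; they vary by GPU generation).
NUM_BANKS = 32        # shared memory banks, each one 4-byte word wide
WARP_SIZE = 32        # threads per warp
SEGMENT_BYTES = 128   # size of one global memory transaction segment


def bank_conflict_degree(word_indices):
    """Return the largest number of distinct words that map to one bank.

    1 means conflict-free. Threads reading the same word are counted
    once, since an identical address can be broadcast.
    """
    per_bank = Counter(w % NUM_BANKS for w in set(word_indices))
    return max(per_bank.values())


def global_transactions(byte_addresses):
    """Return how many distinct segments the warp's addresses touch."""
    return len({a // SEGMENT_BYTES for a in byte_addresses})


def column_read(row_pitch, column=0):
    """Word index read by each thread when a warp walks down one tile column."""
    return [t * row_pitch + column for t in range(WARP_SIZE)]


# Shared memory: column access into a 32-wide tile versus a tile padded to 33.
unpadded = bank_conflict_degree(column_read(32))
padded = bank_conflict_degree(column_read(33))

# Global memory: consecutive 4-byte elements versus a stride of 1024 elements.
coalesced = global_transactions([4 * t for t in range(WARP_SIZE)])
strided = global_transactions([4 * 1024 * t for t in range(WARP_SIZE)])

print(unpadded, padded, coalesced, strided)
```

Under these assumptions, the column read of an unpadded 32-wide tile places every thread in the same bank, while padding each row to 33 words spreads the warp across all banks. Likewise, consecutive addresses fall into a single segment, whereas the strided pattern touches one segment per thread.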