Efficient Stream Compaction on Wide SIMD Many-Core Architectures

dc.contributor.author	Billeter, Markus	en_US
dc.contributor.author	Olsson, Ola	en_US
dc.contributor.author	Assarsson, Ulf	en_US
dc.contributor.editor	David Luebke and Philipp Slusallek	en_US
dc.date.accessioned	2013-10-29T15:48:19Z
dc.date.available	2013-10-29T15:48:19Z
dc.date.issued	2009	en_US
dc.description.abstract	Stream compaction is a common parallel primitive used to remove unwanted elements in sparse data. This allows highly parallel algorithms to maintain performance over several processing steps and reduces overall memory usage. For wide SIMD many-core architectures, we present a novel stream compaction algorithm and explore several variations thereof. Our algorithm is designed to maximize concurrent execution, with minimal use of synchronization. Bandwidth and auxiliary storage requirements are reduced significantly, which allows for substantially better performance. We have tested our algorithms using CUDA on a PC with an NVIDIA GeForce GTX280 GPU. On this hardware, our reference implementation provides a 3× speedup over previously published algorithms.	en_US
dc.description.seriesinformation	High-Performance Graphics	en_US
dc.identifier.doi	10.1145/1572769.1572795
dc.identifier.isbn	978-1-60558-603-8	en_US
dc.identifier.issn	2079-8687	en_US
dc.identifier.uri	https://doi.org/10.1145/1572769.1572795	en_US
dc.publisher	The Eurographics Association	en_US
dc.title	Efficient Stream Compaction on Wide SIMD Many-Core Architectures	en_US
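
The abstract describes stream compaction as removing unwanted elements from sparse data. As a rough illustration of the primitive itself, the sketch below uses the textbook flag, exclusive-prefix-sum, and scatter formulation, written sequentially in Python. It is not the algorithm presented in the paper, and the names `compact` and `keep` are hypothetical, chosen only for this example.

```python
from itertools import accumulate


def compact(values, keep):
    """Sequential sketch of scan-based stream compaction (illustrative only)."""
    # Step 1: mark each element with 1 if it should be kept, else 0.
    flags = [1 if keep(v) else 0 for v in values]
    # Step 2: exclusive prefix sum, so offsets[i] counts kept elements before i.
    offsets = list(accumulate([0] + flags[:-1]))
    # Step 3: scatter each kept element to its output slot.
    out = [None] * sum(flags)
    for i, v in enumerate(values):
        if flags[i]:
            out[offsets[i]] = v
    return out


if __name__ == "__main__":
    sparse = [3, 0, 0, 7, 0, 5]
    print(compact(sparse, lambda v: v != 0))
```

In a parallel setting, steps 1 and 3 are independent per element, and step 2 is the part that requires coordination between processing elements.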