Title: Local Shading Coherence Extraction for SIMD-Efficient Path Tracing on CPUs
Authors: Áfra, Attila T.; Benthin, Carsten; Wald, Ingo; Munkberg, Jacob
Editors: Ulf Assarsson and Warren Hunt
Date: 2016-06-17
Year: 2016
ISBN: 978-3-03868-008-6
ISSN: 2079-8679
DOI: https://doi.org/10.2312/hpg.20161198
Pages: 119-128
Subjects: I.3.7 [Computer Graphics]: Three Dimensional Graphics and Realism, Raytracing

Abstract: Accelerating ray traversal on data-parallel hardware architectures has received widespread attention over the last few years, but much less research has focused on efficient shading for ray tracing. This is unfortunate, since for many applications shading is the single most time-consuming operation. To maximize rendering performance, it is therefore crucial to use the processor's wide vector units effectively not only for the ray traversal step itself, but also during shading. This is non-trivial, as incoherent ray distributions cause control flow divergence, making high SIMD utilization difficult to maintain. In this paper, we propose a local shading coherence extraction algorithm for CPU-based path tracing that enables efficient SIMD shading. Each core independently traces and sorts small streams of rays that fit into the on-chip cache hierarchy, which allows coherent ray batches requiring similar shading operations to be extracted with very low overhead. We show that operating on small independent ray streams instead of a large global stream is sufficient to achieve high SIMD utilization in shading (90% on average) for complex scenes, while avoiding unnecessary memory traffic and synchronization. For a set of scenes with many different materials, our approach reduces the shading time by a factor of 1.9 to 3.4 compared to simple structure-of-arrays (SoA) based packet shading. The total rendering speedup varies between 1.2× and 3×, and also depends on the ratio of the traversal and shading times.
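
The core idea in the abstract, sorting a small ray stream so that rays needing the same shading operation end up in the same SIMD batch, can be illustrated with a toy model. The following is only a sketch under stated assumptions, not the paper's implementation: it assumes each ray hit is represented solely by an integer material ID, that a packet containing k distinct materials costs k masked shading passes, and that the vector width is 8 lanes. The names `SIMD_WIDTH`, `simd_utilization`, `extract_coherent_batches`, and `sorted_utilization` are hypothetical.

```python
from collections import defaultdict

SIMD_WIDTH = 8  # assumed vector width in lanes (hypothetical choice)


def simd_utilization(material_ids, width=SIMD_WIDTH):
    """Utilization when rays are shaded in arrival order, packet by packet.

    Toy cost model (an assumption): a packet with k distinct materials
    needs k masked passes, each occupying all `width` lanes.
    """
    active = 0
    issued = 0
    for start in range(0, len(material_ids), width):
        packet = material_ids[start:start + width]
        passes = len(set(packet))      # one masked pass per distinct material
        active += len(packet)          # lanes doing useful work
        issued += passes * width       # lanes occupied in total
    return active / issued if issued else 1.0


def extract_coherent_batches(material_ids):
    """Group the ray indices of one small stream by material ID."""
    buckets = defaultdict(list)
    for ray_index, material in enumerate(material_ids):
        buckets[material].append(ray_index)
    return buckets


def sorted_utilization(material_ids, width=SIMD_WIDTH):
    """Utilization after grouping: every packet holds a single material."""
    active = 0
    issued = 0
    for rays in extract_coherent_batches(material_ids).values():
        packets = -(-len(rays) // width)   # ceiling division
        active += len(rays)
        issued += packets * width
    return active / issued if issued else 1.0


if __name__ == "__main__":
    # A 64-ray stream that cycles through 4 materials (maximally interleaved).
    stream = [i % 4 for i in range(64)]
    print("unsorted:", simd_utilization(stream))
    print("sorted:  ", sorted_utilization(stream))
```

In this toy model the interleaved 64-ray stream reaches 25% lane utilization when shaded in arrival order and 100% after grouping. Real streams leave partially filled packets per material, which is one reason measured utilization stays below 100%.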