Architecture Considerations for Tracing Incoherent Rays

Authors: Aila, Timo; Karras, Tero
Editors: Michael Doggett and Samuli Laine and Warren Hunt
Dates: 2013-10-28, 2010
ISBN: 978-3-905674-26-2
ISSN: 2079-8687
DOI: https://doi.org/10.2312/EGGH/HPG10/113-122

This paper proposes a massively parallel hardware architecture for efficient tracing of incoherent rays, e.g. for global illumination. The general approach is centered around hierarchical treelet subdivision of the acceleration structure and repeated queueing/postponing of rays to reduce cache pressure. We describe a heuristic algorithm for determining the treelet subdivision, and show that our architecture can reduce the total memory bandwidth requirements by up to 90% in difficult scenes. Furthermore, the architecture allows submitting rays in an arbitrary order with practically no performance penalty.

We also conclude that scheduling algorithms can have an important effect on results, and that using fixed-size queues is not an appealing design choice. Increased auxiliary traffic, including traversal stacks, is identified as the foremost remaining challenge of this architecture.

Categories and Subject Descriptors (according to ACM CCS): I.3.1 [Computer Graphics]: Hardware Architecture - Graphics Processors; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism - Raytracing
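
The queueing idea summarized in the abstract can be illustrated with a toy sketch. The following is not the paper's architecture or its treelet heuristic: it assumes a hand-built one-dimensional hierarchy (`NODES`), a hand-picked treelet assignment, point queries standing in for rays, and a "serve the fullest queue" scheduling policy. All names are hypothetical. It only shows how postponing a ray whenever it crosses a treelet boundary lets many rays share one load of each treelet.

```python
from collections import defaultdict, deque

# Hypothetical toy hierarchy: node id -> (lo, hi, children, treelet id).
# Leaves have no children. The treelet assignment is hand-picked for
# illustration; it is not produced by the paper's heuristic.
NODES = {
    0: (0, 8, (1, 2), 0),
    1: (0, 4, (3, 4), 0),
    2: (4, 8, (5, 6), 0),
    3: (0, 2, (), 1),
    4: (2, 4, (), 1),
    5: (4, 6, (), 2),
    6: (6, 8, (), 2),
}

def trace_queued(points):
    """Find the leaf containing each 1D query point, one treelet at a time.

    Returns (hits, treelet_loads), where hits maps ray id -> leaf node id and
    treelet_loads counts how often a treelet was made the active one.
    """
    queues = defaultdict(deque)          # treelet id -> postponed rays
    for ray_id, x in enumerate(points):
        queues[0].append((ray_id, x, [0]))   # ray state: id, query, stack

    hits, treelet_loads = {}, 0
    while any(queues.values()):
        # Scheduling policy (an assumption): serve the fullest queue next.
        current = max(queues, key=lambda t: len(queues[t]))
        treelet_loads += 1
        batch = queues[current]
        while batch:
            ray_id, x, stack = batch.popleft()
            while stack:
                node = stack[-1]
                _, _, children, treelet = NODES[node]
                if treelet != current:
                    # Ray leaves the active treelet: postpone it instead of
                    # fetching another treelet's nodes right away.
                    queues[treelet].append((ray_id, x, stack))
                    break
                stack.pop()
                if not children:
                    hits[ray_id] = node
                    continue
                # Child bounds are assumed to be stored with the parent.
                for child in children:
                    c_lo, c_hi = NODES[child][:2]
                    if c_lo <= x < c_hi:
                        stack.append(child)
    return hits, treelet_loads

if __name__ == "__main__":
    print(trace_queued([1, 5, 3, 7, 0.5, 6.5]))
```

In this toy setup six query points are resolved with three treelet activations, whereas tracing each point to completion on its own would touch two treelets per point. The per-ray stack carried through the queues is the kind of auxiliary state the abstract identifies as remaining traffic.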