Visual Insights into Memory Behavior of GPU Ray Tracers

Date
2024-07
Publisher
TUprints
Abstract
Ray tracing is a fundamental rendering technique that typically projects three-dimensional representations of a scene onto a two-dimensional display. This is achieved by casting a set of rays into the scene from a perspective camera and computing their intersections with the relevant geometry. Secondary rays may be spawned from these intersection points, enabling physically correct global illumination by tracing light in the reverse direction of photon propagation. Real-time rendering has historically relied on classical rasterization pipelines, which are straightforward to implement in hardware because they form a data-parallel problem: the whole scene is projected into the coordinate system of the image. In contrast, task-parallel ray tracing suffers from incoherence between rays. However, recent advances in ray tracing have produced more efficient algorithms and, in turn, efficient dedicated hardware implementations. While these approaches can already render realistic images, further improvements in run-time performance can free computational budget for higher frame rates, display resolutions, or ray-tracing recursion depths, or reduce the energy footprint of ray-tracing data centers. A fundamental technique for improving ray-tracing performance is the bounding-volume hierarchy (BVH), which spares rays from being tested against the entire scene, especially in occluded or distant regions. Beyond the structural efficiency of a BVH, the primary bottlenecks of GPU ray tracing are memory latency and work distribution. Optimizations that address these factors mainly aim at more coherent memory accesses, which make caching more effective. Writing programs that achieve higher cache hit rates typically requires more programming effort and a deep understanding of the hardware, because an additional abstraction layer is introduced that makes the memory pipeline less transparent. General-purpose profilers aim to support this implementation process.
However, they typically report cache hit rates per kernel call, because these values are measured with basic hardware counters that do not distinguish memory accesses by their context. In many cases, a more detailed representation of memory-related profiling metrics would be useful: for example, the number of recorded accesses per memory allocation, or projections into other domains such as the framebuffer or the scene geometry. This thesis presents a new method for accurately simulating the GPU memory pipeline. The method uses memory traces exported by dynamic binary instrumentation, which, like standard profilers, can be applied to any compiled GPU binary. The exported memory profiles can be used both for performance visualization in individual domains and for traditional memory-profiling metrics, which can be displayed at a finer granularity than usual. The thesis also includes a method for mapping memory metrics onto the original scene, allowing users to explore profiling results within the scene domain and making the profiling process more intuitive. In addition, this thesis presents a novel compressed ray-tracing implementation that reduces its memory footprint by making assumptions about the topological properties of the scene to be rendered. The findings can be used to evaluate and optimize a wide range of ray-tracing and ray-marching applications in a user-friendly manner.
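The idea of replaying exported memory traces through a simulated cache, so that hit rates can be attributed to individual allocations rather than whole kernel calls, can be illustrated with a minimal sketch. It assumes a hypothetical trace format of `(address, allocation)` records, such as a binary-instrumentation tool might export, and replays them through a single set-associative LRU cache with assumed parameters (128-byte lines, 64 sets, 4 ways). The function name `simulate_cache`, the trace format, the allocation names, and all parameters are illustrative; a real GPU memory pipeline (multiple cache levels, sectors, coalescing across warps) is considerably more complex, and this is not the simulator from the thesis.

```python
from collections import OrderedDict, defaultdict


def simulate_cache(trace, line_size=128, num_sets=64, ways=4):
    """Replay (address, allocation) records through a set-associative LRU cache.

    Hypothetical sketch: returns {allocation: (hits, accesses)} so that hit
    rates can be reported per allocation instead of per kernel call.
    """
    # Each set maps tag -> None; OrderedDict order tracks recency (LRU first).
    sets = [OrderedDict() for _ in range(num_sets)]
    stats = defaultdict(lambda: [0, 0])  # allocation -> [hits, accesses]

    for address, allocation in trace:
        line = address // line_size          # cache-line number of this access
        index, tag = line % num_sets, line // num_sets
        cache_set = sets[index]
        entry = stats[allocation]
        entry[1] += 1                        # count the access
        if tag in cache_set:
            entry[0] += 1                    # hit
            cache_set.move_to_end(tag)       # mark as most recently used
        else:
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict least recently used line
            cache_set[tag] = None            # fill the missing line

    return {alloc: (hits, n) for alloc, (hits, n) in stats.items()}


# Usage with a tiny hand-written trace (hypothetical allocation names).
trace = [
    (0, "bvh_nodes"),     # first touch of line 0: miss
    (64, "bvh_nodes"),    # same 128-byte line: hit
    (4096, "triangles"),  # line 32 maps to set 32: miss
    (0, "bvh_nodes"),     # line 0 still cached: hit
    (8192, "triangles"),  # line 64 maps to set 0 with a new tag: miss
]
for alloc, (hits, accesses) in simulate_cache(trace).items():
    print(f"{alloc}: {hits}/{accesses} hits")
# bvh_nodes: 2/3 hits
# triangles: 0/2 hits
```

Because each access carries its allocation, the statistics come out per allocation rather than per kernel call. Keying `stats` by pixel coordinate or primitive ID instead, assuming the trace records that context, would give a simple version of the framebuffer or scene-domain projections described in the abstract.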