Search Results

  • Item
    Extending Graphics Hardware For Occlusion Queries In OpenGL
    (The Eurographics Association, 1998) Bartz, Dirk; Meißner, Michael; Hüttner, Tobias; S. N. Spencer
    For interactive rendering of large polygonal objects, fast visibility queries are necessary to quickly decide whether polygonal objects are visible and need to be rendered. None of the numerous published algorithms provide visibility performance for interactive rendering of large models. In this paper, we propose an OpenGL extension for fast occlusion queries. Added after the depth test stage of the OpenGL rendering pipeline, our algorithm provides fast queries to establish the occlusion of polygonal objects. Furthermore, hardware aspects of this proposal are discussed and possible implementations on two different graphics architectures are presented.
  • Item
    Design of a Fast Voxel Processor for Parallel Volume Visualization
    (The Eurographics Association, 1995) Lichtermann, Jan; W. Strasser
    The basics of a parallel real-time volume visualization architecture are introduced. Volume data is divided into subcubes that are distributed among multiple image processors and stored in their private voxel memories. Rays fall into ray segments at the subcube borders. Each image processor is responsible for the ray segments within its assigned subcubes. Results of the ray segments are passed to the image processor where the ray continues. The enumeration of resampling points on the ray segments and the interpolation at resampling points is accelerated by the voxel processor. The voxel processor can additionally compute a normalized gradient vector at a resampling point used as a surface normal estimation for shading calculations. In the paper the focus is on operation and hardware implementation of this pipeline processor and the organization of voxel memory. The instruction set of the voxel processor is explained. A performance of 20 images per second for a 256³ voxel volume and 16 image processors can be achieved.
  • Item
    Understanding the Efficiency of GPU Algorithms for Matrix-Matrix Multiplication
    (The Eurographics Association, 2004) Fatahalian, K.; Sugerman, J.; Hanrahan, P.; Tomas Akenine-Moeller and Michael McCool
    Utilizing graphics hardware for general purpose numerical computations has become a topic of considerable interest. The implementation of streaming algorithms, typified by highly parallel computations with little reuse of input data, has been widely explored on GPUs. We relax the streaming model's constraint on input reuse and perform an in-depth analysis of dense matrix-matrix multiplication, which reuses each element of input matrices O(n) times. Its regular data access pattern and highly parallel computational requirements suggest matrix-matrix multiplication as an obvious candidate for efficient evaluation on GPUs but, surprisingly, we find even near-optimal GPU implementations are pronouncedly less efficient than current cache-aware CPU approaches. We find the key cause of this inefficiency is that the GPU can fetch less data and yet execute more arithmetic operations per clock than the CPU when both are operating out of their closest caches. The lack of high bandwidth access to cached data will impair the performance of GPU implementations of any computation featuring significant input reuse.
  • Item
    Real-Time Bump Map Synthesis
    (The Eurographics Association, 2001) Kautz, Jan; Heidrich, Wolfgang; Seidel, Hans-Peter; Kurt Akeley and Ulrich Neumann
    In this paper we present a method that automatically synthesizes bump maps at arbitrary levels of detail in real-time. The only input data we require is a normal density function; the bump map is generated according to that function. It is also used to shade the generated bump map. The technique allows infinite zooming into the surface, because more (consistent) detail can be created on the fly. The shading of such a surface is consistent when displayed at different distances to the viewer (assuming that the surface structure is self-similar). The bump map generation and the shading algorithm can also be used separately.
  • Item
    GPU Smoke Simulation on Compressed DCT Space
    (The Eurographics Association, 2019) Ishida, Daichi; Ando, Ryoichi; Morishima, Shigeo; Cignoni, Paolo and Miguel, Eder
    This paper presents a novel GPU-based algorithm for smoke animation. Our primary contribution is the use of Discrete Cosine Transform (DCT) compressed space for efficient simulation. We show that our method runs an order of magnitude faster than a CPU implementation while retaining visual details with a smaller memory usage. The key component of our method is an on-the-fly compression and expansion of velocity, pressure and density fields. Whenever these physical quantities are requested during a simulation, we perform data expansion and compression only where necessary in a loop. As a consequence, our simulation allows us to simulate a large domain without actually allocating full memory space for it. We show that although our method comes with some extra cost for DCT manipulations, this cost can be minimized with carefully devised shared memory usage.
  • Item
    Variable Length Coding for GPU-Based Direct Volume Rendering
    (The Eurographics Association, 2016) Guthe, Stefan; Goesele, Michael; Matthias Hullin and Marc Stamminger and Tino Weinkauf
    The sheer size of volume data sampled in a regular grid requires efficient lossless and lossy compression algorithms that allow for on-the-fly decompression during rendering. While all hardware-assisted approaches are based on fixed bit rate block truncation coding, they suffer from degradation in regions of high variation while wasting space in homogeneous areas. On the other hand, vector quantization approaches using texture hardware achieve an even distribution of error in the entire volume at the cost of storing overlapping blocks or bricks. However, these approaches suffer from severe blocking artifacts that need to be smoothed over during rendering. In contrast to existing approaches, we propose to build a lossy compression scheme on top of a state-of-the-art lossless compression approach built on non-overlapping bricks by combining it with straightforward vector quantization. Due to efficient caching and load balancing, the rendering performance of our approach improves with the compression rate and can achieve interactive to real-time frame rates even at full HD resolution.
  • Item
    Memory Access Patterns of Occlusion-Compatible 3D Image Warping
    (The Eurographics Association, 1997) Mark, William R.; Bishop, Gary; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
    McMillan and Bishop's 3D image warp can be efficiently implemented by exploiting the coherency of its memory accesses. We analyze this coherency, and present algorithms that take advantage of it. These algorithms traverse the reference image in an occlusion-compatible order, which is an order that can resolve visibility using a painter's algorithm. Required cache sizes are calculated for several one-pass 3D warp algorithms, and we develop a two-pass algorithm which requires a smaller cache size than any of the practical one-pass algorithms. We also show that reference image traversal orders that are occlusion-compatible for continuous images are not always occlusion-compatible when applied to the discrete images used in practice.
  • Item
    High-Quality Rendering of Glyphs Using Hardware-Accelerated Ray Tracing
    (The Eurographics Association, 2020) Zellmann, Stefan; Aumüller, Martin; Marshak, Nathan; Wald, Ingo; Frey, Steffen and Huang, Jian and Sadlo, Filip
    Glyph rendering is an important scientific visualization technique for 3D, time-varying simulation data and for higher-dimensional data in general. Though conceptually simple, there are several different challenges when realizing glyph rendering on top of triangle rasterization APIs, such as possibly prohibitive polygon counts, limitations of what shapes can be used for the glyphs, issues with visual clutter, etc. In this paper, we investigate the use of hardware ray tracing for high-quality, high-performance glyph rendering, and show that this not only leads to a more flexible and often more elegant solution for dealing with number and shape of glyphs, but that this can also help address visual clutter, and even provide additional visual cues that can enhance understanding of the dataset.
  • Item
    Efficient Adaptive Deferred Shading with Hardware Scatter Tiles
    (ACM, 2020) Mallett, Ian; Yuksel, Cem; Seiler, Larry; Yuksel, Cem and Membarth, Richard and Zordan, Victor
    Adaptive shading is an effective mechanism for reducing the number of shaded pixels to a subset of the image resolution with minimal impact on final rendering quality. We present a new scheduling method based on on-chip tiles that, along with relatively minor modifications to the GPU architecture, provides efficient hardware support. As compared to software implementations on current hardware using compute shaders, our approach dramatically reduces memory bandwidth requirements, thereby significantly improving performance and energy use. We also introduce the concept of a fragment pre-shader for programmatically controlling when a fragment shader is invoked, and describe advanced techniques for utilizing our approach to further reduce the number of shaded pixels via temporal filtering, or to adjust rendering quality to maintain stable framerates.
  • Item
    Brook GLES Pi: Democratising Accelerator Programming
    (ACM, 2018) Trompouki, Matina Maria; Kosmidis, Leonidas; Patney, Anjul and Niessner, Matthias
    Nowadays computing is heavily based on accelerators; however, the cost of the hardware equipment prevents equal access to heterogeneous programming. In this work we present Brook GLES Pi, a port of the accelerator programming language Brook. Our solution, primarily focused on the educational platform Raspberry Pi, makes it possible to teach, experiment with, and take advantage of heterogeneous programming on any low-cost embedded device featuring an OpenGL ES 2 GPU, democratising access to accelerator programming.