Search Results
Now showing 1 - 10 of 29
Item Understanding the Efficiency of GPU Algorithms for Matrix-Matrix Multiplication (The Eurographics Association, 2004) Fatahalian, K.; Sugerman, J.; Hanrahan, P.; Tomas Akenine-Moeller and Michael McCool
Utilizing graphics hardware for general-purpose numerical computations has become a topic of considerable interest. The implementation of streaming algorithms, typified by highly parallel computations with little reuse of input data, has been widely explored on GPUs. We relax the streaming model's constraint on input reuse and perform an in-depth analysis of dense matrix-matrix multiplication, which reuses each element of input matrices O(n) times. Its regular data access pattern and highly parallel computational requirements suggest matrix-matrix multiplication as an obvious candidate for efficient evaluation on GPUs but, surprisingly, we find even near-optimal GPU implementations are pronouncedly less efficient than current cache-aware CPU approaches. We find the key cause of this inefficiency is that the GPU can fetch less data and yet execute more arithmetic operations per clock than the CPU when both are operating out of their closest caches. The lack of high-bandwidth access to cached data will impair the performance of GPU implementations of any computation featuring significant input reuse.

Item Real-Time Bump Map Synthesis (The Eurographics Association, 2001) Kautz, Jan; Heidrich, Wolfgang; Seidel, Hans-Peter; Kurt Akeley and Ulrich Neumann
In this paper we present a method that automatically synthesizes bump maps at arbitrary levels of detail in real time. The only input data we require is a normal density function; the bump map is generated according to that function. The same function is also used to shade the generated bump map. The technique allows infinite zooming into the surface, because more (consistent) detail can be created on the fly.
The shading of such a surface is consistent when displayed at different distances from the viewer (assuming that the surface structure is self-similar). The bump map generation and the shading algorithm can also be used separately.

Item Polygon Rendering on a Stream Architecture (The Eurographics Association, 2000) Owens, John D.; Dally, William J.; Kapasi, Ujval J.; Rixner, Scott; Mattson, Peter; Mowery, Ben; I. Buck and G. Humphreys and P. Hanrahan
The use of a programmable stream architecture in polygon rendering provides a powerful mechanism to address the high performance needs of today's complex scenes as well as the need for flexibility and programmability in the polygon rendering pipeline. We describe how a polygon rendering pipeline maps into data streams and kernels that operate on streams, and how this mapping is used to implement the polygon rendering pipeline on Imagine, a programmable stream processor. We compare our results on a cycle-accurate simulation of Imagine to representative hardware and software renderers.

Item VoxelCache: A Cache-Based Memory Architecture for Volume Graphics (The Eurographics Association, 2003) Kanus, U.; Wetekam, G.; Hirche, J.; M. Doggett and W. Heidrich and W. Mark and A. Schilling
This paper presents a cache-based memory architecture for volume graphics. We describe the memory organization and cache logic to implement a voxel cache based on 4³ voxel blocks. We show an efficient prefetching scheme that increases the cache hit ratio to more than 98% in most cases. The performance of the memory system with different types of external memory is demonstrated by a cycle-accurate C++ simulation. The VoxelCache memory architecture is designed to be easily adapted to different memory technologies, because all volume-graphics-specific parts of the memory system are encapsulated inside the on-chip cache.
The design is targeted at implementation on off-the-shelf reconfigurable hardware.

Item A Programmable Vertex Shader with Fixed-Point SIMD Datapath for Low Power Wireless Applications (The Eurographics Association, 2004) Sohn, Ju-Ho; Woo, Ramchan; Yoo, Hoi-Jun; Tomas Akenine-Moeller and Michael McCool
Real-time 3D graphics has become one of the attractive applications for 3G wireless terminals, although their battery lifetime and memory bandwidth limit the system resources available for graphics processing. Instead of using a dedicated hardware engine with complex functions, we propose an efficient hardware architecture for a programmable, low-power vertex shader. Our architecture includes the following three features: I) a fixed-point SIMD datapath to exploit parallelism in vertex processing while keeping the power consumption low; II) a multithreaded coprocessor interface to decrease unwanted stalls between the main processor and the vertex shader, reducing power consumption by instruction-level power management; III) a programmable vertex engine to increase the datapath throughput by operating concurrently with the main processor. Simulation results show that the full 3D geometry pipeline can be performed at 7.2M vertices/sec with 115mW power consumption for polygons using the OpenGL lighting model. This is about a 10-fold improvement over the latest graphics core with a floating-point datapath for wireless applications, in terms of processing speed normalized by power consumption (Kvertices/sec per milliwatt).

Item Prefiltered Antialiased Lines Using Half-Plane Distance Functions (The Eurographics Association, 2000) McNamara, Robert; McCormack, Joel; Jouppi, Norman P.; I. Buck and G. Humphreys and P. Hanrahan
We describe a method to compute high-quality antialiased lines by adding a modest amount of hardware to a fragment generator based upon half-plane edge functions. (A fragment contains the information needed to paint one pixel of a line or a polygon.)
We surround an antialiased line with four edge functions to create a long, thin rectangle. We scale the edge functions so that they compute signed distances from the four edges. At each fragment within the antialiased line, the four distances to the fragment are combined and the result indexes an intensity table. The table is computed by convolving a filter kernel with a prototypical line at various distances from the line's edge. Because the convolutions aren't performed in hardware, we can use wider, more complex filters with better high-frequency rejection than the narrow box filter common to supersampling antialiasing hardware. The result is smoother antialiased lines. Our algorithm is parameterized by the line width and filter radius. These parameters do not affect the rendering algorithm, but only the setup of the edge functions. Our algorithm antialiases line endpoints without special handling. We exploit this to paint small blurry squares as approximations to small antialiased round points. We do not need a different fragment generator for antialiased lines, and so can take advantage of all optimizations introduced in the existing fragment generator.

Item Hardware-based Simulation and Collision Detection for Large Particle Systems (The Eurographics Association, 2004) Kolb, A.; Latta, L.; Rezk-Salama, C.; Tomas Akenine-Moeller and Michael McCool
Particle systems have long been recognized as an essential building block for detail-rich and lively visual environments. Current implementations can handle up to 10,000 particles in real-time simulations and are mostly limited by the transfer of particle data from the main processor to the graphics hardware (GPU) for rendering. This paper introduces a full GPU implementation, using fragment shaders, of both the simulation and rendering of a dynamically growing particle system. Such an implementation can render up to 1 million particles in real time on recent hardware.
The massively parallel simulation handles collision detection and reaction of particles with objects of arbitrary shape. The collision detection is based on depth maps that represent the outer shape of an object. The depth maps store distance values and normal vectors for collision reaction. Using a special texture-based indexing technique to represent normal vectors, standard 8-bit textures can be used to describe the complete depth map data. Alternatively, several depth maps can be stored in one floating-point texture. In addition, a GPU-based parallel sorting algorithm is introduced that can be used to depth-sort the particles for correct alpha blending.

Item A Flexible Simulation Framework for Graphics Architectures (The Eurographics Association, 2004) Sheaffer, J. W.; Luebke, D.; Skadron, K.; Tomas Akenine-Moeller and Michael McCool
In this paper we describe a multipurpose tool for analysis of the performance characteristics of computer graphics hardware and software. We are developing Qsilver, a highly configurable micro-architectural simulator of the GPU that uses the Chromium system's ability to intercept and redirect an OpenGL stream. The simulator produces an annotated trace of graphics commands using Chromium, then runs the trace through a cycle-timer model to evaluate time-dependent behaviors of the various functional units. We demonstrate the use of Qsilver on a simple hypothetical architecture to analyze performance bottlenecks, to explore new GPU microarchitectures, and to model power and leakage properties. One innovation we explore is the use of dynamic voltage scaling across multiple clock domains to achieve significant energy savings at almost negligible performance cost.
Finally, we discuss how other architectural features and experiments might be incorporated into the Qsilver framework.

Item Interactive Rendering of Atmospheric Scattering Effects Using Graphics Hardware (The Eurographics Association, 2002) Dobashi, Yoshinori; Yamamoto, Tsuyoshi; Nishita, Tomoyuki; Thomas Ertl and Wolfgang Heidrich and Michael Doggett
To create realistic images using computer graphics, an important element to consider is atmospheric scattering, that is, the phenomenon by which light is scattered by small particles in the air. This effect is the cause of the light beams produced by spotlights, shafts of light, foggy scenes, the bluish appearance of the earth's atmosphere, and so on. This paper proposes a fast method for rendering atmospheric scattering effects based on actual physical phenomena. In the proposed method, look-up tables are prepared to store the intensities of the scattered light, and these are then used as textures. Realistic images are then created at interactive rates by making use of graphics hardware.

Item PixelView: A View-Independent Graphics Rendering Architecture (The Eurographics Association, 2004) Stewart, J.; Bennett, E.P.; McMillan, L.; Tomas Akenine-Moeller and Michael McCool
We present a new computer graphics rendering architecture that allows all possible views to be extracted from a single traversal of a scene description. It supports a wide range of rendering primitives, including polygonal meshes, higher-order surface primitives (e.g. spheres, cylinders, and parametric patches), point-based models, and image-based representations. To demonstrate our concept, we have implemented a hardware prototype that includes a 4D, z-buffered frame-buffer supporting dynamic view selection at the time of raster scan-out. As a result, our implementation supports extremely low display-update latency.
The PixelView architecture also supports rendering of the same scene for multiple eyes, which provides immediate benefits for stereo viewing methods like those used in today's virtual environments, particularly when there are multiple participants. In the future, view-independent graphics rendering hardware will also be essential to support the multitude of viewpoints required for real-time autostereoscopic and holographic display devices.