Search Results
Prefetching in a Texture Cache Architecture (The Eurographics Association, 1998) Igehy, Homan; Eldridge, Matthew; Proudfoot, Kekoa; S. N. Spencer
Texture mapping has become so ubiquitous in real-time graphics hardware that many systems are able to perform filtered texturing without any penalty in fill rate. The computation rates available in hardware have been outpacing the memory access rates, and texture systems are becoming constrained by memory bandwidth and latency. Caching in conjunction with prefetching can be used to alleviate this problem. In this paper, we introduce a prefetching texture cache architecture designed to take advantage of the access characteristics of texture mapping. The structures needed are relatively simple and are amenable to high clock rates. To quantify the robustness of our architecture, we identify a set of six scenes whose texture locality varies over nearly two orders of magnitude and a set of four memory systems with varying bandwidths and latencies. Through the use of a cycle-accurate simulation, we demonstrate that even in the presence of a high-latency memory system, our architecture can attain at least 97% of the performance of a zero-latency memory system.

VIZARD - Visualization Accelerator for Realtime Display (The Eurographics Association, 1997) Knittel, Günter; Straßer, Wolfgang; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
Volume rendering has traditionally been an application for supercomputers, workstation networks or expensive special-purpose hardware. In contrast, this report shows how far we have reached using the other extreme: the low-end PC platform.
We have alleviated the mismatch between this demanding application and the limited computational resources of a PC in three ways: several stages in the visualization pipeline are placed into a preprocessing step, the volume rendering algorithm was optimized using a special data compression scheme, and the algorithm has been implemented in hardware as a PCI-compatible coprocessor (VIZARD). These methods give us a frame rate of up to 10 Hz for 256³ data sets and an acceptable image quality, although the accelerator prototype was built using relatively slow FPGA technology. In a low-cost environment a coprocessor must not be more expensive than the host itself, and so VIZARD was designed to be manufacturable for a few hundred dollars. The special data compression scheme allows the data set to be placed into the main memory of the PC and eliminates the need for an expensive, separate volume memory. The entire visualization system consists of a portable PC with two built-in accelerator boards. Despite its small size, the system provides perspective raycasting for realtime walk-throughs. Additional features include stereoscopic viewing using shutter glasses and volume animation.

Parallel Texture Caching (The Eurographics Association, 1999) Igehy, Homan; Eldridge, Matthew; Hanrahan, Pat; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
The creation of high-quality images requires new functionality and higher performance in real-time graphics architectures. In terms of functionality, texture mapping has become an integral component of graphics systems, and in terms of performance, parallel techniques are used at all stages of the graphics pipeline. In rasterization, texture caching has become prevalent for reducing texture bandwidth requirements. However, parallel rasterization architectures divide work across multiple functional units, thus potentially decreasing the locality of texture references.
For such architectures to scale well, it is necessary to develop efficient parallel texture caching subsystems. We quantify the effects of parallel rasterization on texture locality for a number of rasterization architectures, representing both current commercial products and proposed future architectures. A cycle-accurate simulation of the rasterization system demonstrates the parallel speedup obtained by these systems and quantifies inefficiencies due to redundant work, inherent parallel load imbalance, insufficient memory bandwidth, and resource contention. We find that parallel texture caching works well, and is general enough to work with a wide variety of rasterization architectures.

Memory Access Patterns of Occlusion-Compatible 3D Image Warping (The Eurographics Association, 1997) Mark, William R.; Bishop, Gary; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
McMillan and Bishop's 3D image warp can be efficiently implemented by exploiting the coherency of its memory accesses. We analyze this coherency, and present algorithms that take advantage of it. These algorithms traverse the reference image in an occlusion-compatible order, which is an order that can resolve visibility using a painter's algorithm. Required cache sizes are calculated for several one-pass 3D warp algorithms, and we develop a two-pass algorithm which requires a smaller cache size than any of the practical one-pass algorithms. We also show that reference image traversal orders that are occlusion-compatible for continuous images are not always occlusion-compatible when applied to the discrete images used in practice.

Z3: An Economical Hardware Technique for High-Quality Antialiasing and Transparency (The Eurographics Association, 1999) Jouppi, Norman P.; Chang, Chun-Fa; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
In this paper we present an algorithm for low-cost hardware antialiasing and transparency.
This technique keeps a central Z value along with compact floating-point Z gradients in the X and Y dimensions for each fragment within a pixel (hence the name Z3). It uses a small fixed amount of storage per pixel. If the visible complexity of the pixel exceeds the storage space available for the pixel, the minimum number of fragments having the closest Z values are merged. This combines different fragments from the same surface, resulting in both storage and processing efficiency. When operating with opaque surfaces, Z3 can provide superior image quality over sparse supersampling methods that use eight samples per pixel while using storage for only three fragments. Z3 also makes the use of large numbers of samples (e.g., 16) feasible in inexpensive hardware, enabling higher quality images. It is simple to implement because it uses a small fixed number of fragments per pixel. Z3 can also provide order-independent transparency even if many transparent surfaces are present. Moreover, unlike the original A-buffer algorithm it correctly antialiases interpenetrating transparent surfaces because it has three-dimensional Z information within each pixel.

Towards Real-Time Photorealistic Rendering: Challenges and Solutions (The Eurographics Association, 1997) Schilling, Andreas; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
A growing number of real-time applications need graphics with photorealistic quality, especially in the field of training (virtual operation, driving and flight simulation), but also in the areas of design or ergonomic research. We take a closer look at main deficiencies of today's real-time graphics hardware and present solutions for several of the identified problems in the areas of antialiasing and texture-, bump- and reflection mapping.
In the second part of the paper, a new method for antialiasing bump maps is explained in more detail.

PixelFlow: The Realization (The Eurographics Association, 1997) Eyles, John; Molnar, Steven; Poulton, John; Greer, Trey; Lastra, Anselmo; England, Nick; Westover, Lee; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
PixelFlow is an architecture for high-speed, highly realistic image generation, based on the techniques of object-parallelism and image composition. Its initial architecture was described in [MOLN92]. After development by the original team of researchers at the University of North Carolina, and codevelopment with industry partners, Division Ltd. and Hewlett-Packard, PixelFlow now is a much more capable system than initially conceived and its hardware and software systems have evolved considerably. This paper describes the final realization of PixelFlow, along with hardware and software enhancements heretofore unpublished.

Texture Shaders (The Eurographics Association, 1999) McCool, Michael D.; Heidrich, Wolfgang; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
Extensions to the texture-mapping support of the abstract graphics hardware pipeline and the OpenGL API are proposed to better support programmable shading, with a unified interface, on a variety of future graphics accelerator architectures. Our main proposals include better support for texture map coordinate generation and an abstract, programmable model for multitexturing. As motivation, we survey several interactive rendering algorithms that target important visual phenomena. With hardware implementation of programmable multitexturing support, implementations of these effects that currently take multiple passes can be rendered in one pass. The generality of our proposed extensions enables efficient implementation of a wide range of other interactive rendering algorithms.
The intermediate level of abstraction of our API proposal enables high-level shader metaprogramming toolkits and relatively straightforward implementations, while hiding the details of multitexturing support that are currently fragmenting OpenGL into incompatible dialects.

Simple Models of the Impact of Overlap in Bucket Rendering (The Eurographics Association, 1998) Chen, Milton; Stoll, Gordon; Igehy, Homan; Proudfoot, Kekoa; Hanrahan, Pat; S. N. Spencer
Bucket rendering is a technique in which the framebuffer is subdivided into coherent regions that are rendered independently. The primary benefits of this technique are the decrease in the size of the working set of framebuffer memory required during rendering and the possibility of processing multiple regions in parallel. The drawbacks of this technique are the cost of computing the regions overlapped by each triangle and the redundant work required in processing triangles multiple times when they overlap multiple regions. Tile size is a critical parameter in bucket rendering systems: smaller tile sizes allow smaller memory footprints and better parallel load balancing but exacerbate the problem of redundant computation. In this paper, we use mathematical models, instrumentation, and trace-driven simulation to evaluate the impact of overlap and conclude that the problem of overlap is limited in scope. If triangles are small, the overlap factor itself is also small. If triangles are large, overlap is high but pixel work dominates the rendering time. In pipelined rendering systems, the worst-case impact of overlap occurs when the area of an input triangle is equal to the area for which the pipeline is balanced; that is, the triangle-related computation time is equal to the pixel-related computation time.
Thus, as the current trends of exponentially increasing triangle rate, slowly increasing screen resolution, and increasing per-pixel computation continue to push this balance point toward triangles with smaller area, bucket rendering systems will be able to utilize smaller tiles efficiently.

TRIANGLECASTER Extensions To 3D-Texturing Units For Accelerated Volume Rendering (The Eurographics Association, 1999) Knittel, Günter; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
We discuss hardware extensions to 3D-texturing units, which are very small but nevertheless remove some substantial performance limits typically found when using a 3D-texturing unit for volume rendering. The underlying algorithm uses only a slight modification of existing methods, which limits negative impacts on application software. In particular, the method speeds up the compositing operation, improves texture cache efficiency and allows for early ray termination and empty space skipping. Early ray termination cannot be used in the traditional approach. Simulations show that, depending on data set properties, the performance of readily available, low-cost PC graphics accelerators is already sufficient for real-time volume visualization. Thus, in terms of performance, the TRIANGLECASTER extensions can make dedicated volume rendering accelerators unnecessary.
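
The early ray termination and empty space skipping mentioned in the TRIANGLECASTER abstract can be illustrated with a generic software sketch of front-to-back compositing along a single ray. This is not the paper's hardware design: the function name `composite_front_to_back`, the scalar colour values, and the 0.99 opacity threshold are all assumptions chosen for illustration.

```python
def composite_front_to_back(samples, opacity_threshold=0.99):
    """Composite (color, alpha) samples ordered front to back along one ray.

    Hypothetical helper for illustration only. Returns the accumulated
    color, the accumulated opacity, and the number of samples composited.
    """
    acc_color = 0.0
    acc_alpha = 0.0
    composited = 0
    for color, alpha in samples:
        if alpha == 0.0:
            # Empty space skipping: fully transparent samples add nothing.
            continue
        # Weight the sample by the transparency still remaining in front of it.
        weight = (1.0 - acc_alpha) * alpha
        acc_color += weight * color
        acc_alpha += weight
        composited += 1
        if acc_alpha >= opacity_threshold:
            # Early ray termination: later samples are effectively hidden.
            break
    return acc_color, acc_alpha, composited


if __name__ == "__main__":
    ray = [(0.3, 0.0), (1.0, 0.5), (0.5, 1.0), (0.9, 0.5)]
    print(composite_front_to_back(ray))
```

Early termination depends on visiting samples front to back, so that the accumulated opacity is known before the samples behind it are fetched.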