Search Results

Now showing 1 - 10 of 10
  • Item
    Parallel Texture Caching
    (The Eurographics Association, 1999) lgehy, Homan; Eldridge, Matthew; Hanrahan, Pat; A. Kaufmann and W. Strasser and S. Molnar and B.- O. Schneider
    The creation of high-quality images requires new functionality and higher performance in real-time graphics architectures. In terms of functionality, texture mapping has become an integral component of graphics systems, and in terms of performance, parallel techniques are used at all stages of the graphics pipeline. In rasterization, texture caching has become prevalent for reducing texture bandwidth requirements. However, parallel rasterization architectures divide work across multiple functional units, thus potentially decreasing the locality of texture references. For such architectures to scale well, it is necessary to develop efficient parallel texture caching subsystems. We quantify the effects of parallel rasterization on texture locality for a number of rasterization architectures, representing both current commercial products and proposed future architectures. A cycle-accurate simulation of the rasterization system demonstrates the parallel speedup obtained by these systems and quantities inefficiencies due to redundant work, inherent parallel load imbalance, insufftcient memory bandwidth, and resource contention. We find that parallel texture caching works well, and is general enough to work with a wide variety of rasterization architectures.
  • Item
    Memory Access Patterns of Occlusion-Compatible 3D Image Warping
    (The Eurographics Association, 1997) Murk, William R.; Bishop, Gary; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
    McMillan and Bishop s 3D image warp can be efficiently implemented by exploiting the coherency of its memory accesses. We analyze this coherency, and present algorithms that take advantage of it. These algorithms traverse the reference image in an occlusion-compatible order, which is an order that can resolve visibility using a painter s algorithm. Required cache sizes are calculated for several one-pass 3D warp algorithms, and we develop a two-pass algorithm which requires a smaller cache size than any of the practical one-pass algorithms. We also show that reference image traversal orders that are occlusion-compatible for continuous images are not always occlusion-compatible when applied to the discrete images used in practice.
  • Item
    Texture Shaders
    (The Eurographics Association, 1999) McCool, Michael D.; Heidrich, Wolfgang; A. Kaufmann and W. Strasser and S. Molnar and B.- O. Schneider
    Extensions to the texture-mapping support of the abstract graphics hardware pipeline and the OpenGL API are proposed to better support programmable shading, with a unified interface, on a variety of future graphics accelerator architectures. Our main proposals include better support for texture map coordinate generation and an abstract, programmable model for multitexturing. As motivation, we survey several interactive rendering algorithms that target important visual phenomena. With hardware implementation of programmable multitexturing support, implementations of these effects that currently take multiple passes can be rendered in one pass. The generality of our proposed extensions enable efficient implementation of a wide range of other interactive rendering algorithms. The intermediate level of abstraction of our API proposal enables high-level shader metaprogramming toolkits and relatively straightforward implementations, while hiding the details of multitexturing support that are currently fragmenting OpenGL into incompatible dialects.
  • Item
    Hybrid Volume and Polygon Rendering with Cube Hardware
    (The Eurographics Association, 1999) Kreeger, Kevin; Kaufman, Arie; A. Kaufmann and W. Strasser and S. Molnar and B.- O. Schneider
    We present two methods which connect today s polygon graphics hardware accelerators to Cube-5 volume rendering hardware, the successor to Cube4 The proposed methods allow mixing of both opaque and translucent polygons with volumes on PC class machines, while ensuring the correct compositing order of all objects. Both implementations connect the two hardware acceleration subsystems at the frame buffer. One shares a common DRAM buffer and one run-length encodes images of thin slabs of polygonal data and then combines them in the Cube composite buffer In both realizations, we take advantage of the predictable ordered access to frame buffer storage that is utilized by Cube-5 and the rest of the family of volume rendering accelerators based on the Cube design.
  • Item
    EM-Cube: An Architecture for Low-Cost Real-Time Volume Rendering
    (The Eurographics Association, 1997) Osborne, Rändy; Pfister, Hanspeter; Lauer, Hugh; McKenzie, Neil; Gibson, Sarah; Hiatt, Wally; Ohkarni, TakaHide; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
    EM-Cube is a VLSI architecture for low-cost, high quality volume rendering at full video frame rates. Derived from the Cube4 architecture developed at SUNY at Stony Brook, EM-Cube computes sample points and gradients on-the-fly to project 3-dimensional volume data onto 2-dimensional images with realistic lighting and shading. A modest rendering system based on EM-Cube consists of a PC1 card with four rendering chips (ASICs), four 64Mbit SDRAMs to hold the volume data, and four SRAMs to capture the rendered image. The performance target for this configuration is to render images from a 256<sup>3</sup> x 16 bit data set at 30 frames/sec. The EM-Cube architecture can be scaled to larger volume data-sets and/or higher frame rates by adding additional ASKS, SDRAMs, and SRAMs. This paper addresses three major challenges encountered developing EM-Cube into a practical product: exploiting the bandwidth inherent in the SDRAMs containing the volume data, keeping the pin-count between adjacent ASICs at a tractable level, and reducing the on-chip storage required to hold the intermediate results of rendering.
  • Item
    High-Quality Volume Rendering Using Texture Mapping Hardware
    (The Eurographics Association, 1998) Dachille, Frank; Kreeger, Kevin; Chen, Baoquan; Bitter, Ingmar; Kaufman, Arie; S. N. Spencer
    We present a method Jor volume rendering of regular grids which takes advantage of 3D texture mapping hardware currently, available on graphics workstations. Our method products accurate shading for arbitrary and dynamically changing directional lights, viewing parameters, and transfer functions. This is achieved by hardware interpolating the data values and gradients before software classification and shading. The method works equally well for parallel and perspective projections. We present two approaches for OUT method: one which takes advantage of software ray casting optimizations and another which takes advantage of hardware blending acceleration.
  • Item
    Triangle Scan Conversion using 2D Homogeneous Coordinates
    (The Eurographics Association, 1997) Olano, Marc; Greer, Trey; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
    We present a new triangle scan conversion algorithm that works entirely in homogeneous coordinates. By using homogeneous coordinates, the algorithm avoids costly clipping tests which make pipelining or hardware implementations of previous scan conversion algorithms difficult. The algorithm handles clipping by the addition of clip edges, without the need to actually split the clipped triangle. Furthermore, the algorithm can render true homogeneous triangles, including external triangles that should pass through infinity with two visible sections. An implementation of the algorithm on Pixel-Planes 5 runs about 33% faster than a similar implementation of the previous algorithm.
  • Item
    Antialiased Parameterized Solid Texturing Simplified for Consumer- Level Hardware Implementation
    (The Eurographics Association, 1999) Hart, John C.; Carr, Nate; Karneya, Masaki; Tibbitts, Stephen A.; Coleman, Terrance J.; A. Kaufmann and W. Strasser and S. Molnar and B.- O. Schneider
    Procedural solid texturing was introduced fourteen years ago, but has yet to find its way into consumer level graphics hardware for teal-time operation. To this end, a new model is introduced that yields a parameterized function capable of synthesizing the most common procedural solid textures, specifically wood, marble, clouds and fire. This model is simple enough to be implemented in hardware, and can be realized in VLSI with as little as 100,000 gates. The new model also yields a new method for antialiasing synthesized textures. An expression for the necessary box filter width is derived as a function of the texturing parameters, the texture coordinates and the rasterization variables. Given this filter width, a technique for efficiently box filtering the synthesized texture by either mip mapping the color table or using a summed area color table are presented. Examples of the antialiased results are shown.
  • Item
    Optimal Depth Buffer for Low-Cost Graphics Hardware
    (The Eurographics Association, 1999) Lapidous, Eugene; Jiao, Guofang; A. Kaufmann and W. Strasser and S. Molnar and B.- O. Schneider
    3D applications using hardware depth buffers for visibility testing are confronted with multiple choices of buffer types, sizes and formats. Some of the options are not exposed through 3D API or may be used by the driver without application s knowledge. As a result, it becomes increasingly difficult to select depth buffer optimal for desired balance between performance and precision. In this paper we provide comparative evaluation of depth precision for main depth buffer types with different size and format combinations. Results indicate that integer storage is preferred for some buffer types, while others achieve maximal depth resolution with floating-point format optimized for known scene parameters. We propose to give 3D applications full control of the depth buffer optimization by supporting multiple storage formats with the same buffer size and exposing them in 3D API. In the search for a unified depth buffer solution, we describe new type of the depth buffer and compare it with other options. Complementary floating-point Z buffer is a combination of a reversed-direction Z buffer and an optimal floating-point storage format. Non-linear mapping and storage format compensate each other s effect on the depth precision; as a result, depth errors become significantly less dependent on the eye-space distance, improving depth resolution by the orders of magnitude in comparison with standard Z buffer. Results show that complementary Z buffer is also superior to inverse W buffer at any storage size. At 16 and 24 bits/pixel, average depth errors of complementary Z buffer remain 2 times larger than for true W buffer utilizing expensive high-precision per-pixel division. However, it provides absolutely best precision at 32 bits/pixel, when errors are limited by floating-point per-vertex input. Results suggest that complementary floating-point Z buffer can be considered as a candidate for replacement of both screen Z and inverse W buffers, at the same time making hardware investment in the true W buffer support less attractive.
  • Item
    Architectural Implications of Hardware-Accelerated Bucket Rendering on the PC
    (The Eurographics Association, 1997) Cox, Michael; Bhandari, Narendra; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
    Bucket rendering is a technique whereby a scene is sorted into screen-space tiles and each tile is rendered independently in turn. We expect hardware-accelerated bucket rendering to become available on the PC, and in this paper we explore the effect of such accelerators on main memory bandwidth, bus bandwidth to the accelerator, and on increased triangle set-up requirements. The most important impact is due to the fact that in general primitives overlap multiple buckets, which is a direct cause of overhead. In this paper we evaluate bucket rendering that uses the most common algorithm for bucket sorting, one based on screen-aligned primitive bounding boxes. We extend previous techniques for analytically evaluating bounding box overlap of buckets, and together with experimental results use these to evaluate accelerators that may support 32x32 pixel tiles, and those that may support 128x128 pixel tiles. We expect the former to be possible with dense SRAM, the latter to be possible with DRAM embedded in a logic process (embedded DRAM). Our results suggest that embedded DRAM implementations can support bucket rendering with bounding box bucket sorting, but that SRAM implementations will likely be at risk with respect to overall system performance when bounding box bucket sorting is employed. These results suggest the requirement for more precise but still low-overhead bucket sorting algorithms when bucket rendering hardware is constrained to 32 x 32 tiles.