Search Results

Now showing 1 - 10 of 15
  • Item
    Prefetching in a Texture Cache Architecture
    (The Eurographics Association, 1998) lgehy, Homan; Eldridge, Matthew; Proudfoot, Kekoa; S. N. Spencer
    Texture mapping has become so ubiquitous in real-time graphics hardware that many systems are able to perform filtered texturing without any penalty in fill rate. The computation rates available in hardware have been outpacing the memory access rates, and texture systems are becoming constrained by memory bandwidth and latency. Caching in conjunction with prefetching can be used to alleviate this problem. In this paper, WC introduce a prefetching texture cache architecture designed to take advantage of the access characteristics of texture mapping. The structures needed are relatively simple and arc amenable to high clock rates. To quantify the robustness of our architecture, we identify a set of six scenes whose texture locality varies over nearly two orders of magnitude and a set 01 four memory systems with varying bandwidths and latencies. Through the use of a cycle-accurate simulation, we demonstrate that even in the presence of a high-latency memory system, our architecture can attain at least 97% of the performance of a zerolatency memory system.
  • Item
    A Virtual Memory System Organization for Bit-Mapped Graphics Displays
    (The Eurographics Association, 1989) Barkans, Anthony C.; Richard Grimsdale and Wolfgang Strasser
    Described is a display sub-system, designed for support of a very high speed rendering engine. It provides high-performance graphics to an enVironment that consists of a hierarchy of resizable windows. The concept of virtual memory has been applied with the organization of the virtual to physical address spaces having a unique mapping that fits the organization of a bit-mapped graphics memory display.
  • Item
    Design of a Fast Voxel Processor for Parallel Volume Visualization
    (The Eurographics Association, 1995) Lichtennann, Jan; W. Strasser
    The basics of a parallel real-time volume visualization architecture are introduced. Volume data is divided into subcubes that are dis­ tributed among multiple image processors and stored in their pri­ vate voxel memories. Rays fall into ray segments at the subcube borders. Each image processor is responsible for the ray segments within its assigned subcubes. Results of the ray segments are passed to the image processor where the ray continues. The enu­ meration of resampling points on the ray segments and the interpo­ lation at resampling points is accelerated by the voxel processor. The voxel processor can additionally compute a normalized gradi­ ent vector at a resampling point used as a surface normal estima­ tion for shading calculations. In the paper the focus is on operation and hardware implementation of this pipeline processor and the organization of voxel memory. The instruction set of the voxel pro­ cessor is explained. A performance of 20 images per second for a 2563 voxel volume and 16 image processors can be achieved.
  • Item
    VIZARD - Visualization Accelerator for Realtime Display
    (The Eurographics Association, 1997) Knittel, Günter; Straßer, Wolfgang; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
    Volume rendering has traditionally been an application for supercomputers, workstation networks or expensive special-purpose hardware. In contrast, this report shows how far we have reached using the other extreme: the low-end PC platform. We have alleviated the mismatch between this demanding application and the limited computational resources of a PC in three ways: several stages in the visualization pipeline are placed into a preprocessing step, the volume rendering algorithm was optimized using a special data compression scheme, and the algorithm has been implemented in hardware as a PCI-compatible coprocessor (lXZ,4RD). These methods give us a frame rate of up to 1OHz for 256 <sup>3</sup> data sets and an acceptable image quality, although the accelerator prototype was built using relatively slow FPGA-technology. In a low-cost environment a coprocessor must not be more expensive than the host itself, and so VIZARD was designed to be manufacturable for a few hundred dollars. The special data compression scheme allows the data set to be placed into the main memory of the PC and eliminates the need for an expensive, separate volume memory. The entire visualization system consists of a portable PC with two built-in accelerator boards. Despite its small size, the system provides perspective raycasting for realtime walk-throughs. Additional features include stereoscopic viewing using shutter glasses and volume animation.
  • Item
    Memory Access Patterns of Occlusion-Compatible 3D Image Warping
    (The Eurographics Association, 1997) Murk, William R.; Bishop, Gary; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
    McMillan and Bishop s 3D image warp can be efficiently implemented by exploiting the coherency of its memory accesses. We analyze this coherency, and present algorithms that take advantage of it. These algorithms traverse the reference image in an occlusion-compatible order, which is an order that can resolve visibility using a painter s algorithm. Required cache sizes are calculated for several one-pass 3D warp algorithms, and we develop a two-pass algorithm which requires a smaller cache size than any of the practical one-pass algorithms. We also show that reference image traversal orders that are occlusion-compatible for continuous images are not always occlusion-compatible when applied to the discrete images used in practice.
  • Item
    Towards Real-Time Photorealistic Rendering: Challenges and Solutions
    (The Eurographics Association, 1997) Schilling, Andreas; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
    A growing number of real-time applications need graphics with photorealistic quality, especially in the field of training (virtual operation, driving and flightsimulation), but also in the areas of design or ergonomic research. We take a closer look at main deficiencies of today s real time graphics hardware and present solutions for several of the identified problems in the areas of antialiasing and texture-. bump- and reflection mapping. In the second part of the paper, a new method for antialiasing bump maps is explained in more detail.
  • Item
    PixelFlow: The Realization
    (The Eurographics Association, 1997) Eyles, John; Molnar, Steven; Poulton, John; Greer, Trey; Lastra, Anselmo; England, Nick; Westover, Lee; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
    PixelFlow is an architecture for high-speed, highly realistic image generation, based on the techniques of object-parallelism and image composition. Its initial architecture was described in [MOLN92]. After development by the original team of researchers at the University of North Carolina, and codevelopment with industry partners, Division Ltd. and Hewlett- Packard, PixelFlow now is a much more capable system than initially conceived and its hardware and software systems have evolved considerably. This paper describes the final realization of PixelFlow, along with hardware and software enhancements heretofore unpublished.
  • Item
    Simple Models of the Impact of Overlap in Bucket Rendering
    (The Eurographics Association, 1998) Chen, Milton; Stall, Gordon; Igehy, Homan; Proudfoot, Kekoa; Hanrahan, Pat; S. N. Spencer
    Bucket rendering is a technique in which the framebuffer is subdivided into coherent regions that are rendered independently. The primary benelits of this technique are the decrease in the size of the working set of framebuffer memory required during rendering and the possibility of processing multiple regions in parallel. The drawbacks of this technique are the cost of computing the regions overlapped by each triangle and the redundant work required in processing triangles multiple times when they overlap multiple regions, Tile size is a critical parameter in bucket rendering systems: smaller tile sizes allow smaller memory footprints and better parallel load balancing but exacerbate the problem of redundant computation. In this paper, we use mathematical models, instrumentation, and trace-driven simulation to evaluate the impact of overlap and conclude that the problem of overlap is limited in scope. If triangles are small, the overlap factor itself is also small. If triangles are large, overlap is high but pixel work dominates the rendering time. In pipelined rendering systems, the worst-case impact of overlap occurs when the area of an input triangle is equal to the area for which the pipeline is balanced-that is, the trianglerelated computation time is equal to the pixel-related computation time. Thus, as the current trends of exponentially increasing triangle rate, slowly increasing screen resolution, and increasing per-pixel computation continue to push this balance point toward triangles with smaller area, bucket rendering systems will be able to utilize smaller tiles efficiently.
  • Item
    PAVLOV: A Programmable Architecture for Volume Processing
    (The Eurographics Association, 1998) Kreeger, Kevin; Kaufman, Arie; S. N. Spencer
    We present a parallel 2D mesh connected architecture with SIMD processing elements. The design allows for real-time volume rendering as well as interactive 30 segmentation and 1D feature extraction. This is possible because the SIMD processing elements are programmable, a feature which also allows the use of many different rendering algorithms. We present an algorithm which, with the addition of hardware resources, provides conflict free access to volume slices along any of the three major axes. The volume access conflict has been the main reason why previous similar architectures could not perform real-time volume rendering. We present the performance of preliminary algorithms on a software simulator of the architecture design.
  • Item
    Neon: A Single-Chip 3D Workstation Graphics Accelerator
    (The Eurographics Association, 1998) McCormack, Joel; McNamara, Robert; Gianos, Christopher; Seiler, Larry; Jouppi, Norman P.; Correll, Ken; S. N. Spencer
    High-performance 3D graphics accelerators traditionally require multiple chips on multiple boards, including geometry, rasterizing, pixel processing, and texture mapping chips. These designs are often scalable: they can increase performance by using more chips. Scalability has obvious costs: a minimal configuration needs several chips, and some configurations must replicate texture maps. A less obvious cost is the almost irresistible temptation to replicate chips to increase performance, rather than to design individual chips for higher performance in the first place. In contrast, Neon is a single chip that performs like a multichip design. Neon accelerates OpenGL [19] 3D rendering, as well as X11 [20] and Windows/NT 2D rendering. Since our pin budget limited peak memory bandwidth, we designed Neon from the memory system upward in order to reduce bandwidth requirements. Neon has no special-purpose memories; its eight independent 32-bit memory controllers can access color buffers, 1. depth buffers, stencil buffers, and texture data. To fit our gate budget, we shared logic among different operations with similar implementation requirements, and left floating point calculations to Digital s Alpha CPUs. Neon s performance is between HP s Visualize fx<sup>4</sup> and fx<sup>6</sup>, and is well above SGI s MXE for most operations. Neon-based boards cost much less than these competitors, due to a small part count and use of commodity SDRAMs.