Search Results
Now showing 1 - 7 of 7
Item: A VLSI Design for Fast Vector Normalization (The Eurographics Association, 1993)
Knittel, G. Editors: P. F. Lister and R. L. Grimsdale.
The design of a vector normalizer is described. It is an integral part of our graphics subsystem for scientific visualization, but will be of great use for speeding up any computer graphics architecture. In the actual design, the circuitry handles 3D vectors with 33-bit two's-complement components. The components of the normalized vectors are computed as 16-bit two's-complement fixed-point numbers. Due to the overall pipeline architecture, the chip accepts one 3D vector and produces one normalized vector each clock cycle. To normalize a 3D vector, three square operations, two additions, one square-root operation and three divisions must be performed. The target clock frequency is 50 MHz, at which the performance of the chip rates at 450 MOPS. A single-chip VLSI implementation is currently in work; simulation results will be available by the end of the third quarter of '93. We use Mentor 8.2 tools on HP 700 workstations and Toshiba's TC160G gate-array technology.
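As a rough software model of the arithmetic enumerated above (three squares, two additions, one square root, three divisions), the following C++ sketch assumes a Q1.15 output format and illustrative names; the abstract gives the operation counts and bit widths but not the exact output scaling.

    #include <cstdint>
    #include <cmath>

    // Software model of the normalization arithmetic. The Q1.15 scaling and
    // all names here are illustrative assumptions, not taken from the paper.
    struct Vec3Q15 { int16_t x, y, z; };      // 16-bit two's-complement outputs

    Vec3Q15 normalize33(int64_t x, int64_t y, int64_t z) {
        // Squares of 33-bit components need about 66 bits, so the hardware
        // accumulator must be wider than 64 bits; long double stands in here.
        long double sq = (long double)x * x
                       + (long double)y * y
                       + (long double)z * z;  // three squares, two additions
        if (sq == 0.0L) return {0, 0, 0};     // zero vector: nothing to normalize
        long double len = sqrtl(sq);          // one square-root operation
        auto toQ15 = [](long double v) {      // clamp and round to Q1.15
            long double s = v * 32768.0L;
            if (s >  32767.0L) s =  32767.0L;
            if (s < -32768.0L) s = -32768.0L;
            return (int16_t)llrintl(s);
        };
        return { toQ15(x / len), toQ15(y / len), toQ15(z / len) };  // three divisions
    }

Note that nine operations per vector, at one vector per clock and 50 MHz, is exactly the quoted 450 MOPS.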
Item: Efficient Adaptive Deferred Shading with Hardware Scatter Tiles (ACM, 2020)
Mallett, Ian; Yuksel, Cem; Seiler, Larry. Editors: Yuksel, Cem; Membarth, Richard; Zordan, Victor.
Adaptive shading is an effective mechanism for reducing the number of shaded pixels to a subset of the image resolution with minimal impact on final rendering quality. We present a new scheduling method based on on-chip tiles that, along with relatively minor modifications to the GPU architecture, provides efficient hardware support. Compared to software implementations on current hardware using compute shaders, our approach dramatically reduces memory bandwidth requirements, thereby significantly improving performance and energy use. We also introduce the concept of a fragment pre-shader for programmatically controlling when a fragment shader is invoked, and describe advanced techniques for utilizing our approach to further reduce the number of shaded pixels via temporal filtering, or to adjust rendering quality to maintain stable frame rates.

Item: Decoupled Coverage Anti-Aliasing (ACM SIGGRAPH, 2015)
Wang, Yuxiang; Wyman, Chris; He, Yong; Sen, Pradeep. Editors: Petrik Clarberg and Elmar Eisemann.
State-of-the-art methods for geometric anti-aliasing in real-time rendering are based on Multi-Sample Anti-Aliasing (MSAA), which samples visibility more densely than shading to reduce the number of expensive shading calculations. However, for high-quality results the number of visibility samples needs to be large (e.g., 64 samples/pixel), which requires significant memory because visibility samples are usually 24-bit depth values. In this paper, we present Decoupled Coverage Anti-Aliasing (DCAA), which improves upon MSAA by further decoupling coverage from visibility for high-quality geometric anti-aliasing. Our work is based on the previously explored idea that all fragments at a pixel can be consolidated into a small set of visible surfaces. Although in the past this was only used to reduce the memory footprint of the G-buffer for deferred shading with MSAA, we leverage this idea to represent each consolidated surface with a 64-bit binary mask for coverage and a single decoupled depth value, thus significantly reducing the overhead for high-quality anti-aliasing. To do this, we introduce new surface merging heuristics and resolve mechanisms to manage the decoupled depth and coverage samples. Our prototype implementation runs in real-time on current graphics hardware, and results in a significant reduction in geometric aliasing with less memory overhead than 8× MSAA for several complex scenes.
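As a concrete picture of the storage DCAA keeps per pixel, here is a minimal C++ sketch assuming a small fixed surface budget. The merge policy shown (reuse a surface of similar depth, otherwise evict the least-covered one) is an illustrative stand-in, not the paper's actual heuristics.

    #include <array>
    #include <bit>
    #include <cmath>
    #include <cstdint>

    // Each consolidated surface holds a 64-bit coverage mask and one
    // decoupled depth, as described in the abstract. Budget and merge
    // policy below are assumptions for illustration.
    struct Surface { uint64_t coverage = 0; float depth = 0.0f; bool used = false; };

    struct PixelDCAA {
        static constexpr int kMaxSurfaces = 4;          // assumed fixed budget
        std::array<Surface, kMaxSurfaces> surfaces{};

        void addFragment(uint64_t mask, float z, float zEps = 1e-3f) {
            for (auto& s : surfaces)                    // merge into a nearby surface
                if (s.used && std::fabs(s.depth - z) < zEps) {
                    s.coverage |= mask;                 // union of 64-sample coverage
                    return;
                }
            for (auto& s : surfaces)                    // otherwise take a free slot
                if (!s.used) { s = {mask, z, true}; return; }
            Surface* victim = &surfaces[0];             // else overwrite the surface
            for (auto& s : surfaces)                    // covering the fewest samples
                if (std::popcount(s.coverage) < std::popcount(victim->coverage))
                    victim = &s;
            *victim = {mask, z, true};
        }
    };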
Item: An array based design for Real-Time Volume Rendering (The Eurographics Association, 1995)
Doggett, Michael. Editor: W. Strasser.
This paper describes a new algorithm and hardware design for the generation of two-dimensional images from volume data using the ray casting technique. The algorithm is part of an image generation system that is broken down into three subsystems. The first subsystem stores the input data in a buffered memory using a rearrangement of the original address value. The second subsystem reads data points from the buffered memory and shifts the data to computational elements in order to complete the viewing calculations for the image synthesis process. The final stage takes the results of the viewing calculations combined with the original input data to complete the surface rendering and pixel compositing to create the final image. This paper focuses on the second subsystem, which consists of two two-dimensional arrays of processing elements. The first array performs a limited-angle, single-dimension rotation by shifting the data. The second array performs a two-dimensional ray casting operation where viewing rays are assigned to each processing element. The first stage is outlined in this paper and the final rendering stages are the subject of previous work. The hardware design associated with these algorithms is described and tested. It is estimated that this architecture is capable of producing 384 × 384 pixel images at speeds of 15 frames per second for 256³ data sets. Real-time generation of images of volume data is important in scientific applications of volume visualization and computer graphics applications which use volume graphics.

Item: Ray Accelerator: Efficient and Flexible Ray Tracing on a Heterogeneous Architecture (The Eurographics Association and John Wiley & Sons Ltd., 2017)
Barringer, R.; Andersson, M.; Akenine-Möller, T. Editors: Chen, Min and Zhang, Hao (Richard).
We present a hybrid ray tracing system, where the work is divided between the CPU cores and the GPU in an integrated chip, and communication occurs via shared memory. Rays are organized in large packets that can be distributed among the two units as needed. Testing visibility between rays and the scene is mostly performed using an optimized kernel on the GPU, but the CPU can help as necessary. The CPU cores typically handle most or all shading, which makes it easy to support complex appearances. For efficiency, the CPU cores shade whole batches of rays by sorting them on material and shading each material using a vectorized kernel. In addition, we introduce a method to support light paths with arbitrary recursion, such as multiple recursive Whitted-style ray tracing and adaptive sampling where the result of a ray is examined before sending the next, while still batching up rays for the benefit of GPU-accelerated traversal and vectorized shading. This allows our system to achieve high rendering performance while maintaining the flexibility to accommodate different rendering algorithms. (A sketch of the material-sorted batching appears at the end of this list.)

Item: A pel-based Volume Rendering Accelerator (The Eurographics Association, 1995)
Knittel, Günter. Editor: W. Strasser.
We discuss the underlying algorithms, design principles and implementation issues of an extremely compact and cost-efficient volume rendering accelerator for PCI-based systems. It operates on classified and shaded data sets which have been coded and compressed using Redundant Block Compression (RBC), a technique originating from 2D imaging and extended to 3D. This specific encoding scheme drastically reduces the required data traffic between the volume memory and the processing units. Thus, the volume data set can be placed in the main memory of the host, eliminating the need for a separate volume memory. Furthermore, the tri-linear interpolation needed for perspective ray casting is very much simplified for RBC-transformed data sets. All in all, these techniques allow a volume rendering accelerator to be implemented as a single-chip coprocessor, or as an FPGA-based prototype for monochrome data sets as presented in this work. Although a lossy compression scheme is used, image quality is still high, and expected frame rates are between 2 and 5 Hz for typical data sets of 256³ voxels.

Item: Hardware for Superior Texture Performance (The Eurographics Association, 1995)
Knittel, G.; Schilling, A.; Kugler, A.; Straßer, W. Editor: W. Strasser.
Mapping textures onto surfaces of computer-generated objects is a technique which greatly improves the realism of their appearance. Unfortunately, this imposes high computational demands and, even worse, tremendous memory bandwidth requirements on the graphics system. Tight cost constraints in the industry, in conjunction with ever increasing user expectations, make the design of a powerful texture mapping unit a difficult task. To meet these requirements we follow two different approaches. On the technology side, we observe a rapidly emerging technology which offers the combination of enormous transfer rates and computing power: logic-embedded memories. On the algorithmic side, a common way to reduce data traffic is image compression. Its application to texture mapping, however, is difficult since the decompression must be done at pixel frequency. In this work we will focus on the latter approach, describing the use of a specific compression scheme for texture mapping. It allows the use of very simple and fast decompression hardware, bringing high-performance texture mapping to low-cost systems.
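The last item's abstract does not identify its compression scheme, so the sketch below uses block truncation coding (BTC) as a stand-in to show why decompression can keep up with pixel frequency: decoding one texel reduces to a bit test and a two-way select.

    #include <cstdint>

    // BTC-style block, used here purely as an illustrative stand-in for the
    // paper's unnamed scheme: each 4x4 texel block stores two 8-bit levels
    // and a 16-bit selector mask, i.e. 4 bytes per 16 texels.
    struct BtcBlock { uint8_t lo, hi; uint16_t selector; };

    inline uint8_t decodeTexel(const BtcBlock& b, unsigned u, unsigned v) {
        unsigned bit = (v & 3u) * 4u + (u & 3u);          // position inside the block
        return ((b.selector >> bit) & 1u) ? b.hi : b.lo;  // one bit test + mux
    }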
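Returning to the Ray Accelerator item above, here is a minimal sketch of its material-sorted batch shading. The Hit layout and the shadeRun callback are illustrative assumptions, not from the paper.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Hits returned by GPU traversal are sorted by material so that each
    // material's (vectorized) shader runs once over a contiguous run.
    struct Hit { uint32_t materialId; uint32_t rayIndex; /* hit attributes */ };

    template <typename ShadeRun>
    void shadeBatch(std::vector<Hit>& hits, ShadeRun&& shadeRun) {
        std::sort(hits.begin(), hits.end(),
                  [](const Hit& a, const Hit& b) { return a.materialId < b.materialId; });
        for (size_t i = 0; i < hits.size();) {              // walk material runs
            size_t j = i + 1;
            while (j < hits.size() && hits[j].materialId == hits[i].materialId) ++j;
            shadeRun(hits[i].materialId, &hits[i], j - i);  // one kernel call per run
            i = j;
        }
    }

Here shadeRun receives the material id, a pointer to the first hit of the run, and the run length, mirroring the one-vectorized-kernel-per-material pattern the abstract describes.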