A Hardware Redundancy and Recovery Mechanism for Reliable Scientific Computation on Graphics Processors

dc.contributor.author: Sheaffer, Jeremy W. (en_US)
dc.contributor.author: Luebke, David P. (en_US)
dc.contributor.author: Skadron, Kevin (en_US)
dc.contributor.editor: Mark Segal and Timo Aila (en_US)
dc.date.accessioned: 2013-10-28T10:17:33Z
dc.date.available: 2013-10-28T10:17:33Z
dc.date.issued: 2007 (en_US)
dc.description.abstract: General purpose computation on graphics processors (GPGPU) has rapidly evolved since the introduction of commodity programmable graphics hardware. With the appearance of GPGPU computation-oriented APIs such as AMD's Close to the Metal (CTM) and NVIDIA's Compute Unified Device Architecture (CUDA), we begin to see GPU vendors putting financial stakes into this non-graphics, one-time niche market. Major supercomputing installations are building GPGPU clusters to take advantage of massively parallel floating point capabilities, and Folding@Home has even released a GPU port of its protein folding distributed computation client. But in order for GPGPU to truly become important to the supercomputing community, vendors will have to address the heretofore unimportant reliability concerns of graphics processors. We present a hardware redundancy-based approach to reliability for general purpose computation on GPUs that requires minimal change to existing GPU architectures. Upon detecting an error, the system invokes an automatic recovery mechanism that only recomputes erroneous results. Our results show that our technique imposes less than a 1.5× performance penalty and saves energy for GPGPU but is completely transparent to general graphics and does not affect the performance of the games that drive the market. (en_US)
dc.description.seriesinformation: SIGGRAPH/Eurographics Workshop on Graphics Hardware (en_US)
dc.identifier.isbn: 978-3-905673-47-0 (en_US)
dc.identifier.issn: 1727-3471 (en_US)
dc.identifier.uri: https://doi.org/10.2312/EGGH/EGGH07/055-064 (en_US)
dc.publisher: The Eurographics Association (en_US)
dc.title: A Hardware Redundancy and Recovery Mechanism for Reliable Scientific Computation on Graphics Processors (en_US)
Files
Original bundle
Name: 055-064.pdf
Size: 248.18 KB
Format: Adobe Portable Document Format