Passive Spatio-Temporal Geometry Reconstruction of Human Faces at Very High Fidelity
The creation of realistic synthetic human faces is one of the most important and, at the same time, most challenging topics in computer graphics. The high complexity of the face, as well as our familiarity with it, renders manual creation and animation impractical. The method of choice is thus to capture both the shape and the motion of the human face from real-life talent. To date, this is accomplished using active techniques, which either augment the face with markers or project specific illumination patterns onto it. Active techniques currently provide the highest geometric accuracy, but they have severe shortcomings when it comes to capturing performances.

In this thesis we present an entirely passive and markerless system to capture and reconstruct facial performances at unprecedented spatio-temporal resolution. The proposed algorithms compute facial shape and motion at skin-pore resolution from multiple cameras, producing temporally compatible geometry for every frame.

The thesis contains several contributions, both in computer vision and in computer graphics. We introduce multiple capture setups that employ off-the-shelf cameras and are tailored to capturing the human face. We also propose different illumination setups, including the design and construction of a multi-purpose light stage with capabilities that reach beyond what is required within this thesis. The light stage contains around 500 color LEDs that can be controlled individually to produce arbitrary spatio-temporal illumination patterns. We present a practical calibration technique designed to automatically calibrate face capture setups, as well as techniques to geometrically calibrate the light stage.

The core contribution of this thesis is a novel multi-view stereo (MVS) algorithm that introduces the concept of Mesoscopic Augmentation. We demonstrate that this algorithm can reconstruct facial skin at a quality on par with active techniques.
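To illustrate the flavor of mesoscopic augmentation, the following is a minimal sketch of one way such an augmentation step could look: per-vertex image intensity is high-pass filtered against its local neighborhood, and the deviation is embossed onto the mesh as a displacement along the vertex normals, so that locally dark regions (such as skin pores) are pushed inward. The function name, the simple 1-ring neighborhood averaging, and the scale parameter are illustrative assumptions, not the thesis's actual formulation.

```python
import numpy as np

def mesoscopic_augmentation(vertices, normals, intensities, neighbors, scale=0.001):
    """Emboss fine-scale (mesoscopic) detail onto a mesh (illustrative sketch).

    vertices    -- (V, 3) vertex positions
    normals     -- (V, 3) unit vertex normals
    intensities -- (V,)   per-vertex image intensity
    neighbors   -- list of index arrays, one neighborhood per vertex
    """
    # Low-pass: average intensity over each vertex's local neighborhood.
    low = np.array([intensities[nbrs].mean() for nbrs in neighbors])
    # High-pass component: deviation from the neighborhood mean.
    high = intensities - low
    # Displace along normals; a dark vertex (e.g. a pore) moves inward.
    return vertices + scale * high[:, None] * normals
```

The key design point this sketch captures is that fine-scale shading variation, which plain stereo matching cannot resolve, is reinterpreted as geometric detail.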
The system is single-shot in that it requires only a single exposure per camera to reconstruct the facial geometry, which enables it to capture even ephemeral poses and makes it well suited to performance capture. We extend the proposed MVS algorithm with the concept of the Episurface, which provides a plausible approximation to the true skin surface in areas where it is occluded by facial hair. We also present the first algorithm to reconstruct sparse facial hair at hair-fiber resolution from a single exposure.

To track skin movement over time without the use of markers, we propose an algorithm that employs optical flow. To overcome inherent limitations of optical flow, such as drift, we introduce the concept of Anchor Frames, which enables us to track facial performances robustly even over long periods of time. Most optical flow algorithms assume some form of brightness constancy. This assumption, however, is violated for deforming surfaces, as the deformation changes self-shading over time. We present a technique called Ambient Occlusion Cancelling, which leverages the reconstructed per-frame geometry to remove varying self-shading from the images. We demonstrate that this technique complements and substantially improves existing optical flow methods. In addition, we show how the varying self-shading can be used to improve the reconstructed geometry.

We hope that the concepts and ideas presented in this thesis will inspire future research in the area of time-varying geometry reconstruction. Already, several of the concepts presented in this thesis have found their way into industry, where they help produce the next generation of CG faces in theme parks, computer games, and feature films.
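The ambient occlusion cancelling idea described above can be sketched as follows. Assuming a simple multiplicative image-formation model, image ≈ shading-free appearance × ambient occlusion, a per-pixel ambient occlusion map rendered from the reconstructed per-frame geometry can be divided out of each frame, approximately restoring brightness constancy for optical flow. The model and all names here are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def cancel_ambient_occlusion(image, ao_map, eps=1e-6):
    """Remove self-shading from a frame (illustrative sketch).

    image  -- observed image as a float array
    ao_map -- per-pixel ambient occlusion in (0, 1], rendered from the
              reconstructed per-frame geometry (assumed available)
    Dividing by the AO map undoes the multiplicative darkening, so a
    surface point keeps roughly constant brightness as the face deforms.
    """
    return image / np.maximum(ao_map, eps)
```

Under this model, running a standard optical flow method on the cancelled images rather than the raw ones removes the self-shading term that would otherwise violate the brightness constancy assumption.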