Multi-view image-based editing and rendering through deep learning and optimization
MetadataShow full item record
Computer-generated imagery (CGI) takes a growing place in our everyday environment. Whether it is in video games or movies, CGI techniques are constantly improving in quality but also require ever more qualitative artistic content which takes a growing time to create. With the emergence of virtual and augmented reality, often comes the need to render or re-render assets that exist in our world. To allow widespread use of CGI in applications such as telepresence or virtual visits, the need for manual artistic replication of assets must be removed from the process. This can be done with the help of Image-Based Rendering (IBR) techniques that allow scenes or objects to be rendered in a free-viewpoint manner from a set of sparse input photographs. While this process requires little to no artistic work, it also does not allow for artistic control or editing of scene content. In this dissertation, we explore Multi-view Image Editing and Rendering. To allow casually captured scenes to be rendered with content alterations such as object removal, lighting edition, or scene compositing, we leverage the use of optimization techniques and modern deep-learning. We design our methods to take advantage of all the information present in multi-view content while handling specific constraints such as multi-view coherency. For object removal, we introduce a new plane-based multi-view inpainting algorithm. Planes are a simple yet effective way to fill geometry and they naturally enforce multi-view coherency as inpainting is computed in a shared rectified texture space, allowing us to correctly respect perspective. We demonstrate instance-based object removal at the scale of a street in scenes composed of several hundreds of images. We next address outdoor relighting with a learning-based algorithm that efficiently allows the illumination in a scene to be changed, while removing and synthesizing cast shadows for any given sun position and accounting for global illumination. An approximate geometric proxy built using multi-view stereo is used to generate illumination and shadow related image buffers that guide a neural network. We train this network on a set of synthetic scenes allowing full supervision of the learning pipeline. Careful data augmentation allows our network to transfer to real scenes and provides state of the art relighting results. We also demonstrate the capacity of this network to be used to compose real scenes captured under different lighting conditions and orientation. We then present contributions to image-based rendering quality. We discuss how our carefully designed depth-map meshing and simplification algorithm improve rendering performance and quality of a new learning-based IBR method. Finally, we present a method that combines relighting, IBR, and material analysis. To enable relightable IBR with accurate glossy effects, we extract both material appearance variations and qualitative texture information from multi-view content in the form of several IBR heuristics. We further combine them with path-traced irradiance images that specify the input and target lighting. This combination allows a neural network to be trained to implicitly extract material properties and produce realistic-looking relit viewpoints. Separating diffuse and specular supervision is crucial in obtaining high-quality output.