Search Results
Now showing 1 - 10 of 23 results
Item: Towards a Neural Graphics Pipeline for Controllable Image Generation (The Eurographics Association and John Wiley & Sons Ltd., 2021)
Chen, Xuelin; Cohen-Or, Daniel; Chen, Baoquan; Mitra, Niloy J.; Mitra, Niloy and Viola, Ivan
In this paper, we leverage advances in neural networks towards forming a neural rendering approach for controllable image generation, thereby bypassing the need for detailed modeling in the conventional graphics pipeline. To this end, we present Neural Graphics Pipeline (NGP), a hybrid generative model that brings together neural and traditional image formation models. NGP decomposes the image into a set of interpretable appearance feature maps, uncovering direct control handles for controllable image generation. To form an image, NGP generates coarse 3D models that are fed into neural rendering modules to produce view-specific interpretable 2D maps, which are then composited into the final output image using a traditional image formation model. Our approach offers control over image generation by providing direct handles controlling illumination and camera parameters, in addition to control over shape and appearance variations. The key challenge is to learn these controls through unsupervised training that links generated coarse 3D models with unpaired real images via neural and traditional (e.g., Blinn-Phong) rendering functions, without establishing an explicit correspondence between them. We demonstrate the effectiveness of our approach on controllable image generation of single-object scenes. We evaluate our hybrid modeling framework, compare with neural-only generation methods (namely, DCGAN, LSGAN, WGAN-GP, VON, and SRNs), report improvement in FID scores against real images, and demonstrate that NGP supports direct controls common in traditional forward rendering. Code is available at http://geometry.cs.ucl.ac.uk/projects/2021/ngp.
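To make the compositing step above concrete, here is a minimal numpy sketch of a Blinn-Phong style pass that combines per-pixel appearance maps into an image, in the spirit of the "traditional image formation model" the abstract mentions. The map names, shapes, and lighting parameters are illustrative assumptions, not NGP's actual interface.

```python
# Sketch only: composite per-pixel maps with a Blinn-Phong shading model.
import numpy as np

def blinn_phong_composite(albedo, normals, specular_coef, light_dir, view_dir,
                          shininess=32.0, ambient=0.1):
    """albedo: (H, W, 3); normals: (H, W, 3) unit vectors;
    specular_coef: (H, W, 1); light_dir, view_dir: (3,) direction vectors."""
    l = light_dir / np.linalg.norm(light_dir)
    v = view_dir / np.linalg.norm(view_dir)
    h = (l + v) / np.linalg.norm(l + v)                      # half vector

    n_dot_l = np.clip(np.sum(normals * l, axis=-1, keepdims=True), 0.0, 1.0)
    n_dot_h = np.clip(np.sum(normals * h, axis=-1, keepdims=True), 0.0, 1.0)

    diffuse = albedo * n_dot_l                               # Lambertian term
    spec = specular_coef * (n_dot_h ** shininess)            # Blinn-Phong highlight
    return np.clip(ambient * albedo + diffuse + spec, 0.0, 1.0)

# Toy usage with random maps, just to show the expected shapes.
H, W = 64, 64
normals = np.dstack([np.zeros((H, W)), np.zeros((H, W)), np.ones((H, W))])
img = blinn_phong_composite(np.random.rand(H, W, 3), normals,
                            np.random.rand(H, W, 1),
                            light_dir=np.array([0.3, 0.5, 1.0]),
                            view_dir=np.array([0.0, 0.0, 1.0]))
```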
Item: Decomposing Single Images for Layered Photo Retouching (The Eurographics Association and John Wiley & Sons Ltd., 2017)
Innamorati, Carlo; Ritschel, Tobias; Weyrich, Tim; Mitra, Niloy J.; Zwicker, Matthias and Sander, Pedro
Photographers routinely compose multiple manipulated photos of the same scene into a single image, producing a fidelity difficult to achieve using any individual photo. Alternatively, 3D artists set up rendering systems to produce layered images that isolate individual aspects of the light transport, which are composed into the final result in post-production. Regrettably, these approaches either take considerable time and effort to capture, or remain limited to synthetic scenes. In this paper, we suggest a method to decompose a single image into multiple layers that approximate effects such as shadow, diffuse illumination, albedo, and specular shading. To this end, we extend the idea of intrinsic images along two axes: first, by complementing shading and reflectance with specularity and occlusion, and second, by introducing directional dependence. We do so by training a convolutional neural network (CNN) with synthetic data. Such decompositions can then be manipulated in any off-the-shelf image manipulation software and composited back. We demonstrate the effectiveness of our decomposition on synthetic (i.e., rendered) and real data (i.e., photographs), and use them for photo manipulations that are otherwise impossible to perform from single images. We provide comparisons with state-of-the-art methods and also evaluate the quality of our decompositions via a user study measuring the effectiveness of the resultant photo retouching setup. Supplementary material and code are available for research use at geometry.cs.ucl.ac.uk/projects/2017/layered-retouching.

Item: Factored Neural Representation for Scene Understanding (The Eurographics Association and John Wiley & Sons Ltd., 2023)
Wong, Yu-Shiang; Mitra, Niloy J.; Memari, Pooran; Solomon, Justin
A long-standing goal in scene understanding is to obtain interpretable and editable representations that can be directly constructed from a raw monocular RGB-D video, without requiring a specialized hardware setup or priors. The problem is significantly more challenging in the presence of multiple moving and/or deforming objects. Traditional methods have approached the setup with a mix of simplifications, scene priors, pretrained templates, or known deformation models. The advent of neural representations, especially neural implicit representations and radiance fields, opens the possibility of end-to-end optimization to collectively capture geometry, appearance, and object motion. However, current approaches produce a global scene encoding, assume multiview capture with limited or no motion in the scenes, and do not facilitate easy manipulation beyond novel view synthesis. In this work, we introduce a factored neural scene representation that can directly be learned from a monocular RGB-D video to produce object-level neural representations with an explicit encoding of object movement (e.g., rigid trajectory) and/or deformations (e.g., nonrigid movement). We evaluate ours against a set of neural approaches on both synthetic and real data to demonstrate that the representation is efficient, interpretable, and editable (e.g., change object trajectory). Code and data are available at: http://geometry.cs.ucl.ac.uk/projects/2023/factorednerf/.

Item: Dynamic SfM: Detecting Scene Changes from Image Pairs (The Eurographics Association and John Wiley & Sons Ltd., 2015)
Wang, Tuanfeng Y.; Kohli, Pushmeet; Mitra, Niloy J.; Mirela Ben-Chen and Ligang Liu
Detecting changes in scenes is important in many scene understanding tasks. In this paper, we pursue this goal simply from a pair of image recordings. Specifically, our goal is to infer what the objects are, how they are structured, and how they moved between the images. The problem is challenging as large changes make point-level correspondence establishment difficult, which in turn breaks the assumptions of standard Structure-from-Motion (SfM). We propose a novel algorithm for dynamic SfM wherein we first generate a pool of potential corresponding points by hypothesizing over possible movements, and then use a continuous optimization formulation to obtain a low-complexity solution that best explains the scene recordings, i.e., the input image pairs. We test the algorithm on a variety of examples to recover multiple object structures and their changes.
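As one concrete illustration of the layered workflow described in the Decomposing Single Images for Layered Photo Retouching item above, the sketch below edits one layer and recombines the layers into an image. The composition rule used here (occlusion times albedo times diffuse shading, plus specular) is an illustrative assumption, not necessarily the exact model used in the paper.

```python
# Sketch only: edit decomposed layers independently, then recomposite.
import numpy as np

def recomposite(albedo, diffuse_shading, occlusion, specular):
    """All inputs are (H, W, 3) arrays in [0, 1]; returns the recombined image."""
    return np.clip(occlusion * (albedo * diffuse_shading) + specular, 0.0, 1.0)

# Example edit: boost only the specular layer, leaving albedo and shading untouched.
H, W = 32, 32
layers = {k: np.random.rand(H, W, 3) for k in
          ("albedo", "diffuse_shading", "occlusion", "specular")}
retouched = recomposite(layers["albedo"], layers["diffuse_shading"],
                        layers["occlusion"], 1.5 * layers["specular"])
```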
Item: Reforming Shapes for Material-aware Fabrication (The Eurographics Association and John Wiley & Sons Ltd., 2015)
Yang, Yong-Liang; Wang, Jun; Mitra, Niloy J.; Mirela Ben-Chen and Ligang Liu
As humans, we regularly associate the shape of an object with the material it is built from. In the context of geometric modeling, however, this inter-relation between form and material is rarely explored. In this work, we propose a novel data-driven reforming (i.e., reshaping) algorithm that adapts an input multi-component model for a target fabrication material. The algorithm adapts both the part geometry and the inter-part topology of the input shape to better align with material-aware fabrication requirements. As output, we produce the reshaped model along with respective part dimensions and inter-part junction specifications. We evaluate our algorithm on a range of man-made models and demonstrate a variety of model reshaping examples, focusing only on metal and wooden materials.

Item: Interactive Videos: Plausible Video Editing using Sparse Structure Points (The Eurographics Association and John Wiley & Sons Ltd., 2016)
Chang, Chia-Sheng; Chu, Hung-Kuo; Mitra, Niloy J.; Joaquim Jorge and Ming Lin
Video remains the method of choice for capturing temporal events. However, without access to the underlying 3D scene models, it remains difficult to make object-level edits in a single video or across multiple videos. While it may be possible to explicitly reconstruct the 3D geometries to facilitate these edits, such a workflow is cumbersome, expensive, and tedious. In this work, we present a much simpler workflow to create plausible editing and mixing of raw video footage using only sparse structure points (SSP) directly recovered from the raw sequences. First, we utilize user scribbles to structure the point representations obtained using structure-from-motion on the input videos. The resultant structure points, even when noisy and sparse, are then used to enable various video edits in 3D, including view perturbation, keyframe animation, object duplication and transfer across videos, etc. Specifically, we describe how to synthesize object images from new views adopting a novel image-based rendering technique using the SSPs as a proxy for the missing 3D scene information. We propose a structure-preserving image warping on multiple input frames adaptively selected from the object video, followed by a spatio-temporally coherent image stitching to compose the final object image. Simple planar shadows and depth maps are synthesized for objects to generate plausible video sequences mimicking real-world interactions. We demonstrate our system on a variety of input videos to produce complex edits, which are otherwise difficult to achieve.
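The view perturbation described in the Interactive Videos item above relies on re-projecting sparse structure points under a slightly changed camera. Below is a minimal sketch of that kind of proxy-geometry projection step; the pinhole camera parametrization and the specific perturbation are assumptions for illustration, not the paper's actual pipeline.

```python
# Sketch only: project sparse structure points into a perturbed camera view.
import numpy as np

def project_points(points_3d, K, R, t):
    """Project Nx3 world points using intrinsics K (3x3), rotation R (3x3), translation t (3,)."""
    cam = points_3d @ R.T + t            # world -> camera coordinates
    uvw = cam @ K.T                      # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]      # perspective divide -> pixel coordinates

# Perturb the view by a small rotation about the y-axis and re-project the points.
theta = np.deg2rad(5.0)
R_perturb = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
                      [ 0.0,           1.0, 0.0          ],
                      [-np.sin(theta), 0.0, np.cos(theta)]])
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
ssp = np.random.rand(100, 3) + np.array([0.0, 0.0, 5.0])   # points in front of the camera
uv_new = project_points(ssp, K, R_perturb, np.zeros(3))
```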
Item: Computational Design and Optimization of Non-Circular Gears (The Eurographics Association and John Wiley & Sons Ltd., 2020)
Xu, Hao; Fu, Tianwen; Song, Peng; Zhou, Mingjun; Fu, Chi-Wing; Mitra, Niloy J.; Panozzo, Daniele and Assarsson, Ulf
We study a general form of gears known as non-circular gears that can transfer periodic motion with variable speed through their irregular shapes and eccentric rotation centers. Designing functional non-circular gears is nontrivial, since the gear pair must have compatible shapes to stay in contact during motion, so that the driver gear can push the follower to rotate via a bounded torque that the motor can exert. To address the challenge, we model the geometry, kinematics, and dynamics of non-circular gears, formulate the design problem as a shape optimization, and identify the necessary independent variables in the optimization search. Taking a pair of 2D shapes as inputs, our method optimizes them into gears by locating the rotation center on each shape, minimally modifying each shape to form the gear's boundary, and constructing appropriate teeth for gear meshing. Our optimized gears not only resemble the inputs but can also drive the motion with relatively small torque. We demonstrate our method's usability by generating a rich variety of non-circular gears from various inputs and 3D printing several of them.

Item: Autocorrelation Descriptor for Efficient Co-Alignment of 3D Shape Collections (Copyright © 2016 The Eurographics Association and John Wiley & Sons Ltd., 2016)
Averkiou, Melinos; Kim, Vladimir G.; Mitra, Niloy J.; Chen, Min and Zhang, Hao (Richard)
Co-aligning a collection of shapes to a consistent pose is a common problem in shape analysis with applications in shape matching, retrieval and visualization. We observe that resolving among some orientations is easier than others; for example, a common mistake for bicycles is to align front-to-back, while even the simplest algorithm would not erroneously pick an orthogonal alignment. The key idea of our work is to analyse rotational autocorrelations of shapes to facilitate shape co-alignment. In particular, we use such an autocorrelation measure of individual shapes to decide which shape pairs might have well-matching orientations and, if so, which configurations are likely to produce better alignments. This significantly prunes the number of alignments to be examined, and leads to an efficient, scalable algorithm that performs comparably to state-of-the-art techniques on benchmark data sets, but requires significantly fewer computations, resulting in a 2-16x speed improvement in our tests.
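The rotational autocorrelation idea in the item above can be illustrated on a simplified 1D case: correlating an angular shape descriptor with its own rotated copies flags orientations that are easy to confuse (such as the front-to-back ambiguity mentioned for bicycles). The descriptor below, a radial profile sampled over angle, is an illustrative stand-in; the paper's measure is defined over 3D rotations of full shapes.

```python
# Sketch only: rotational autocorrelation of a 1D angular descriptor via FFT.
import numpy as np

def rotational_autocorrelation(profile):
    """Correlation of an angular profile with all of its cyclic rotations (normalized)."""
    p = profile - profile.mean()
    n = len(p)
    ac = np.fft.ifft(np.fft.fft(p) * np.conj(np.fft.fft(p))).real
    return ac / (n * p.var() + 1e-12)    # normalized so that ac[0] == 1

angles = np.linspace(0, 2 * np.pi, 360, endpoint=False)
profile = 1.0 + 0.3 * np.cos(2 * angles)          # a shape with 180-degree symmetry
ac = rotational_autocorrelation(profile)
# A high value at the 180-degree shift flags an orientation that is easy to confuse.
print(ac[0], ac[180])
```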
Item: RigidFusion: RGB-D Scene Reconstruction with Rigidly-moving Objects (The Eurographics Association and John Wiley & Sons Ltd., 2021)
Wong, Yu-Shiang; Li, Changjian; Nießner, Matthias; Mitra, Niloy J.; Mitra, Niloy and Viola, Ivan
Although surface reconstruction from depth data has made significant advances in recent years, handling changing environments remains a major challenge. This is unsatisfactory, as humans regularly move objects in their environments. Existing solutions focus on a restricted set of objects (e.g., those detected by semantic classifiers), possibly with template meshes, assume a static camera, or mark objects touched by humans as moving. We remove these assumptions by introducing RigidFusion. Our core idea is a novel asynchronous moving-object detection method, combined with a modified volumetric fusion. This is achieved by a model-to-frame TSDF decomposition leveraging free-space carving of the tracked depth values of the current frame with respect to the background model at run-time. As output, we produce separate volumetric reconstructions for the background and each moving object in the scene, along with their trajectories over time. Our method does not rely on object priors (e.g., semantic labels or pre-scanned meshes) and is insensitive to the motion residuals between objects and the camera. In comparison to state-of-the-art methods (e.g., Co-Fusion, MaskFusion), we handle significantly more challenging reconstruction scenarios involving a moving camera, and improve moving-object detection (26% on the miss-detection ratio), tracking (27% on MOTA), and reconstruction (3% on the reconstruction F1) on the synthetic dataset. Please refer to the supplementary material and the project website for the video demonstration (geometry.cs.ucl.ac.uk/projects/2021/rigidfusion).

Item: MoCo-Flow: Neural Motion Consensus Flow for Dynamic Humans in Stationary Monocular Cameras (The Eurographics Association and John Wiley & Sons Ltd., 2022)
Chen, Xuelin; Li, Weiyu; Cohen-Or, Daniel; Mitra, Niloy J.; Chen, Baoquan; Chaine, Raphaëlle; Kim, Min H.
Synthesizing novel views of dynamic humans from stationary monocular cameras is a specialized but desirable setup. It is particularly attractive as it does not require static scenes, controlled environments, or specialized capture hardware. In contrast to techniques that exploit multi-view observations, modeling a dynamic scene from a single view is significantly more under-constrained and ill-posed. In this paper, we introduce Neural Motion Consensus Flow (MoCo-Flow), a representation that models dynamic humans in stationary monocular cameras using a 4D continuous time-variant function. We learn the proposed representation by optimizing for a dynamic scene that minimizes the total rendering error over all the observed images. At the heart of our work lies a carefully designed optimization scheme, which includes a dedicated initialization step and is constrained by a motion consensus regularization on the estimated motion flow. We extensively evaluate MoCo-Flow on several datasets that contain human motions of varying complexity, and compare, both qualitatively and quantitatively, to several baselines and ablated variations of our method, showing the efficacy and merits of the proposed approach. Pretrained model, code, and data will be released for research purposes upon paper acceptance.
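Relating to the RigidFusion item above: its modified volumetric fusion builds on the standard weighted TSDF update used by classical depth-fusion systems. The sketch below shows only that basic building block; the grid layout, camera model, and truncation value are illustrative assumptions, not RigidFusion's model-to-frame decomposition itself.

```python
# Sketch only: weighted TSDF update for one depth frame (camera at the origin).
import numpy as np

def update_tsdf(tsdf, weights, voxel_centers, depth, K, trunc=0.05):
    """tsdf, weights: (N,) arrays; voxel_centers: (N, 3) in camera coordinates (z > 0);
    depth: (H, W) depth image in meters; K: (3, 3) pinhole intrinsics."""
    uvw = voxel_centers @ K.T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    H, W = depth.shape
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H) & (voxel_centers[:, 2] > 0)

    sdf = np.zeros_like(tsdf)
    sdf[valid] = depth[v[valid], u[valid]] - voxel_centers[valid, 2]
    keep = valid & (sdf > -trunc)                   # drop voxels far behind the surface
    sdf = np.clip(sdf / trunc, -1.0, 1.0)           # truncate and normalize

    # Weighted running average, one new observation per kept voxel.
    tsdf[keep] = (tsdf[keep] * weights[keep] + sdf[keep]) / (weights[keep] + 1.0)
    weights[keep] += 1.0
    return tsdf, weights

# Toy usage: a flat wall 1 m from the camera, fused into a small set of voxels.
K = np.array([[200.0, 0.0, 64.0], [0.0, 200.0, 48.0], [0.0, 0.0, 1.0]])
depth = np.full((96, 128), 1.0)
voxels = np.array([[0.0, 0.0, z] for z in (0.9, 0.95, 1.0, 1.05, 1.1)])
tsdf, w = update_tsdf(np.zeros(len(voxels)), np.zeros(len(voxels)), voxels, depth, K)
```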