Title: Contact and Human Dynamics from Monocular Video
Authors: Rempe, Davis; Guibas, Leonidas J.; Hertzmann, Aaron; Russell, Bryan; Villegas, Ruben; Yang, Jimei; Holden, Daniel
Date issued: 2020-10-04
ISBN: 978-3-03868-129-8
ISSN: 1727-5288
DOI: https://doi.org/10.2312/sca.20201218
URL: https://diglib.eg.org:443/handle/10.2312/sca20201218
Pages: 3-5
Keywords: Computing methodologies; Computer vision problems; Motion capture

Abstract: Existing methods for estimating human motion from video predict 2D and 3D poses that are approximately accurate but contain visible errors that violate physical constraints, such as feet penetrating the ground and bodies leaning at extreme angles. We present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input. We first estimate ground contact timings with a neural network that is trained without hand-labeled data. A physics-based trajectory optimization then solves for a physically plausible motion based on these inputs. We show that this process produces motions that are more realistic than those from purely kinematic methods for character animation from dynamic videos. A detailed report that fully describes our method is available at geometry.stanford.edu/projects/human-dynamics-eccv-2020.
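The two-stage pipeline described in the abstract (contact estimation, then physics-based trajectory optimization) can be illustrated with a toy one-dimensional version: given noisy foot-height estimates and per-frame contact labels, a penalty-based optimization pulls contact frames onto the ground plane and removes floor penetration. This is a hypothetical sketch, not the paper's optimizer, which operates on full-body dynamics; the function name, weights, and gradient-descent solver here are all illustrative assumptions.

```python
import numpy as np


def refine_trajectory(heights, contacts, n_iters=500, lr=0.02,
                      w_contact=10.0, w_floor=10.0):
    """Toy 1D trajectory refinement (illustrative, not the paper's method).

    Minimizes, by gradient descent:
      sum (x - heights)^2            # stay close to the input estimates
      + w_contact * contacts * x^2   # contact frames should sit on the ground
      + w_floor * min(x, 0)^2        # no frame may penetrate the floor
    """
    x = heights.astype(float).copy()
    for _ in range(n_iters):
        grad = 2.0 * (x - heights)                  # data term
        grad += 2.0 * w_contact * contacts * x      # contact term
        grad += 2.0 * w_floor * np.minimum(x, 0.0)  # floor-penetration term
        x -= lr * grad
    return x
```

Running it on a short sequence where two "contact" frames penetrate the ground pulls those frames to within a few millimeters of the floor while leaving the airborne frames essentially untouched; the real system solves an analogous but far richer problem over full-body forces and torques.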