Real-time 3D Human Body Pose Estimation from Monocular RGB Input

dc.contributor.author: Mehta, Dushyant
dc.date.accessioned: 2021-01-20T08:34:42Z
dc.date.available: 2021-01-20T08:34:42Z
dc.date.issued: 2020-10
dc.description.abstract: Human motion capture finds extensive application in movies, games, sports and biomechanical analysis. However, existing motion capture solutions require cumbersome external and/or on-body instrumentation, or use active sensors with limits on the possible capture volume dictated by power consumption. The ubiquity and ease of deployment of RGB cameras make monocular RGB-based human motion capture an extremely useful problem to solve: a solution would lower the barrier to entry for content creators to employ motion capture tools, and enable newer applications of human motion capture. This thesis demonstrates the first real-time monocular RGB-based motion capture solutions that work in general scene settings. They are based on developing neural-network-based approaches to address the ill-posed problem of estimating 3D human pose from a single RGB image, in combination with model-based fitting. In particular, the contributions of this work make advances towards three key aspects of real-time monocular RGB-based motion capture, namely speed, accuracy, and the ability to work for general scenes. New training datasets are proposed, for single-person and multi-person scenarios, which, together with the proposed transfer-learning-based training pipeline, allow learning-based approaches to be appearance invariant. The training datasets are accompanied by evaluation benchmarks with multiple avenues of fine-grained evaluation. The evaluation benchmarks differ visually from the training datasets, so as to promote efforts towards solutions that generalize to in-the-wild scenes. The proposed task formulations for the single-person and multi-person cases allow higher accuracy, and incorporate additional qualities, such as occlusion robustness, that are helpful in the context of a full motion capture solution.
The multi-person formulations are designed to have a nearly constant inference time regardless of the number of subjects in the scene, and, combined with contributions towards fast neural network inference, enable real-time 3D pose estimation for multiple subjects. Combining the proposed learning-based approaches with a model-based kinematic skeleton fitting step provides temporally stable joint angle estimates, which can be readily employed for driving virtual characters. [en_US]
dc.description.sponsorship: The work comprising this thesis was supported by ERC Starting Grant CapReal (335545) and ERC Consolidator Grant 4DRepLy (770784). [en_US]
dc.identifier.uri: https://diglib.eg.org:443/handle/10.2312/2632998
dc.language.iso: en [en_US]
dc.publisher: Saarländische Universitäts- und Landesbibliothek [en_US]
dc.subject: motion capture [en_US]
dc.subject: human pose [en_US]
dc.subject: hci [en_US]
dc.subject: computer vision [en_US]
dc.subject: animation [en_US]
dc.subject: machine learning [en_US]
dc.title: Real-time 3D Human Body Pose Estimation from Monocular RGB Input [en_US]
dc.type: Thesis [en_US]
Files
Original bundle
Name: thesis.pdf
Size: 57.81 MB
Format: Adobe Portable Document Format
Description: Thesis
License bundle
Name: license.txt
Size: 1.79 KB
Format: Item-specific license agreed upon to submission