Title: Joint Hand and Object Pose Estimation from a Single RGB Image using High-level 2D Constraints
Authors: Song, Hao-Xuan; Mu, Tai-Jiang; Martin, Ralph R.
Editors: Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Date issued: 2022 (available 2022-10-04)
ISSN: 1467-8659
DOI: 10.1111/cgf.14685 (https://doi.org/10.1111/cgf.14685)
Handle: https://diglib.eg.org:443/handle/10.1111/cgf14685
Pages: 383-394 (12 pages)
CCS Concepts: Computing methodologies → Reconstruction

Abstract: Joint pose estimation of human hands and objects from a single RGB image is an important topic for AR/VR, robot manipulation, and related applications. It is common practice to regress both poses directly from the image; some recent methods attempt to improve these initial poses using a variety of contact-based approaches. However, few methods take into account the physical constraints conveyed by the image itself, which can make the refined results less realistic than the initial estimates. To overcome this problem, we make use of a set of high-level 2D features that can be extracted directly from the image, in a new pipeline that combines contact-based approaches with these constraints during optimization. Our pipeline achieves better results than direct regression or contact-based optimization: the estimated poses are closer to the ground truth and provide high-quality contact.
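
To illustrate the general idea summarized in the abstract (refining initial hand and object pose estimates by jointly optimizing an image-derived 2D term together with a contact term), the following is a minimal sketch. It is not the paper's actual formulation: the toy pinhole camera, the translation-only pose corrections, the energy weights, and all names such as FOCAL and w_contact are assumptions made purely for illustration.

```python
# Illustrative sketch only (not the paper's method): refine initial hand/object
# pose estimates by minimizing an energy combining a 2D reprojection term
# (standing in for high-level image constraints) with a simple contact term.
# The camera model, degrees of freedom, and weights below are all assumed.
import numpy as np
from scipy.optimize import minimize

FOCAL = 500.0  # assumed pinhole focal length in pixels, principal point at origin


def project(points_3d):
    """Pinhole projection of Nx3 camera-space points to Nx2 pixel coordinates."""
    return FOCAL * points_3d[:, :2] / points_3d[:, 2:3]


def energy(delta, hand_pts, obj_pts, hand_kps_2d, obj_kps_2d, w_contact=0.1):
    """Energy over toy pose corrections: 2D reprojection + hand-object contact."""
    dh, do = delta[:3], delta[3:]               # translation corrections only
    hand, obj = hand_pts + dh, obj_pts + do     # corrected 3D keypoints
    reproj = np.sum((project(hand) - hand_kps_2d) ** 2) \
           + np.sum((project(obj) - obj_kps_2d) ** 2)
    # Contact term: pull the closest hand/object keypoint pair together.
    dists = np.linalg.norm(hand[:, None, :] - obj[None, :, :], axis=-1)
    return reproj + w_contact * dists.min() ** 2


# Toy data: initial (noisy) 3D keypoints and synthetic "detected" 2D locations.
rng = np.random.default_rng(0)
hand_pts = rng.normal([0.0, 0.0, 0.6], 0.02, size=(5, 3))
obj_pts = rng.normal([0.05, 0.0, 0.6], 0.02, size=(5, 3))
hand_kps_2d = project(hand_pts + [0.01, 0.0, 0.0])
obj_kps_2d = project(obj_pts + [-0.01, 0.0, 0.0])

res = minimize(energy, x0=np.zeros(6),
               args=(hand_pts, obj_pts, hand_kps_2d, obj_kps_2d))
print("pose corrections:", res.x.round(4))
```

In this toy setup the optimizer trades off keeping the projected keypoints near their 2D detections against closing the hand-object gap, which is the flavor of joint refinement the abstract describes; the actual pipeline uses richer features and a more expressive pose parameterization.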