Learning Transformation-Isomorphic Latent Space for Accurate Hand Pose Estimation
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Ren, Kaiwen | en_US |
| dc.contributor.author | Hu, Lei | en_US |
| dc.contributor.author | Zhang, Zhiheng | en_US |
| dc.contributor.author | Ye, Yongjing | en_US |
| dc.contributor.author | Xia, Shihong | en_US |
| dc.contributor.editor | Christie, Marc | en_US |
| dc.contributor.editor | Han, Ping-Hsuan | en_US |
| dc.contributor.editor | Lin, Shih-Syun | en_US |
| dc.contributor.editor | Pietroni, Nico | en_US |
| dc.contributor.editor | Schneider, Teseo | en_US |
| dc.contributor.editor | Tsai, Hsin-Ruey | en_US |
| dc.contributor.editor | Wang, Yu-Shuen | en_US |
| dc.contributor.editor | Zhang, Eugene | en_US |
| dc.date.accessioned | 2025-10-07T06:03:16Z | |
| dc.date.available | 2025-10-07T06:03:16Z | |
| dc.date.issued | 2025 | |
| dc.description.abstract | Vision-based regression tasks, such as hand pose estimation, have achieved higher accuracy and faster convergence through representation learning. However, existing representation learning methods often encounter the following issues: the high semantic level of features extracted from images is inadequate for regressing low-level information, and the extracted features include task-irrelevant information, reducing their compactness and interfering with regression tasks. To address these challenges, we propose TI-Net, a highly versatile visual Network backbone designed to construct a Transformation Isomorphic latent space. Specifically, we employ linear transformations to model geometric transformations in the latent space and ensure that TI-Net aligns them with those in the image space. This ensures that the latent features capture compact, low-level information beneficial for pose estimation tasks. We evaluated TI-Net on the hand pose estimation task to demonstrate the network's superiority. On the DexYCB dataset, TI-Net achieved a 10% improvement in the PA-MPJPE metric compared to specialized state-of-the-art (SOTA) hand pose estimation methods. Our code is available at https://github.com/Mine268/TI-Net. | en_US |
| dc.description.sectionheaders | Detecting & Estimating from images | |
| dc.description.seriesinformation | Pacific Graphics Conference Papers, Posters, and Demos | |
| dc.identifier.doi | 10.2312/pg.20251270 | |
| dc.identifier.isbn | 978-3-03868-295-0 | |
| dc.identifier.pages | 10 pages | |
| dc.identifier.uri | https://doi.org/10.2312/pg.20251270 | |
| dc.identifier.uri | https://diglib.eg.org/handle/10.2312/pg20251270 | |
| dc.publisher | The Eurographics Association | en_US |
| dc.rights | Attribution 4.0 International License | |
| dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
| dc.subject | CCS Concepts: Computing methodologies → Motion capture; Image representations; Tracking | |
| dc.subject | Computing methodologies → Motion capture | |
| dc.subject | Image representations | |
| dc.subject | Tracking | |
| dc.title | Learning Transformation-Isomorphic Latent Space for Accurate Hand Pose Estimation | en_US |
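The abstract above describes aligning linear transformations in the latent space with geometric transformations in the image space. The following is a minimal, illustrative PyTorch sketch of that idea, not the authors' implementation: it assumes a 90-degree rotation as the image-space transformation and a single learnable linear map as its latent-space counterpart, and the names `TinyEncoder` and `ti_loss` are hypothetical. The actual code is available at https://github.com/Mine268/TI-Net.

```python
# Illustrative sketch (not the authors' code) of the transformation-isomorphism
# idea from the abstract: a geometric transformation applied in image space
# should correspond to a linear transformation of the latent features.
# TinyEncoder, ti_loss, and the choice of a 90-degree rotation are assumptions
# made for this example only.

import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Stand-in visual backbone that maps an image to a latent vector."""

    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, latent_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(x).flatten(1))


def ti_loss(encoder: nn.Module, latent_map: nn.Linear, x: torch.Tensor) -> torch.Tensor:
    """Encourage latent features to transform isomorphically to the image.

    The image-space transform here is a 90-degree rotation; its latent-space
    counterpart is modelled by a single learnable linear map (latent_map).
    """
    x_rot = torch.rot90(x, k=1, dims=(-2, -1))   # transform in image space
    z = encoder(x)                                # latent of original image
    z_rot = encoder(x_rot)                        # latent of transformed image
    # Applying the linear map to z should reproduce the latent of the rotated image.
    return torch.mean((latent_map(z) - z_rot) ** 2)


if __name__ == "__main__":
    enc = TinyEncoder(latent_dim=64)
    lin = nn.Linear(64, 64, bias=False)  # linear model of the rotation in latent space
    imgs = torch.randn(4, 3, 128, 128)   # dummy batch of hand crops
    loss = ti_loss(enc, lin, imgs)
    loss.backward()
    print(f"transformation-isomorphism loss: {loss.item():.4f}")
```

In a full training setup, a term of this kind would be combined with the downstream pose-estimation loss so that the backbone learns compact, low-level features that respect image-space geometry.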