Learning Transformation-Isomorphic Latent Space for Accurate Hand Pose Estimation
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Ren, Kaiwen | en_US |
| dc.contributor.author | Hu, Lei | en_US |
| dc.contributor.author | Zhang, Zhiheng | en_US |
| dc.contributor.author | Ye, Yongjing | en_US |
| dc.contributor.author | Xia, Shihong | en_US |
| dc.contributor.editor | Christie, Marc | en_US |
| dc.contributor.editor | Han, Ping-Hsuan | en_US |
| dc.contributor.editor | Lin, Shih-Syun | en_US |
| dc.contributor.editor | Pietroni, Nico | en_US |
| dc.contributor.editor | Schneider, Teseo | en_US |
| dc.contributor.editor | Tsai, Hsin-Ruey | en_US |
| dc.contributor.editor | Wang, Yu-Shuen | en_US |
| dc.contributor.editor | Zhang, Eugene | en_US |
| dc.date.accessioned | 2025-10-07T06:03:16Z | |
| dc.date.available | 2025-10-07T06:03:16Z | |
| dc.date.issued | 2025 | |
| dc.description.abstract | Vision-based regression tasks, such as hand pose estimation, have achieved higher accuracy and faster convergence through representation learning. However, existing representation learning methods often encounter the following issues: the high semantic level of features extracted from images is inadequate for regressing low-level information, and the extracted features include task-irrelevant information, reducing their compactness and interfering with regression tasks. To address these challenges, we propose TI-Net, a highly versatile visual Network backbone designed to construct a Transformation Isomorphic latent space. Specifically, we employ linear transformations to model geometric transformations in the latent space and ensure that TI-Net aligns them with those in the image space. This ensures that the latent features capture compact, low-level information beneficial for pose estimation tasks. We evaluated TI-Net on the hand pose estimation task to demonstrate the network's superiority. On the DexYCB dataset, TI-Net achieved a 10% improvement in the PA-MPJPE metric compared to specialized state-of-the-art (SOTA) hand pose estimation methods. Our code is available at https://github.com/Mine268/TI-Net. | en_US |
| dc.description.sectionheaders | Detecting & Estimating from images | |
| dc.description.seriesinformation | Pacific Graphics Conference Papers, Posters, and Demos | |
| dc.identifier.doi | 10.2312/pg.20251270 | |
| dc.identifier.isbn | 978-3-03868-295-0 | |
| dc.identifier.pages | 10 pages | |
| dc.identifier.uri | https://doi.org/10.2312/pg.20251270 | |
| dc.identifier.uri | https://diglib.eg.org/handle/10.2312/pg20251270 | |
| dc.publisher | The Eurographics Association | en_US |
| dc.rights | Attribution 4.0 International License | |
| dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
| dc.subject | CCS Concepts: Computing methodologies → Motion capture; Image representations; Tracking | |
| dc.subject | Computing methodologies → Motion capture | |
| dc.subject | Image representations | |
| dc.subject | Tracking | |
| dc.title | Learning Transformation-Isomorphic Latent Space for Accurate Hand Pose Estimation | en_US |
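The abstract above describes aligning linear transformations in the latent space with geometric transformations in the image space. The following is a minimal, illustrative PyTorch sketch of that idea, not the authors' implementation: it assumes a 90-degree rotation as the image-space transformation and a single learnable linear map as its latent-space counterpart, and the names `TinyEncoder` and `ti_loss` are hypothetical. The actual code is available at https://github.com/Mine268/TI-Net.

```python
# Illustrative sketch (not the authors' code) of the transformation-isomorphism
# idea from the abstract: a geometric transformation applied in image space
# should correspond to a linear transformation of the latent features.
# TinyEncoder, ti_loss, and the choice of a 90-degree rotation are assumptions
# made for this example only.

import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Stand-in visual backbone that maps an image to a latent vector."""

    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, latent_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(x).flatten(1))


def ti_loss(encoder: nn.Module, latent_map: nn.Linear, x: torch.Tensor) -> torch.Tensor:
    """Encourage latent features to transform isomorphically to the image.

    The image-space transform here is a 90-degree rotation; its latent-space
    counterpart is modelled by a single learnable linear map (latent_map).
    """
    x_rot = torch.rot90(x, k=1, dims=(-2, -1))   # transform in image space
    z = encoder(x)                                # latent of original image
    z_rot = encoder(x_rot)                        # latent of transformed image
    # Applying the linear map to z should reproduce the latent of the rotated image.
    return torch.mean((latent_map(z) - z_rot) ** 2)


if __name__ == "__main__":
    enc = TinyEncoder(latent_dim=64)
    lin = nn.Linear(64, 64, bias=False)  # linear model of the rotation in latent space
    imgs = torch.randn(4, 3, 128, 128)   # dummy batch of hand crops
    loss = ti_loss(enc, lin, imgs)
    loss.backward()
    print(f"transformation-isomorphism loss: {loss.item():.4f}")
```

In a full training setup, a term of this kind would be combined with the downstream pose-estimation loss so that the backbone learns compact, low-level features that respect image-space geometry.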