Token-Based Dual-Codebook Learning for Robust 3D Pose Lifting

Date
2026
Publisher
The Eurographics Association
Abstract
3D human pose estimation from monocular images is inherently challenging because frequent occlusions introduce significant ambiguity in joint visibility. Regression-based methods, in particular, are highly sensitive to these ambiguities and often produce unstable, jittery pose estimates. To overcome these limitations, recent token-based methods discretize poses into structured representations that better capture joint dependencies. However, most existing approaches operate frame by frame, neglecting temporal continuity and consequently producing temporally inconsistent predictions. We therefore propose a spatio-temporal token-based framework for 3D human pose estimation that explicitly models both spatial and temporal dependencies. Specifically, a spatial and temporal tokenizer decomposes 3D pose sequences into discrete spatial and temporal tokens via a dual-codebook design. To predict these tokens from 2D pose sequences, we further develop spatial and temporal token classifiers based on a SemGCN–GraphGRU architecture, enabling effective temporal reasoning while preserving skeletal structure. Extensive experiments on the Human3.6M dataset demonstrate that our method achieves state-of-the-art performance among short-sequence methods while significantly reducing high-frequency jitter and producing smooth, physically plausible 3D pose sequences.
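The dual-codebook idea can be illustrated with a minimal sketch of the quantization step alone. This is not the authors' tokenizer. The fixed random linear projections `w_spatial` and `w_temporal` are hypothetical stand-ins for the learned encoders. The choice to derive one spatial token per joint (from its trajectory) and one temporal token per frame (from the whole skeleton) is an assumption, as are the codebook sizes. Each feature is mapped to the index of its nearest codeword in its own codebook.

```python
import numpy as np

def quantize(features, codebook):
    """Return nearest-codeword indices and quantized vectors.

    features: (N, D) array, codebook: (K, D) array.
    """
    # Squared Euclidean distance from every feature to every codeword, shape (N, K).
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]

def dual_tokenize(poses, w_spatial, w_temporal, spatial_codebook, temporal_codebook):
    """Split a (T, J, 3) pose sequence into J spatial tokens and T temporal tokens.

    w_spatial (T*3, D) and w_temporal (J*3, D) are hypothetical linear
    encoders standing in for learned tokenizer networks.
    """
    T, J, _ = poses.shape
    # Spatial view (assumption): one feature per joint, built from that joint's trajectory.
    spatial_feat = poses.transpose(1, 0, 2).reshape(J, T * 3) @ w_spatial
    # Temporal view (assumption): one feature per frame, built from the whole skeleton.
    temporal_feat = poses.reshape(T, J * 3) @ w_temporal
    # Each view is quantized against its own, separate codebook.
    s_tokens, _ = quantize(spatial_feat, spatial_codebook)
    t_tokens, _ = quantize(temporal_feat, temporal_codebook)
    return s_tokens, t_tokens

rng = np.random.default_rng(0)
T, J, D = 9, 17, 32
poses = rng.normal(size=(T, J, 3))
s_tok, t_tok = dual_tokenize(
    poses,
    rng.normal(size=(T * 3, D)),
    rng.normal(size=(J * 3, D)),
    rng.normal(size=(256, D)),  # hypothetical spatial codebook size
    rng.normal(size=(128, D)),  # hypothetical temporal codebook size
)
print(s_tok.shape, t_tok.shape)  # (17,) (9,)
```

Keeping two codebooks lets spatial and temporal structure be indexed independently. In the framework described above, the token classifiers would then predict such indices from 2D input rather than regressing coordinates directly.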

        
@inproceedings{10.2312:egs.20261005,
  booktitle = {Eurographics 2026 - Short Papers},
  editor = {Musialski, Przemyslaw and Lim, Isaak},
  title = {{Token-Based Dual-Codebook Learning for Robust 3D Pose Lifting}},
  author = {Jeon, Minsu and Kim, Janghyun and Park, Jinsun},
  year = {2026},
  publisher = {The Eurographics Association},
  ISSN = {2309-5059},
  ISBN = {978-3-03868-299-8},
  DOI = {10.2312/egs.20261005}
}