Stacked Dual Attention for Joint Dependency Awareness in Pose Reconstruction and Motion Prediction

Authors: Guinot, Lena; Matsumoto, Ryutaro; Iwata, Hiroyasu
Editors: Jean-Marie Normand; Maki Sugimoto; Veronica Sundstedt
Date: 2023-12-04
Year: 2023
ISBN: 978-3-03868-218-9
ISSN: 1727-530X
DOI: https://doi.org/10.2312/egve.20231326
Handle: https://diglib.eg.org:443/handle/10.2312/egve20231326
Pages: 177-184 (8 pages)
License: Attribution 4.0 International License
CCS Concepts: Computing methodologies → Real-time simulation; Motion processing; Reconstruction

Abstract

Human pose reconstruction and motion prediction in real-time environments have become pivotal areas of research, especially with the burgeoning applications in Virtual and Augmented Reality (VR/AR). This paper presents a novel deep neural network underpinned by a stacked dual attention mechanism, effectively leveraging data from just 6 Inertial Measurement Units (IMUs) to reconstruct human full body poses. While previous works have predominantly focused on image-based techniques, our approach, driven by the sparsity and versatility of sensors, taps into the potential of sensor-based motion data collection. Acknowledging the challenges posed by the under-constrained nature of IMU data and the inherent limitations in available open-source datasets, we innovatively transform motion capture data into an IMU-compatible format. Through a holistic understanding of joint dependencies and temporal dynamics, our method promises enhanced accuracy in motion prediction, even in uncontrolled environments typical of everyday scenarios. Benchmarking our model against prevailing methods, we underscore the superiority of our dual attention mechanism, setting a new benchmark for real-time motion prediction using minimalistic sensor arrangements.