Markerless Multi-view Multi-person Tracking for Combat Sports

Authors: Feiz, Hossein; Labbé, David; Andrews, Sheldon; Zordan, Victor
Date: 2024-08-20
ISBN: 978-3-03868-263-9
DOI: https://doi.org/10.2312/sca.20241162
URL: https://diglib.eg.org/handle/10.2312/sca20241162

Abstract: We introduce a novel framework for 3D pose estimation in combat sports. Utilizing a sparse multi-camera setup, our approach employs a computer vision-based tracker to extract 2D pose predictions from each camera view, enforcing consistent tracking targets across views with epipolar constraints and long-term video object segmentation. Through a top-down transformer-based approach, we ensure high-quality 2D pose extraction. We estimate the 3D position via weighted triangulation, spline fitting and extended Kalman filtering. By employing kinematic optimization and physics-based trajectory refinement, we achieve state-of-the-art accuracy and robustness under challenging conditions such as occlusion and rapid movements. Experimental validation on diverse datasets, including a custom dataset featuring elite boxers, underscores the effectiveness of our approach. Additionally, we contribute a valuable sparring video dataset to advance research in multi-person tracking for sports.

License: Attribution 4.0 International License
CCS Concepts: Computing methodologies → Pose Estimation; Optimization
Pages: 2 pages
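
The abstract names weighted triangulation as the step that lifts per-view 2D poses to 3D but gives no formulation. As a rough illustration only, the sketch below shows one common variant: a linear (DLT) triangulation in which each view's two constraints are scaled by a per-view weight. Using 2D detection confidences as the weights, the function names `project` and `triangulate_weighted`, and the two-camera rig are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def project(P, X):
    """Project a 3D point X (shape (3,)) with a 3x4 camera matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate_weighted(projections, points_2d, weights):
    """Weighted linear (DLT) triangulation of a single joint.

    projections: list of 3x4 camera matrices
    points_2d:   list of (u, v) observations, one per camera
    weights:     list of per-view weights (assumed here: 2D detection confidence)
    """
    rows = []
    for P, (u, v), w in zip(projections, points_2d, weights):
        # Each view contributes two linear constraints on the homogeneous point;
        # scaling by w makes confident views count more in the least-squares fit.
        rows.append(w * (u * P[2] - P[0]))
        rows.append(w * (v * P[2] - P[1]))
    A = np.stack(rows)
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X_h = vt[-1]
    return X_h[:3] / X_h[3]


# Hypothetical two-camera rig: identity intrinsics, second camera shifted along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
joint = np.array([0.5, 0.2, 4.0])
observations = [project(P1, joint), project(P2, joint)]
estimate = triangulate_weighted([P1, P2], observations, weights=[0.9, 0.6])
```

With noise-free observations the weights do not change the result; they matter when views disagree, for example when a joint is occluded in one camera and its detection confidence drops.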