Enhancing Robust Category-Agnostic Pose Estimation through Multi-Modal Feature Alignment

Date
2026
Publisher
The Eurographics Association and John Wiley & Sons Ltd.
Abstract
Category-Agnostic Pose Estimation (CAPE) aims to detect keypoints for objects of any category using only a few labeled samples, making it a challenging yet crucial task for general-purpose visual understanding. Existing methods rely on either visual or textual inputs, but the lack of cross-modal interaction limits generalization. Without a unified input representation, relying on visual features alone hinders consistent prediction of same-type keypoints, while fixed textual representations fail to capture the diverse characteristics of same-type keypoints, leading to coarse and over-generalized outputs. To address these limitations, we propose two multi-modal frameworks that perform visual-textual integration at both the feature and decision levels. Our feature-level module leverages cross-modal attention to align and enhance keypoint representations, while the decision-level fusion adaptively combines modality-specific predictions through a modality-consistency loss. Experiments on the large-scale MP-100 dataset demonstrate that our method surpasses existing baselines in both accuracy and robustness. Under the challenging 1-shot setting, our model achieves a 0.58% improvement in PCK@0.2 over the state-of-the-art CAPE method.
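The abstract's two fusion ideas can be illustrated with a minimal NumPy sketch: visual keypoint features act as attention queries over textual keypoint embeddings (feature-level fusion), and a simple mean-squared-error term penalizes disagreement between modality-specific predictions (a stand-in for the modality-consistency loss). All function names, shapes, and the residual-add design here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(visual, textual):
    """Hypothetical feature-level fusion: visual keypoint features (queries)
    attend over textual embeddings (keys/values), then are residually enhanced.

    visual:  (K, D) visual keypoint features
    textual: (T, D) textual keypoint embeddings
    returns: (K, D) text-enhanced visual features
    """
    d = visual.shape[-1]
    scores = visual @ textual.T / np.sqrt(d)   # (K, T) scaled similarities
    attn = softmax(scores, axis=-1)            # attention over text tokens
    return visual + attn @ textual             # residual fusion

def modality_consistency_loss(pred_vis, pred_txt):
    """Illustrative consistency term: MSE between the two modality-specific
    keypoint predictions, encouraging them to agree."""
    return float(np.mean((pred_vis - pred_txt) ** 2))

rng = np.random.default_rng(0)
vis = rng.standard_normal((17, 64))  # e.g. 17 keypoints, 64-dim features
txt = rng.standard_normal((17, 64))
fused = cross_modal_attention(vis, txt)
print(fused.shape)  # (17, 64)
```

In this sketch the residual add keeps the visual features dominant while the attention output injects text-derived context; a decision-level system would instead run two prediction heads and weight them, with the consistency loss as an auxiliary training signal.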
Citation

@article{10.1111:cgf.70368,
  journal   = {Computer Graphics Forum},
  title     = {{Enhancing Robust Category-Agnostic Pose Estimation through Multi-Modal Feature Alignment}},
  author    = {Li, Boxuan and Liu, Juan},
  year      = {2026},
  publisher = {The Eurographics Association and John Wiley \& Sons Ltd.},
  ISSN      = {1467-8659},
  DOI       = {10.1111/cgf.70368}
}