A Multimodal Dataset for Dialogue Intent Recognition through Human Movement and Nonverbal Cues

Date
2025
Publisher
The Eurographics Association
Abstract
This paper presents a multimodal dataset designed to advance dialogue intent recognition through skeleton-based representations and temporal human movement features. Rather than proposing a new model, our objective is to provide a high-quality, annotated dataset that captures the subtle nonverbal cues preceding human speech and interaction. The dataset includes skeletal joint coordinates, facial orientation, and contextual object data (e.g., microphone positions), collected from diverse participants across varied conversational scenarios. In future research, we will benchmark three types of learning methods and offer comparative insights: handcrafted feature models, sequence models (LSTM), and graph-based models (GCN). This resource aims to facilitate the development of more natural, sensor-free, and data-driven human-computer interaction systems by providing a robust foundation for training and evaluation.
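As a rough illustration of how the planned sequence-model (LSTM) baseline might consume such data, the sketch below shows a minimal classifier over per-frame skeletal joint coordinates. The joint count, coordinate dimensionality, and intent classes are illustrative assumptions, not the dataset's actual schema, and this is not the authors' implementation.

```python
# Minimal sketch, assuming 25 joints with (x, y, z) coordinates per frame and a
# binary "intends to speak" vs. "no intent" label; not the paper's actual schema.
import torch
import torch.nn as nn

class SkeletonLSTM(nn.Module):
    """LSTM over flattened per-frame joint coordinates -> intent logits."""
    def __init__(self, num_joints=25, coord_dim=3, hidden=128, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=num_joints * coord_dim,
                            hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):
        # x: (batch, frames, num_joints, coord_dim)
        b, t, j, c = x.shape
        x = x.reshape(b, t, j * c)       # flatten joints per frame
        _, (h, _) = self.lstm(x)         # h: (num_layers, batch, hidden)
        return self.head(h[-1])          # logits: (batch, num_classes)

# Dummy forward pass: 4 clips of 60 frames each.
model = SkeletonLSTM()
clips = torch.randn(4, 60, 25, 3)
logits = model(clips)
print(logits.shape)  # torch.Size([4, 2])
```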
Citation
@inproceedings{10.2312:pg.20251310,
  booktitle = {Pacific Graphics Conference Papers, Posters, and Demos},
  editor    = {Christie, Marc and Han, Ping-Hsuan and Lin, Shih-Syun and Pietroni, Nico and Schneider, Teseo and Tsai, Hsin-Ruey and Wang, Yu-Shuen and Zhang, Eugene},
  title     = {{A Multimodal Dataset for Dialogue Intent Recognition through Human Movement and Nonverbal Cues}},
  author    = {Lin, Shu-Wei and Zhang, Jia-Xiang and Lu, Jun-Fu Lin and Huang, Yi-Jheng and Zhang, Junpo},
  year      = {2025},
  publisher = {The Eurographics Association},
  ISBN      = {978-3-03868-295-0},
  DOI       = {10.2312/pg.20251310}
}