A Multimodal Dataset for Dialogue Intent Recognition through Human Movement and Nonverbal Cues
Date
2025
Publisher
The Eurographics Association
Abstract
This paper presents a multimodal dataset designed to advance dialogue intent recognition through skeleton-based representations and temporal human movement features. Rather than proposing a new model, our objective is to provide a high-quality, annotated dataset that captures subtle nonverbal cues preceding human speech and interaction. The dataset includes skeletal joint coordinates, facial orientation, and contextual object data (e.g., microphone positions), collected from diverse participants across varied conversational scenarios. In future research, we will benchmark three types of learning methods, namely handcrafted feature models, sequence models (LSTM), and graph-based models (GCN), and offer comparative insights. This resource aims to facilitate the development of more natural, sensor-free, and data-driven human-computer interaction systems by providing a robust foundation for training and evaluation.
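To make the planned benchmark concrete, the sketch below shows how one annotated clip might be represented as per-frame features (skeletal joints, facial orientation, microphone position) and fed to a sequence-model baseline. The joint count, feature layout, class labels, and model are illustrative assumptions for this page, not the authors' published schema or code.

```python
# Hypothetical sketch of a per-frame feature layout and an LSTM intent
# baseline. All sizes, names, and labels below are assumptions.
import torch
import torch.nn as nn

NUM_JOINTS = 25          # assumed skeleton joint count
FEATS_PER_JOINT = 3      # (x, y, z) coordinates per joint
EXTRA_FEATS = 3 + 3      # assumed facial-orientation vector + microphone position
FRAME_DIM = NUM_JOINTS * FEATS_PER_JOINT + EXTRA_FEATS
INTENT_CLASSES = ["about_to_speak", "listening", "no_intent"]  # assumed label set


class IntentLSTM(nn.Module):
    """Tiny LSTM baseline: per-frame features -> dialogue-intent logits."""

    def __init__(self, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(FRAME_DIM, hidden, batch_first=True)
        self.head = nn.Linear(hidden, len(INTENT_CLASSES))

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, FRAME_DIM)
        _, (h_n, _) = self.lstm(frames)
        return self.head(h_n[-1])  # logits: (batch, num_classes)


if __name__ == "__main__":
    # One synthetic 60-frame clip standing in for a real recording.
    clip = torch.randn(1, 60, FRAME_DIM)
    logits = IntentLSTM()(clip)
    print(logits.shape)  # torch.Size([1, 3])
```

A graph-based baseline (GCN) would instead keep the joints as nodes of a skeleton graph rather than flattening them into a single frame vector, which is why the dataset ships raw joint coordinates rather than precomputed features.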
Description
@inproceedings{10.2312:pg.20251310,
booktitle = {Pacific Graphics Conference Papers, Posters, and Demos},
editor = {Christie, Marc and Han, Ping-Hsuan and Lin, Shih-Syun and Pietroni, Nico and Schneider, Teseo and Tsai, Hsin-Ruey and Wang, Yu-Shuen and Zhang, Eugene},
title = {{A Multimodal Dataset for Dialogue Intent Recognition through Human Movement and Nonverbal Cues}},
author = {Lin, Shu-Wei and Zhang, Jia-Xiang and Lu, Jun-Fu Lin and Huang, Yi-Jheng and Zhang, Junpo},
year = {2025},
publisher = {The Eurographics Association},
ISBN = {978-3-03868-295-0},
DOI = {10.2312/pg.20251310}
}
