Automatic Labeling of Training Data by Vowel Recognition for Mouth Shape Recognition with Optical Sensors Embedded in Head-Mounted Display

Nakamura, FumihikoSuzuki, KatsuhiroMasai, KatsutoshiItoh, YutaSugiura, YutaSugimoto, MakiKakehi, Yasuaki and Hiyama, Atsushi2019-09-112019-09-112019978-3-03868-083-31727-530Xhttps://doi.org/10.2312/egve.20191274https://diglib.eg.org:443/handle/10.2312/egve20191274Facial expressions enrich communication via avatars. However, in common immersive virtual reality (VR) systems, facial occlusions by head-mounted displays (HMD) lead to difficulties in capturing users' faces. In particular, the mouth plays an important role in facial expressions because it is essential for rich interaction. In this paper, we propose a technique that classifies mouth shapes into six classes using optical sensors embedded in HMD and gives labels automatically to the training dataset by vowel recognition. We experiment with five subjects to compare the recognition rates of machine learning under manual and automated labeling conditions. Results show that our method achieves average classification accuracy of 99.9% and 96.3% under manual and automated labeling conditions, respectively. These findings indicate that automated labeling is competitive relative to manual labeling, although the former's classification accuracy is slightly higher than that of the latter. Furthermore, we develop an application that reflects the mouth shape on avatars. This application blends six mouth shapes and then applies the blended mouth shapes to avatars.Humancentered computingHuman computer interaction (HCI)Automatic Labeling of Training Data by Vowel Recognition for Mouth Shape Recognition with Optical Sensors Embedded in Head-Mounted Display10.2312/egve.201912749-16