Computational models of visual attention and gaze behavior in virtual reality
 No Thumbnail Available 
Date
2024-03-08
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Virtual reality (VR) is an emerging medium that has the potential to unlock unprecedented experiences. Since the late 1960s, this technology has advanced steadily, and can nowadays be a gateway to a completely different world. VR offers a degree of realism, immersion, and engagement never seen before, and lately we have witnessed how newer virtual content is being continuously created. However, to get the most out of this promising medium, there is still much to learn about people’s visual attention and gaze behavior in the virtual universe. Questions like “What attracts users’ attention?” or “How malleable is the human brain when in a virtual experience?” have no definite answer yet. We argue that it is important to build a principled understanding of viewing and attentional behavior in VR. This thesis presents contributions in two key aspects: Understanding and modeling users’ gaze behavior, and leveraging imperceptible manipulations to improve the virtual experience.
In the first part of this thesis we have focused on developing computational models of gaze behavior in virtual environments. First, and resorting to the well-known concept of saliency, we have devised models of user attention in 360o images and 360o videos that are able to predict which parts of a virtual scene are more likely to draw viewers’ attention. Then, we have designed another two computational models for spatio-temporal attention prediction, one of them able to simulate thousands of virtual observers per second by generating realistic sequences of gaze points in 360o images, and the other one predicting different, yet plausible sequences of fixations on traditional images. Additionally, we have explored how attention works in 3D meshes. All such models have allowed us to delve into the particularities of human gaze behavior under different environments. 
Besides that, we have aimed at achieving a deeper understanding on visual attention in multimodal environments. First, we have exhaustively reviewed a vast literature on the use of additional sensory modalities, like audio, haptics, or proprioception, in virtual reality - also known as multimodality -, and its role and benefits in several disciplines. Then, we have gathered and analyzed the largest dataset of viewing behavior in ambisonic 360o videos to date, finding effects on different factors like type of content, or gender, among others. We have finally analyzed how viewing behavior varies depending on the performed tasks: We have delved into attention in the very specific case of driving scenarios, and we have also studied how significant effects in gaze behavior can be found when performing different tasks in immersive environments.
The second part of this thesis attempts to improve virtual experiences by means of imperceptible manipulations. We have firstly focused on lateral movement in VR, and have devised thresholds for the detection of such manipulations, which we then applied in three key problems in VR that have no definite solution yet, namely 6-DoF viewing of 3-DoF content, overcoming physical space constraints, and reducing motion sickness. On the other hand, we have explored the manipulation of the virtual scene, resorting to the phenomenon of change blindness, and have derived insights and guidelines on how to elicit or avoid such an effect, and how human brains’ limitations affect it.
Description
PhD thesis for Daniel Martin.
Supervised by Prof. Belen Masia and Prof. Diego Gutierrez.