Joint Attention for Automated Video Editing
Date
2020
Publisher
The Eurographics Association
Abstract
Joint attention refers to the shared focal points of attention for occupants in a space. In this work, we introduce a computational definition of joint attention for the automated editing of meetings from the AMI corpus recorded in multi-camera environments. Using extracted head pose and individual headset amplitude as features, we developed three editing methods: (1) a naive audio-based method that selects the camera using only the headset input, (2) a rule-based method that selects cameras at a fixed pacing using pose data, and (3) an editing algorithm that uses an LSTM (long short-term memory) network, trained on expert edits, to learn joint attention from both pose and audio data. The methods are evaluated qualitatively against the human edit and quantitatively in a user study with 22 participants. Results indicate that LSTM-learned joint attention produces edits comparable to the expert edit, offering a wider range of camera views than the audio-based method while being more generalizable than rule-based methods.
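The abstract does not give the details of baseline (1). The following is a minimal Python sketch of an audio-only camera selector, offered as an illustration under stated assumptions rather than the authors' implementation: it assumes each headset maps to one close-up camera, that amplitudes are averaged over fixed-length time windows, and that a hypothetical wide shot is used when no headset exceeds a silence threshold. The function names, camera labels, and threshold value are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def audio_based_edit(
    amplitudes: Sequence[Sequence[float]],
    speaker_to_camera: Dict[int, str],
    silence_threshold: float = 0.05,  # hypothetical value, not from the paper
    fallback_camera: str = "wide",    # hypothetical room-view label
) -> List[str]:
    """Return one camera label per time window.

    amplitudes[t][s] is the (assumed) mean headset amplitude of speaker s
    in window t. The camera of the loudest speaker is chosen; ties go to
    the lowest speaker index because max() keeps the first maximum.
    """
    edit: List[str] = []
    for window in amplitudes:
        loudest = max(range(len(window)), key=lambda s: window[s])
        if window[loudest] < silence_threshold:
            # Assumption: nobody is speaking, so cut to a wide shot.
            edit.append(fallback_camera)
        else:
            edit.append(speaker_to_camera[loudest])
    return edit


def to_shots(edit: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Merge consecutive identical windows into (camera, start, end) shots,
    where end is exclusive and measured in windows."""
    shots: List[Tuple[str, int, int]] = []
    for t, cam in enumerate(edit):
        if shots and shots[-1][0] == cam:
            shots[-1] = (cam, shots[-1][1], t + 1)  # extend the current shot
        else:
            shots.append((cam, t, t + 1))           # start a new shot
    return shots
```

For example, with four speakers mapped to `closeup_A` through `closeup_D`, a window in which speaker 1 is loudest selects `closeup_B`. Because this selector only follows the current loudest voice, it tends to cycle among close-ups of whoever is speaking. That behavior fits the abstract's observation that the audio-based method offers a narrower range of camera views than the LSTM-based edit.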
Description
@inproceedings{10.2312:wiced.20201131,
booktitle = {Workshop on Intelligent Cinematography and Editing},
editor = {Christie, Marc and Wu, Hui-Yin and Li, Tsai-Yen and Gandhi, Vineet},
title = {{Joint Attention for Automated Video Editing}},
author = {Wu, Hui-Yin and Santarra, Trevor and Leece, Michael and Vargas, Rolando and Jhala, Arnav},
year = {2020},
publisher = {The Eurographics Association},
ISSN = {2411-9733},
ISBN = {978-3-03868-127-4},
DOI = {10.2312/wiced.20201131}
}