Title: Recognising Human-Object Interactions Using Attention-based LSTMs
Authors: Almushyti, Muna; Li, Frederick W. B.
Editors: Vidal, Franck P. and Tam, Gary K. L. and Roberts, Jonathan C.
Date: 2019-09-11
Year: 2019
ISBN: 978-3-03868-096-3
DOI: https://doi.org/10.2312/cgvc.20191269
URL: https://diglib.eg.org:443/handle/10.2312/cgvc20191269
Pages: 135-139

Abstract: Recognising human-object interactions (HOIs) in videos is a challenging task, especially when a human can interact with multiple objects. This paper addresses HOI recognition by proposing a hierarchical framework that analyzes human-object interactions in a video sequence. The framework consists of LSTMs that first capture human motion and temporal object information independently. This information is then fused through a bilinear layer to aggregate human-object features, which are fed to a global deep LSTM that learns high-level information about HOIs. The proposed approach applies an attention mechanism to the LSTMs in order to focus on the important parts of the human and object temporal information.

Keywords: Computing methodologies; Human-object interactions (HOIs); LSTM; CNN; Hierarchical design; Temporal information; Attention
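
The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not specify the attention scoring function, the layer sizes, or the exact form of the bilinear layer, so everything below is an assumption. Random vectors stand in for the per-frame outputs of the human and object LSTMs, attention is assumed to be a softmax over dot products with a query vector, and the bilinear layer is assumed to compute one output per weight matrix as human^T W_k object. All names (`attention_pool`, `bilinear_fuse`, the dimensions `T`, `H`, `O`, `K`) are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states, query):
    """Assumed attention form: score each time step by its dot product
    with a query vector, then return the weighted sum of the states."""
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in states]
    weights = softmax(scores)
    dim = len(states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]
    return pooled, weights


def bilinear_fuse(human, obj, tensor):
    """Assumed bilinear layer: out[k] = human^T * tensor[k] * obj."""
    return [
        sum(human[i] * w_k[i][j] * obj[j]
            for i in range(len(human))
            for j in range(len(obj)))
        for w_k in tensor
    ]


# Hypothetical sizes: T time steps, H/O feature widths, K fused outputs.
T, H, O, K = 5, 4, 3, 2
rng = random.Random(0)


def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


# Stand-ins for the hidden states of the human and object LSTMs.
human_states = [rand_vec(H) for _ in range(T)]
object_states = [rand_vec(O) for _ in range(T)]

# Stand-ins for learned parameters (random here, trained in practice).
human_query, object_query = rand_vec(H), rand_vec(O)
W = [[rand_vec(O) for _ in range(H)] for _ in range(K)]

human_vec, human_weights = attention_pool(human_states, human_query)
object_vec, object_weights = attention_pool(object_states, object_query)
fused = bilinear_fuse(human_vec, object_vec, W)
```

In the framework the abstract describes, the fused human-object features would then be passed to the global deep LSTM; that stage is omitted here to keep the sketch short.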