Universiti Teknologi Malaysia Institutional Repository

Human action interpretation using convolutional neural network: a survey

Malik, Zainab and Shapiai, Mohd. Ibrahim (2022) Human action interpretation using convolutional neural network: a survey. Machine Vision and Applications, 33 (3). pp. 1-23. ISSN 0932-8092

Full text not available from this repository.

Official URL: http://dx.doi.org/10.1007/s00138-022-01291-0

Abstract

Human action interpretation (HAI) is an active research domain in computer vision. It can be further divided into human action recognition (HAR) and human action detection (HAD). HAR analyzes the frames of a video and assigns one or more labels to the video as a whole, whereas HAD first localizes the actor in each frame and then estimates an action score for the detected region. The effectiveness of an HAI model depends heavily on the representation of spatiotemporal features and on the model’s architectural design. Various studies have addressed the effective representation of these features, and different deep architectures have been proposed to learn them better and to compute the action score from them. Among these deep architectures, the convolutional neural network (CNN) has been relatively more explored for HAI owing to its lower computational cost. Several surveys providing an overview of these efforts have been published to date; however, none of them examines the feature representations and the design of the proposed architectures in detail, and none covers pose-assisted HAI techniques. This study provides a more detailed survey of existing CNN-based HAI techniques, covering techniques based on frame-level as well as pose-level spatiotemporal features. In addition, it offers a comparative study on different publicly available datasets used to evaluate HAI models based on various spatiotemporal feature representations. Finally, it discusses the limitations and challenges of HAI and concludes that human action interpretation from visual data is still very far from the actual interpretation of human action in realistic videos, which are continuous in nature and may contain multiple people performing multiple actions sequentially or in parallel.

Item Type: Article
Uncontrolled Keywords: convolution neural networks, human action classification, human action detection, human action recognition, human pose estimation, pose heatmaps
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
T Technology > TK Electrical engineering. Electronics. Nuclear engineering
Divisions: Malaysia-Japan International Institute of Technology
ID Code: 102763
Deposited By: Yanti Mohd Shah
Deposited On: 24 Sep 2023 03:06
Last Modified: 24 Sep 2023 03:06
