Cascading Pose Features with CNN-LSTM for Multiview Human Action Recognition

Signals. Pub Date: 2023-01-04. DOI: 10.3390/signals4010002

N. Malik, S. Abu-Bakar, U. U. Sheikh, Asma Channa, N. Popescu


Citations: 4

Abstract

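Human Action Recognition (HAR) is a branch of computer vision concerned with identifying human actions at several levels: low level, action level, and interaction level. Many earlier HAR algorithms rely on handcrafted methods. Handcrafted techniques, however, are inefficient at recognizing interaction-level actions because these involve complex scenarios. Traditional deep learning approaches, meanwhile, take the entire image as input and then extract large volumes of features, which greatly increases system complexity and leads to significantly higher computation time and resource usage. This research therefore develops an efficient multi-view, interaction-level action recognition system, based on a deep learning architecture, that uses 2D skeleton data to achieve higher accuracy at lower computational complexity. The proposed system extracts 2D skeleton data from the dataset using OpenPose. The extracted 2D skeleton features are then fed directly to a combined Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM) architecture for action recognition. To reduce complexity, only the extracted features, rather than the whole image, are passed to the CNN-LSTM, so the network does not need to extract features from raw images itself. The proposed method was compared with existing methods, and the results confirm its potential. The proposed OpenPose-CNNLSTM achieved an accuracy of 94.4% on MCAD (Multi-camera Action Dataset) and 91.67% on IXMAS (INRIA Xmas Motion Acquisition Sequences). The method also significantly decreases computational complexity by reducing the number of input features to 50.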
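
The abstract does not say how the 50 input features are laid out. One plausible reading, offered here only as an assumption, is that they are the (x, y) coordinates of the 25 keypoints in OpenPose's BODY_25 model. The minimal sketch below flattens an OpenPose-style JSON frame (a `people` list whose entries hold `pose_keypoints_2d` as x, y, confidence triples) into a 50-value vector and groups frames into fixed-length sequences of the kind a CNN-LSTM would consume. The helper names, the zero-filled fallback for missing detections, and the window and stride values are all hypothetical and are not taken from the paper.

```python
import json

NUM_KEYPOINTS = 25                      # assumption: OpenPose BODY_25 layout
FEATURES_PER_FRAME = NUM_KEYPOINTS * 2  # 25 keypoints x (x, y) = 50 values


def frame_to_features(frame_json, person_index=0):
    """Flatten one OpenPose-style JSON frame into [x0, y0, ..., x24, y24]."""
    people = json.loads(frame_json).get("people", [])
    if person_index >= len(people):
        # Hypothetical fallback: no detection in this frame, use an all-zero pose.
        return [0.0] * FEATURES_PER_FRAME
    flat = people[person_index]["pose_keypoints_2d"]  # x, y, confidence triples
    if len(flat) != NUM_KEYPOINTS * 3:
        raise ValueError("expected %d values, got %d" % (NUM_KEYPOINTS * 3, len(flat)))
    features = []
    for k in range(NUM_KEYPOINTS):
        # Keep x and y, drop the per-keypoint confidence score.
        features.extend([float(flat[3 * k]), float(flat[3 * k + 1])])
    return features


def make_windows(frames, window, stride):
    """Group per-frame feature vectors into fixed-length, overlapping sequences."""
    return [frames[s:s + window] for s in range(0, len(frames) - window + 1, stride)]


def synthetic_frame(t):
    """Build a fake frame so the sketch runs without OpenPose installed."""
    flat = []
    for k in range(NUM_KEYPOINTS):
        flat.extend([k + t, 2 * k + t, 0.9])
    return json.dumps({"people": [{"pose_keypoints_2d": flat}]})


# Ten synthetic frames, grouped into sequences of 4 frames with a stride of 2.
frames = [frame_to_features(synthetic_frame(t)) for t in range(10)]
windows = make_windows(frames, window=4, stride=2)
```

Each element of `windows` is a sequence of 50-value frame vectors. In a setup like the one the abstract describes, such sequences, rather than raw images, would be the network input. How the authors actually handle multiple people in a frame or missing detections is not stated in the abstract.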




Source journal

Signals

CiteScore

3.20

Self-citation rate

0.00%

Articles published

Review time

11 weeks