Cascading Pose Features with CNN-LSTM for Multiview Human Action Recognition

Signals, Pub Date: 2023-01-04, DOI: 10.3390/signals4010002
N. Malik, S. Abu-Bakar, U. U. Sheikh, Asma Channa, N. Popescu
Citations: 4

Abstract

Human Action Recognition (HAR) is a branch of computer vision concerned with identifying human actions at several levels: low level, action level, and interaction level. Many earlier HAR algorithms rely on handcrafted methods. Handcrafted techniques, however, are inefficient at recognizing interaction-level actions because these involve complex scenarios. Traditional deep learning approaches, meanwhile, take the entire image as input and then extract large volumes of features, which greatly increases system complexity and leads to significantly higher computation time and resource use. This research therefore develops an efficient multi-view, interaction-level action recognition system, built on a deep learning architecture, that uses 2D skeleton data to achieve higher accuracy while reducing computational complexity. The proposed system extracts 2D skeleton data from the dataset using OpenPose. The extracted 2D skeleton features are then fed directly to a Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM) architecture for action recognition. To reduce complexity, only the extracted features, rather than the whole image, are passed to the CNN-LSTM architecture, so the network no longer has to extract features from raw images. The proposed method was compared with existing methods, and the results confirm its potential. The proposed OpenPose-CNNLSTM achieved an accuracy of 94.4% on MCAD (Multi-Camera Action Dataset) and 91.67% on IXMAS (INRIA Xmas Motion Acquisition Sequences). The method also significantly decreases computational complexity by reducing the number of input features to 50.
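The skeleton-to-feature step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the 50 input features are obtained; the code below assumes one plausible reading, namely the 25 keypoints of OpenPose's BODY_25 model with the (x, y) coordinates kept and the confidence score dropped. The function names, the zero-filled fallback for empty frames, and the window and stride values are hypothetical choices for illustration and are not taken from the paper.

```python
import json
from typing import List

# Assumption: OpenPose BODY_25 model, 25 keypoints per person.
# 25 keypoints x (x, y) = 50 features, matching the count in the abstract.
NUM_KEYPOINTS = 25


def frame_to_features(frame_json: str) -> List[float]:
    """Flatten the first detected person's keypoints into [x0, y0, x1, y1, ...].

    OpenPose writes each person's keypoints as a flat list of
    (x, y, confidence) triples under the key "pose_keypoints_2d".
    """
    people = json.loads(frame_json).get("people", [])
    if not people:
        # Hypothetical fallback: an all-zero vector when nobody is detected.
        return [0.0] * (2 * NUM_KEYPOINTS)
    flat = people[0]["pose_keypoints_2d"]
    features: List[float] = []
    for i in range(NUM_KEYPOINTS):
        # Keep x and y, skip the confidence value at index 3 * i + 2.
        features.extend([float(flat[3 * i]), float(flat[3 * i + 1])])
    return features


def make_windows(frames: List[List[float]], window: int, stride: int) -> List[List[List[float]]]:
    """Group per-frame feature vectors into fixed-length, overlapping sequences."""
    return [frames[s:s + window] for s in range(0, len(frames) - window + 1, stride)]


if __name__ == "__main__":
    # Synthetic frame standing in for one OpenPose JSON output file.
    keypoints = [v for i in range(NUM_KEYPOINTS) for v in (i, i + 0.5, 0.9)]
    frame = json.dumps({"people": [{"pose_keypoints_2d": keypoints}]})

    features = frame_to_features(frame)
    print(len(features))  # 50

    # Ten identical frames, grouped into sequences of 4 frames with stride 2.
    sequences = make_windows([features] * 10, window=4, stride=2)
    print(len(sequences), len(sequences[0]), len(sequences[0][0]))  # 4 4 50
```

Each resulting sequence has shape (frames, 50), which is the kind of compact input a CNN-LSTM classifier could consume in place of full images.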