Human Activity Recognition Utilizing Ensemble of Transfer-Learned Attention Networks and a Low-Cost Convolutional Neural Architecture

Azmain Yakin Srizon, S. Hasan, Md. Farukuzzaman Faruk, Abu Sayeed, Md. Ali Hossain
DOI: 10.1109/ICCIT57492.2022.10055456
Published in: 2022 25th International Conference on Computer and Information Technology (ICCIT), 2022-12-17
Citations: 1

Abstract

Throughout the last decades, human activity recognition has been considered one of the most complex tasks in computer vision. Many earlier works proposed machine learning models that recognize human actions from sensor-based or video-based data, both of which are costly to acquire. Recent advances in convolutional neural networks (CNNs) have opened the possibility of accurate human activity recognition from still images. Although several deep learning-based approaches have addressed the problem, the high diversity of human actions has kept them from performing well across all action classes under consideration. Some researchers have argued that an ensemble of different models may work better in this regard. However, because the images in this domain are mostly captured by security cameras, deep models often fail to extract valuable features, resulting in misclassifications. To resolve these issues, this study considers three transfer-learned models, i.e., DenseNet201, Xception, and EfficientNetB6, and applies a multichannel attention module to extract more distinguishable features. Moreover, a custom-made low-cost CNN is proposed that works on small images, extracting features that often get lost in deep computation. Finally, the fusion of features extracted by the attention-based transfer-learned models and the low-cost CNN is used for the final prediction. We validated the proposed ensemble model on the Stanford 40 Actions, BU-101, and Willow datasets, where it achieved 97.48%, 98.29%, and 94.19% overall accuracy respectively, outperforming previous results by notable margins.
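The pipeline the abstract describes is a late fusion: each backbone's feature maps pass through a channel-attention module, are pooled into a feature vector, and the vectors from all branches (three deep backbones plus the low-cost CNN) are concatenated before a shared classification head. The shape of that data flow can be sketched in NumPy. Note this is a minimal illustration, not the paper's implementation: all weights are random placeholders, the branch widths are invented, and the SE-style squeeze-and-excite gating is an assumed form of the "multichannel attention module".

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_attention(feat, reduction=4):
    """SE-style channel attention: squeeze (global average over H, W),
    excite (two small dense layers), then reweight channels.
    Weights are random stand-ins, not trained parameters."""
    c = feat.shape[-1]
    squeezed = feat.mean(axis=(0, 1))             # (C,)
    w1 = rng.standard_normal((c, c // reduction))
    w2 = rng.standard_normal((c // reduction, c))
    hidden = np.maximum(squeezed @ w1, 0)         # ReLU
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # sigmoid gate in (0, 1)
    return feat * gate                            # broadcast over H and W

def extract_features(image, channels, out_dim):
    """Stand-in for one branch (e.g. a transfer-learned DenseNet201):
    lift RGB to feature maps, apply attention, pool, and project."""
    lift = rng.standard_normal((3, channels))     # random 1x1 "conv"
    feat = np.maximum(image @ lift, 0)            # (H, W, C) feature maps
    pooled = channel_attention(feat).mean(axis=(0, 1))  # global average pool
    proj = rng.standard_normal((channels, out_dim))
    return pooled @ proj

# A 64x64 RGB "image" stands in for a real input frame.
image = rng.standard_normal((64, 64, 3))

# Three attention-augmented backbones plus the narrower low-cost CNN branch
# (256/256/256/64 are illustrative widths, not the paper's).
branches = [extract_features(image, 32, d) for d in (256, 256, 256, 64)]
fused = np.concatenate(branches)                  # late fusion by concatenation

# Softmax head over e.g. the 40 Stanford action classes.
logits = fused @ rng.standard_normal((fused.size, 40))
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(fused.shape, probs.shape)
```

The design point the sketch captures is that fusion happens at the feature level, so the shallow branch can contribute fine-grained cues the deep backbones discard, while the shared head learns how to weight all branches jointly.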