Skeletal-based Classification for Human Activity Recognition

2022 IEEE International Conference on Cybernetics and Computational Intelligence (CyberneticsCom). Pub Date: 2022-06-16. DOI: 10.1109/CyberneticsCom55287.2022.9865354

Agung Suhendar, Tri Ayuningsih, S. Suyanto



Abstract

Human activity recognition (HAR) is critical for determining human interactions and interpersonal relationships. HAR has two main concerns: the type of activity and its localization. Most HAR tasks involve identifying human activity in a series of video frames, where the monitored subject is free to perform any activity. Some current HAR approaches use 3D sensors to extract the skeleton (body pose) of the monitored subject, which is much more precise than using only the 2D information obtained from conventional cameras. However, the need for 3D sensors is a significant limitation when implementing video-based surveillance systems. In this research, we use the deep learning method OpenPose 3D as a substitute for 3D sensors: it estimates the 3D skeleton (pose) of the subject's body from conventional 2D camera input. The estimated 3D skeleton is then combined with other machine learning methods for activity classification. Classifiers that can be used include Support Vector Machine (SVM), Neural Network (NN), Long Short-Term Memory (LSTM), and Transformer. HAR can thus be applied flexibly in various surveillance settings without 3D sensors. The experimental results show that the Transformer achieves the best accuracy, while the SVM is the fastest.
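The second stage of the pipeline described in the abstract (classifying an activity from a sequence of estimated 3D skeletons) can be illustrated with a minimal sketch. This is not the authors' implementation. It makes no OpenPose calls and uses synthetic skeletons in place of real pose estimates. A simple nearest-centroid classifier stands in for the SVM, NN, LSTM, and Transformer named in the abstract. The joint count, root-joint index, activity labels, and all function names are hypothetical choices made only for illustration.

```python
import random

NUM_JOINTS = 5   # hypothetical joint count, chosen only to keep the demo small
ROOT = 0         # hypothetical index of the root (e.g. hip) joint


def center_on_root(frame):
    """Translate one frame so the root joint sits at the origin."""
    rx, ry, rz = frame[ROOT]
    return [(x - rx, y - ry, z - rz) for x, y, z in frame]


def sequence_to_features(sequence):
    """Average root-centred joint coordinates over time into one flat vector."""
    sums = [0.0] * (NUM_JOINTS * 3)
    for frame in sequence:
        flat = [c for joint in center_on_root(frame) for c in joint]
        sums = [s + c for s, c in zip(sums, flat)]
    return [s / len(sequence) for s in sums]


def fit_centroids(features, labels):
    """Compute the mean feature vector of each class."""
    grouped = {}
    for vec, label in zip(features, labels):
        grouped.setdefault(label, []).append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(vec, centroids[label]))
    return min(centroids, key=dist)


def make_sequence(label, rng, num_frames=20):
    """Generate a synthetic skeleton sequence; joint 1 acts as a 'hand'."""
    hand_height = 1.0 if label == "raise_arm" else -1.0
    offset = [rng.uniform(-5, 5) for _ in range(3)]  # random position in the room
    sequence = []
    for _ in range(num_frames):
        frame = []
        for j in range(NUM_JOINTS):
            base = [0.1 * j, hand_height if j == 1 else 0.0, 0.0]
            frame.append(tuple(b + o + rng.gauss(0, 0.05) for b, o in zip(base, offset)))
        sequence.append(frame)
    return sequence


# Train on synthetic sequences, then classify one unseen sequence per class.
rng = random.Random(0)
labels = ["raise_arm", "lower_arm"] * 10
train = [sequence_to_features(make_sequence(l, rng)) for l in labels]
centroids = fit_centroids(train, labels)

test_labels = ["raise_arm", "lower_arm"]
predictions = [predict(centroids, sequence_to_features(make_sequence(l, rng))) for l in test_labels]
print(predictions)
```

Centring each frame on the root joint makes the features independent of where the subject stands. Averaging over time gives a fixed-length vector, which is the kind of input an SVM or a plain neural network expects. A sequence model such as an LSTM or Transformer would more naturally consume the per-frame vectors directly, without the temporal averaging.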


