{"title":"Skeletal-based Classification for Human Activity Recognition","authors":"Agung Suhendar, Tri Ayuningsih, S. Suyanto","doi":"10.1109/CyberneticsCom55287.2022.9865354","DOIUrl":null,"url":null,"abstract":"Human activity recognition (HAR) is critical for determining human interactions and interpersonal relationships. Among the various classification techniques, two things become the main focus of HAR, namely the type of activity and its localization. Most of the tasks in HAR involve identifying a human scene from a series of frames in a video, where the subject being monitored is free to perform an activity. For some of the current HAR approaches, 3D sensors are used as input extractors for the skeleton/body pose of the subject being monitored. It is much more precise than using only 2D information obtained from conventional cameras. Of course, the use of 3D sensors is a significant limitation for implementing video-based surveillance systems. In this research, we use the Deep learning OpenPose 3D method as a substitute for 3D sensors that can estimate the 3D frame/pose of the subject's body identified from conventional camera 2D input sources. It is then combined with other machine learning methods for the activity classification process from the obtained 3D framework. Classifiers that can be used include Support Vector Machine (SVM), Neural Network (NN), Long short-term memory (LSTM), and Transformer. Thus, HAR can be applied flexibly in various scopes of supervision without the help of 3D sensors. The experiment results inform that Transformer is the best in accuracy while SVM is in speed.","PeriodicalId":178279,"journal":{"name":"2022 IEEE International Conference on Cybernetics and Computational Intelligence (CyberneticsCom)","volume":"80 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Cybernetics and Computational Intelligence (CyberneticsCom)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CyberneticsCom55287.2022.9865354","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Human activity recognition (HAR) is critical for understanding human interactions and interpersonal relationships. Among the various classification techniques, two aspects are the main focus of HAR: the type of activity and its localization. Most HAR tasks involve identifying a human scene from a series of video frames in which the monitored subject is free to perform an activity. Some current HAR approaches use 3D sensors to extract the skeleton/body pose of the monitored subject, which is much more precise than relying only on the 2D information obtained from conventional cameras. However, the need for 3D sensors is a significant limitation when implementing video-based surveillance systems. In this research, we use a deep-learning-based OpenPose 3D method as a substitute for 3D sensors: it estimates the 3D skeleton/pose of the subject's body from conventional 2D camera input. The estimated 3D skeleton is then passed to machine learning classifiers for the activity classification stage. The classifiers considered are Support Vector Machine (SVM), Neural Network (NN), Long Short-Term Memory (LSTM), and Transformer. In this way, HAR can be applied flexibly in various surveillance settings without the help of 3D sensors. The experimental results show that the Transformer achieves the best accuracy, while the SVM is the fastest.
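As a rough illustration of the classification stage described above, the sketch below trains an SVM on flattened 3D skeleton sequences. It assumes the 3D poses have already been estimated from 2D video (e.g., by an OpenPose-style estimator) and stored as (frames, joints, 3) arrays; the dummy data, feature flattening, and SVM hyperparameters are illustrative placeholders, not the authors' exact pipeline.

```python
# Minimal sketch (assumptions): 3D skeleton sequences are already available as
# NumPy arrays of shape (num_frames, num_joints, 3). The random data below
# stands in for real pose estimates, and the SVM settings are illustrative.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def flatten_sequence(skeleton_seq: np.ndarray) -> np.ndarray:
    """Collapse a (frames, joints, 3) skeleton sequence into a single feature vector."""
    return skeleton_seq.reshape(-1)

# Dummy dataset: 200 sequences of 30 frames with 25 joints each, 4 activity classes.
rng = np.random.default_rng(0)
X_seqs = [rng.normal(size=(30, 25, 3)) for _ in range(200)]
y = rng.integers(0, 4, size=200)

X = np.stack([flatten_sequence(s) for s in X_seqs])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# SVM classifier on the flattened 3D skeleton features
# (the fastest of the classifiers compared in the paper).
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

The same feature arrays could instead be kept as per-frame sequences and fed to an LSTM or Transformer classifier, which is where the paper reports the highest accuracy at the cost of speed.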