{"title":"使用视觉和惯性传感器融合方法识别人类活动和装配任务的状态","authors":"J. Male, Uriel Martinez-Hernandez","doi":"10.1109/ICIT46573.2021.9453672","DOIUrl":null,"url":null,"abstract":"Reliable human machine interfaces is key to accomplishing the goals of Industry 4.0. This work proposes the late fusion of a visual recognition and human action recognition (HAR) classifier. Vision is used to recognise the number of screws assembled into a mock part while HAR from body worn Inertial Measurement Units (IMUs) classifies actions done to assemble the part. Convolutional Neural Network (CNN) methods are used in both modes of classification before various late fusion methods are analysed for prediction of a final state estimate. The fusion methods investigated are mean, weighted average, Support Vector Machine (SVM), Bayesian, Artificial Neural Network (ANN) and Long Short Term Memory (LSTM). The results show the LSTM fusion method to perform best, with accuracy of 93% compared to 81% for IMU and 77% for visual sensing. Development of sensor fusion methods such as these is key to reliable Human Machine Interaction (HMI).","PeriodicalId":193338,"journal":{"name":"2021 22nd IEEE International Conference on Industrial Technology (ICIT)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"Recognition of human activity and the state of an assembly task using vision and inertial sensor fusion methods\",\"authors\":\"J. Male, Uriel Martinez-Hernandez\",\"doi\":\"10.1109/ICIT46573.2021.9453672\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reliable human machine interfaces is key to accomplishing the goals of Industry 4.0. This work proposes the late fusion of a visual recognition and human action recognition (HAR) classifier. 
Vision is used to recognise the number of screws assembled into a mock part while HAR from body worn Inertial Measurement Units (IMUs) classifies actions done to assemble the part. Convolutional Neural Network (CNN) methods are used in both modes of classification before various late fusion methods are analysed for prediction of a final state estimate. The fusion methods investigated are mean, weighted average, Support Vector Machine (SVM), Bayesian, Artificial Neural Network (ANN) and Long Short Term Memory (LSTM). The results show the LSTM fusion method to perform best, with accuracy of 93% compared to 81% for IMU and 77% for visual sensing. Development of sensor fusion methods such as these is key to reliable Human Machine Interaction (HMI).\",\"PeriodicalId\":193338,\"journal\":{\"name\":\"2021 22nd IEEE International Conference on Industrial Technology (ICIT)\",\"volume\":\"63 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-03-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 22nd IEEE International Conference on Industrial Technology (ICIT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICIT46573.2021.9453672\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 22nd IEEE International Conference on Industrial Technology (ICIT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIT46573.2021.9453672","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Recognition of human activity and the state of an assembly task using vision and inertial sensor fusion methods
Reliable human-machine interfaces are key to accomplishing the goals of Industry 4.0. This work proposes the late fusion of a visual recognition and a human action recognition (HAR) classifier. Vision is used to recognise the number of screws assembled into a mock part, while HAR from body-worn Inertial Measurement Units (IMUs) classifies the actions performed to assemble the part. Convolutional Neural Network (CNN) methods are used in both modes of classification before various late fusion methods are analysed to predict a final state estimate. The fusion methods investigated are mean, weighted average, Support Vector Machine (SVM), Bayesian, Artificial Neural Network (ANN) and Long Short-Term Memory (LSTM). The results show the LSTM fusion method performs best, with an accuracy of 93%, compared to 81% for IMU and 77% for visual sensing alone. Development of sensor fusion methods such as these is key to reliable Human-Machine Interaction (HMI).
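To illustrate the simplest late-fusion strategies mentioned in the abstract (mean and weighted average), here is a minimal sketch that combines per-class probability vectors from two classifiers. The class labels, probability values, and weights are hypothetical, not taken from the paper; the paper's stronger LSTM fusion would replace these functions with a trained model over sequences of classifier outputs.

```python
# Late fusion sketch: combine per-class probabilities from a vision
# classifier and an IMU-based HAR classifier. All values are illustrative.

def mean_fusion(p_vision, p_imu):
    """Element-wise mean of the two classifiers' probability vectors."""
    return [(v + i) / 2.0 for v, i in zip(p_vision, p_imu)]

def weighted_fusion(p_vision, p_imu, w_vision=0.45, w_imu=0.55):
    """Weighted average; weights might reflect each modality's accuracy
    (here chosen arbitrarily, slightly favouring the IMU channel)."""
    return [w_vision * v + w_imu * i for v, i in zip(p_vision, p_imu)]

# Hypothetical softmax outputs over three assembly states.
p_vision = [0.6, 0.3, 0.1]
p_imu = [0.2, 0.7, 0.1]

fused = mean_fusion(p_vision, p_imu)
# The final state estimate is the class with the highest fused probability.
state = max(range(len(fused)), key=fused.__getitem__)
print(fused, state)
```

The learned fusion methods in the paper (SVM, ANN, LSTM) follow the same pattern at inference time: the concatenated classifier outputs form the input feature vector, and the fusion model, rather than a fixed rule, maps them to a state estimate.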