Chao-Lung Yang, Shang-Che Hsu, Simi Wang, Jing-Feng Nian
Artificial Intelligence and Social Computing. DOI: 10.54941/ahfe1001457 (https://doi.org/10.54941/ahfe1001457)
Automatic Labeling of Human Actions by Skeleton Clustering and Fuzzy Similarity
Human action recognition (HAR) has been applied in many fields as artificial intelligence and machine learning have grown rapidly. Applying HAR to industrial production lines can help visualize and analyze the correlation between human operators and machine utilization in order to improve overall productivity. However, training a HAR model requires manually labeling specific actions in a large amount of collected video data, which is very costly. How to label a large amount of video automatically is therefore an emerging practical problem in the HAR research domain. This research proposes an automatic labeling framework that integrates Dynamic Time Warping (DTW), human skeleton clustering, and fuzzy similarity to assign labels based on pre-defined human actions. First, a skeleton estimation method such as OpenPose is used to jointly detect the key points of the human operator's skeleton. Then, the skeleton data is converted to spatial-temporal data for calculating the DTW distance between skeletons. The skeletons are clustered into groups based on the DTW distances among them. Within each group, the undefined (unlabeled) skeletons are compared with the pre-defined skeletons, which serve as references, and labels are assigned according to the similarity to those references. The experimental dataset was created by simulating the human actions of manual drilling operations. Compared with manually labeled data, the results show that the accuracy, precision, recall, and F1 score of the proposed labeling model all reach up to 95%, with a 40% time saving.
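The DTW comparison and reference-based labeling described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the fuzzy similarity function, the key-point layout, or the clustering algorithm, so the `similarity` mapping, the `threshold` value, the toy two-coordinate frames, and all function names below are assumptions made for illustration, and the clustering step is omitted entirely.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two frames (flat lists of key-point coordinates)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Classic O(n*m) dynamic time warping distance between two frame sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def similarity(distance, scale=1.0):
    """Map a distance to (0, 1]. Assumption: a simple stand-in for the
    paper's fuzzy similarity, which the abstract does not define."""
    return 1.0 / (1.0 + distance / scale)


def assign_label(unlabeled, references, threshold=0.5):
    """Return (label, score) for the most similar reference sequence,
    or (None, score) when no reference reaches the hypothetical threshold."""
    best_label, best_score = None, 0.0
    for label, ref in references.items():
        score = similarity(dtw_distance(unlabeled, ref))
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Toy data: each frame holds one (x, y) key point; real skeletons have many more.
references = {
    "drill": [[0, 0], [1, 1], [2, 2]],
    "idle": [[0, 0], [0, 0], [0, 0]],
}
# Same motion as "drill" but with one frame repeated (performed more slowly).
query = [[0, 0], [1, 1], [1, 1], [2, 2]]

label, score = assign_label(query, references)
print(label, score)
```

Because DTW allows one frame to align with several frames of the other sequence, the slowed-down query has zero distance to the "drill" reference, which is why DTW suits actions performed at different speeds. In the framework described above, this comparison would be run only within each cluster of skeletons rather than against every reference.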