An End-to-end Anchorless Approach to Recognize Hand Gestures using CenterNet

2023 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT). Pub Date: 2023-07-13. DOI: 10.1109/IAICT59002.2023.10205726

H. Dutta, K. Manivas, Marjana Bhuyan, M. Bhuyan


Citations: 0

Abstract



Hand gesture recognition is one of the interesting problems in computer vision, with applications in human-computer interaction, robotics, sign language interpretation, augmented reality, and other fields. Most existing deep learning methods detect hand gestures in two stages: the hand is located in the first stage, and the hand region is classified in the second stage to estimate the hand pose. Although these methods are accurate, they are slow and cannot be used for real-time applications. A few existing works have explored one-stage approaches such as YOLO and SSD for hand gesture recognition because of their lower inference time. However, these detectors place many anchor boxes over an image, of which only a small percentage are positive. This leads to a large imbalance between positive and negative anchor boxes and slows the training process. In this paper, we use an end-to-end, one-stage, detection-based approach, namely CenterNet, for hand gesture recognition. It detects an object as a point, i.e., the center point of the bounding box enclosing the object, and regresses the object size from that point, which eliminates the need for anchor boxes. We add a Dual Attention Network to the CenterNet architecture to improve performance. Our model achieves mean F1-scores of 84.40% and 98.83% on the OUHANDS and NUS hand pose datasets, respectively. Results show that the model performs well even under complex backgrounds and varying illumination conditions, and that the F1-scores obtained are close to benchmark values.
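To make the "object as a point" idea concrete, the following is a minimal NumPy sketch of how center-point predictions could be decoded into gesture boxes without anchors. It is not the authors' implementation and omits the Dual Attention Network entirely. The function name `decode_centers`, the three head outputs (`heatmap`, `size`, `offset`), the output stride of 4, the score threshold, and the choice to express sizes in input pixels are all assumptions made for illustration.

```python
import numpy as np

def decode_centers(heatmap, size, offset, stride=4, threshold=0.3):
    """Decode hypothetical CenterNet-style head outputs into boxes.

    heatmap: (C, H, W) per-class center scores in [0, 1]
    size:    (2, H, W) box width and height (assumed to be in input pixels)
    offset:  (2, H, W) sub-cell x/y correction of each center
    Returns a list of (class_id, score, x1, y1, x2, y2).
    """
    C, H, W = heatmap.shape
    # A cell counts as a peak if it equals the maximum of its 3x3 neighbourhood.
    padded = np.pad(heatmap, ((0, 0), (1, 1), (1, 1)), constant_values=-np.inf)
    neighbourhood_max = np.max(
        [padded[:, dy:dy + H, dx:dx + W] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    peaks = (heatmap == neighbourhood_max) & (heatmap >= threshold)

    detections = []
    for c, y, x in zip(*np.nonzero(peaks)):
        # Map the heatmap cell back to input-image coordinates.
        cx = (x + offset[0, y, x]) * stride
        cy = (y + offset[1, y, x]) * stride
        w, h = size[0, y, x], size[1, y, x]
        detections.append((int(c), float(heatmap[c, y, x]),
                           float(cx - w / 2), float(cy - h / 2),
                           float(cx + w / 2), float(cy + h / 2)))
    return detections

# Toy input: one confident center for gesture class 1 at cell (row 3, col 5).
heatmap = np.zeros((2, 8, 8), dtype=np.float32)
heatmap[1, 3, 5] = 0.9
size = np.zeros((2, 8, 8), dtype=np.float32)
size[:, 3, 5] = (20.0, 12.0)
offset = np.zeros((2, 8, 8), dtype=np.float32)
detections = decode_centers(heatmap, size, offset)
```

Each detection comes from a single heatmap peak, so no anchor boxes need to be matched against the ground truth and the positive/negative anchor imbalance described above does not arise in this decoding step.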
