Authors: H. Dutta, K. Manivas, Marjana Bhuyan, M. Bhuyan
Venue: 2023 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT)
Published: 2023-07-13
DOI: 10.1109/IAICT59002.2023.10205726 (https://doi.org/10.1109/IAICT59002.2023.10205726)
An End-to-end Anchorless Approach to Recognize Hand Gestures using CenterNet
Hand gesture recognition is an interesting problem in computer vision, with applications in human-computer interaction, robotics, sign language interpretation, augmented reality, and related fields. Most existing deep learning methods detect hand gestures in two stages: the hand is located in the first stage, and the hand region is classified in the second stage to estimate the hand pose. Although these methods are accurate, they are slow and cannot be used for real-time applications. A few works have explored one-stage detectors such as YOLO and SSD for hand gesture recognition because of their shorter inference time. However, these detectors place many anchor boxes over an image, of which only a small percentage are positive. This creates a large imbalance between positive and negative anchor boxes and slows training. In this paper, we use an end-to-end, one-stage, detection-based approach, namely CenterNet, for hand gesture recognition. CenterNet detects an object as a point, i.e., the center of the bounding box enclosing the object, and regresses the object size, which eliminates the need for anchor boxes. We add a Dual Attention Network to the CenterNet architecture to improve performance. Our model achieves mean F1-scores of 84.40% and 98.83% on the OUHANDS and NUS hand pose datasets, respectively. The results show that the model performs well even under complex backgrounds and varying illumination conditions, and the F1-scores obtained are close to benchmark values.
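
The point-based detection described in the abstract can be illustrated with a minimal sketch of how a center heatmap and a size map might be decoded into boxes without any anchor boxes. This is not the authors' implementation. The function name `decode_centers`, the 3x3 peak test, the score threshold, the output stride of 4, and the convention that sizes are given in input-image pixels are all assumptions made for illustration; the sub-pixel offset head and the Dual Attention Network are omitted.

```python
from typing import List, Tuple

# (x1, y1, x2, y2, score) in input-image pixels
Box = Tuple[float, float, float, float, float]


def decode_centers(
    heatmap: List[List[float]],
    sizes: List[List[Tuple[float, float]]],
    threshold: float = 0.5,  # assumed score cut-off
    stride: int = 4,         # assumed ratio of input size to heatmap size
) -> List[Box]:
    """Turn a single-class center heatmap and a per-cell (w, h) map into boxes."""
    height, width = len(heatmap), len(heatmap[0])
    boxes: List[Box] = []
    for y in range(height):
        for x in range(width):
            score = heatmap[y][x]
            if score < threshold:
                continue
            # Keep the cell only if nothing in its 3x3 neighborhood scores higher.
            window = [
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            if score < max(window):
                continue
            w, h = sizes[y][x]
            # Map the heatmap cell back to input-image coordinates.
            cx, cy = x * stride, y * stride
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, score))
    return boxes


# Toy 4x4 heatmap with one clear peak at row 2, column 3.
heatmap = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.2, 0.3, 0.2],
    [0.0, 0.2, 0.6, 0.9],
    [0.0, 0.1, 0.3, 0.4],
]
sizes = [[(0.0, 0.0)] * 4 for _ in range(4)]
sizes[2][3] = (8.0, 4.0)  # hypothetical predicted width and height at the peak

boxes = decode_centers(heatmap, sizes)
print(boxes)
```

In this toy input the 0.6 cell passes the threshold but is suppressed by its 0.9 neighbor, so a single box is returned. Each object yields one positive location, which is why the positive/negative anchor imbalance mentioned above does not arise in this formulation.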