Attention-Based 2-D Hand Keypoints Localization
H Pallab Jyoti Dutta; M. K. Bhuyan
IEEE Sensors Letters (impact factor 2.2; JCR Q3, Engineering, Electrical & Electronic), published 2024-08-13
DOI: 10.1109/LSENS.2024.3443072
https://ieeexplore.ieee.org/document/10636024/
Citations: 0
Abstract
Hand keypoint localization is used extensively in human–computer interaction, but accurate localization is challenging because of the close proximity of fingers and keypoints, occlusion, varied hand poses, complex backgrounds, and extreme lighting conditions. Despite much research, these challenges persist. We therefore propose an encoder–decoder architecture aided by a novel attention module to localize hand keypoints precisely. The attention module captures keypoint-relevant features at two different scales, covering both local and global characteristics. Further, the loss function teaches the model to remove spuriously detected keypoints in the initial learning phase. The qualitative and quantitative results indicate that the proposed architecture outputs precise keypoint locations. Evaluation on two benchmark RGB image datasets, which comprise all the challenges encountered in keypoint localization, yielded endpoint errors as low as 2.78 and 1.85 pixels and 98.50% and 99.77% correct keypoints, respectively. These results show the proposed model's effectiveness and its ability to overcome these challenges.
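The abstract reports two metrics, endpoint error in pixels and the percentage of correct keypoints. It does not state the evaluation protocol, so the following is only a minimal sketch of how such metrics are commonly computed: endpoint error as the mean Euclidean distance between predicted and ground-truth keypoints, and correct keypoints as the share of predictions within a pixel threshold. The function names, the toy coordinates, and the 5-pixel threshold are illustrative assumptions, not values from the paper.

```python
import math

def endpoint_error(pred, gt):
    """Mean Euclidean distance (pixels) between predicted and true 2-D keypoints."""
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(dists) / len(dists)

def percent_correct_keypoints(pred, gt, threshold):
    """Percentage of keypoints whose error is at most `threshold` pixels.

    The threshold is an assumption here; the abstract does not give one.
    """
    hits = sum(1 for p, g in zip(pred, gt) if math.dist(p, g) <= threshold)
    return 100.0 * hits / len(pred)

# Toy data (hypothetical): four ground-truth keypoints and their predictions.
gt = [(10.0, 10.0), (20.0, 20.0), (30.0, 30.0), (40.0, 40.0)]
pred = [(13.0, 14.0), (20.0, 20.0), (30.0, 36.0), (40.0, 41.0)]

epe = endpoint_error(pred, gt)                               # errors: 5, 0, 6, 1
pck = percent_correct_keypoints(pred, gt, threshold=5.0)     # 3 of 4 within 5 px
print(f"endpoint error: {epe:.2f} px, correct keypoints: {pck:.2f}%")
```

On this toy data the per-keypoint errors are 5, 0, 6, and 1 pixels, so the mean endpoint error is 3.00 pixels and 75.00% of keypoints fall within the assumed threshold. In practice the threshold is often normalized by hand or bounding-box size; which variant the authors used is not stated in the abstract.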