3D CNN hand pose estimation with end-to-end hierarchical model and physical constraints from depth images
Authors: Zhengze Xu, Wenjun Zhang
Journal: Neural Network World (impact factor 0.7; JCR Q4, Computer Science, Artificial Intelligence)
Publication date: 2023-01-01
DOI: 10.14311/nnw.2023.33.003 (https://doi.org/10.14311/nnw.2023.33.003)
Citation count: 0
Abstract
Previous studies mostly treat the depth image as a flat 2D image, so the depth data tends to be mapped to gray values during convolution and feature extraction. To address this issue, a 3D CNN approach to hand pose estimation with an end-to-end hierarchical model and physical constraints is proposed. After the 3D spatial structure of the hand is reconstructed from the depth image, the 3D model is converted into a voxel grid, on which a 3D CNN estimates the hand pose. The method improves on a plain 3D CNN by embedding the end-to-end hierarchical model and a constraint algorithm into the network, which yields fast convergence during training and avoids unrealistic hand poses. In the experiments, it reaches a mean accuracy of 87.98% and a mean absolute error (MAE) of 8.82 mm over all 21 joints, with an inference time of 24 ms, consistently outperforming several well-known gesture recognition algorithms.
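The abstract does not give the details of the preprocessing, so the following is only a minimal sketch of the general idea of turning a depth image into a voxel grid. It assumes a pinhole camera model with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`), a binary occupancy encoding, and a cubic bounding box. The function names `depth_to_points` and `voxelize` and the grid resolution are illustrative and do not come from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def depth_to_points(depth: List[List[float]], fx: float, fy: float,
                    cx: float, cy: float) -> List[Point]:
    """Back-project a depth map (in mm) to 3D points (assumed pinhole model)."""
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # treat non-positive depth as a missing measurement
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def voxelize(points: List[Point], grid_size: int) -> List[List[List[int]]]:
    """Map points into a binary occupancy grid with grid_size^3 voxels."""
    grid = [[[0] * grid_size for _ in range(grid_size)]
            for _ in range(grid_size)]
    if not points:
        return grid
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    # Use one cubic extent for all axes so the shape keeps its aspect ratio.
    extent = max(maxs[i] - mins[i] for i in range(3)) or 1.0
    for p in points:
        idx = [min(int((p[i] - mins[i]) / extent * grid_size), grid_size - 1)
               for i in range(3)]
        grid[idx[0]][idx[1]][idx[2]] = 1
    return grid


# Tiny hypothetical 2x2 depth map (mm); 0 marks a missing pixel.
depth = [[0.0, 500.0],
         [500.0, 600.0]]
points = depth_to_points(depth, fx=500.0, fy=500.0, cx=0.5, cy=0.5)
grid = voxelize(points, grid_size=4)
occupied = sum(cell for plane in grid for row in plane for cell in row)
```

A grid like this is the kind of input a 3D CNN can convolve over directly, so depth is kept as a spatial axis rather than flattened into gray values. Real pipelines typically also segment the hand and crop a fixed-size cube around it first, which is omitted here.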
About the journal:
Neural Network World is a bimonthly journal providing the latest developments in the field of informatics with attention mainly devoted to the problems of:
brain science,
theory and applications of neural networks (both artificial and natural),
fuzzy-neural systems,
methods and applications of evolutionary algorithms,
methods of parallel and mass-parallel computing,
problems of soft-computing,
methods of artificial intelligence.