Camera distance helps 3D hand pose estimated from a single RGB image

Yuan Cui, Moran Li, Yuan Gao, Changxin Gao, Fan Wu, Hao Wen, Jiwei Li, Nong Sang

Graphical Models, Volume 127, May 2023, Article 101179. DOI: 10.1016/j.gmod.2023.101179 (https://www.sciencedirect.com/science/article/pii/S1524070323000097)
Most existing methods for RGB hand pose estimation use root-relative 3D coordinates for supervision. However, such supervision neglects the distance between the camera and the object (i.e., the hand). The camera distance is especially important under a perspective camera, since it controls the depth-dependent scaling of the perspective projection. As a result, the same hand pose, captured at different camera distances, can be projected into different 2D shapes by the same perspective camera. Neglecting this important information results in ambiguities when recovering 3D poses from 2D images. In this article, we propose a camera projection learning module (CPLM) that uses the scale factor contained in the camera distance to associate the 3D hand pose with its 2D UV coordinates, which facilitates further optimizing the accuracy of the estimated hand joints. Specifically, following previous work, we use a two-stage RGB-to-2D and 2D-to-3D method to estimate the 3D hand pose and embed a graph convolutional network in the second stage to leverage the information contained in the complex non-Euclidean structure of the 2D hand joints. Experimental results demonstrate that our proposed method surpasses state-of-the-art methods on the benchmark dataset RHD and obtains competitive results on the STB and D+O datasets.
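The abstract does not spell out CPLM's internals, so the NumPy snippet below is only a minimal sketch (not the authors' implementation) of the ambiguity and the UV-to-3D association it describes. It uses a standard pinhole camera with made-up intrinsics (F, CX, CY) and made-up joint coordinates: doubling the camera distance halves the projected joint spacing, while a known root depth lets absolute 3D coordinates be recovered from 2D UV coordinates.

    import numpy as np

    F, CX, CY = 500.0, 160.0, 120.0          # made-up pinhole intrinsics

    def project(points_3d):
        """Perspective projection: u = F*X/Z + CX, v = F*Y/Z + CY."""
        x, y, z = points_3d.T
        return np.stack([F * x / z + CX, F * y / z + CY], axis=1)

    # Two joints of an illustrative hand pose (metres), then the same pose
    # translated 0.4 m further along the optical axis.
    pose = np.array([[0.00, 0.00, 0.40],
                     [0.05, 0.02, 0.40]])
    pose_far = pose + np.array([0.0, 0.0, 0.40])

    uv, uv_far = project(pose), project(pose_far)
    print(np.linalg.norm(uv[1] - uv[0]))          # ~67.3 px joint spacing
    print(np.linalg.norm(uv_far[1] - uv_far[0]))  # ~33.7 px: distance doubled, spacing halved

    # Conversely, with the camera distance (root depth) known, absolute 3D
    # follows from the UV coordinates by inverting the projection:
    z = 0.40
    xyz = np.stack([(uv[:, 0] - CX) * z / F,
                    (uv[:, 1] - CY) * z / F,
                    np.full(len(uv), z)], axis=1)
    print(np.allclose(xyz, pose))                 # True

This inverse mapping is what a scale factor derived from the camera distance makes well-posed: without it, any scaled copy of the pose along the viewing ray yields the same UV coordinates.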
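Similarly, the abstract only names the second-stage graph convolutional network. As a purely illustrative sketch under assumed details (a 21-joint hand graph with kinematic-chain edges and random stand-in weights, none of which are given in the abstract), one graph-convolution step over the 2D joints might look like:

    import numpy as np

    J = 21                                      # common hand-joint count (assumption)
    # Kinematic-chain edges: wrist (0) to each finger base, then along each finger.
    edges = [(0, b) for b in (1, 5, 9, 13, 17)]
    edges += [(j, j + 1) for b in (1, 5, 9, 13, 17) for j in (b, b + 1, b + 2)]

    A = np.eye(J)                               # adjacency with self-loops
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    d = 1.0 / np.sqrt(A.sum(axis=1))
    A_hat = A * d[:, None] * d[None, :]         # symmetric normalization D^-1/2 A D^-1/2

    X = np.random.randn(J, 2)                   # per-joint 2D UV features
    W = np.random.randn(2, 64)                  # layer weights (random stand-in)
    H = np.maximum(A_hat @ X @ W, 0.0)          # one GCN layer: ReLU(A_hat X W)
    print(H.shape)                              # (21, 64)

The point of operating on this graph rather than a flat joint vector is that message passing over A_hat respects the hand's non-Euclidean connectivity, which is the property the abstract says the second stage exploits.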
Journal introduction:
Graphical Models is recognized internationally as a highly rated, top-tier journal and is focused on the creation, geometric processing, animation, and visualization of graphical models and on their applications in engineering, science, culture, and entertainment. GMOD provides its readers with thoroughly reviewed and carefully selected papers that disseminate exciting innovations, that teach rigorous theoretical foundations, that propose robust and efficient solutions, or that describe ambitious systems or applications across a variety of topics.
We invite papers in five categories: research (contributions of novel theoretical or practical approaches or solutions), survey (opinionated views of the state of the art and challenges in a specific topic), system (the architecture and implementation details of an innovative architecture for a complete system that supports model/animation design, acquisition, analysis, visualization, etc.), application (description of a novel application of known techniques and evaluation of its impact), or lecture (an elegant and inspiring perspective on previously published results that clarifies them and teaches them in a new way).
GMOD offers its authors an accelerated review, feedback from experts in the field, immediate online publication of accepted papers, no restriction on color and length (when justified by the content) in the online version, and a broad promotion of published papers. A prestigious group of editors selected from among the premier international researchers in their fields oversees the review process.