Camera distance helps 3D hand pose estimated from a single RGB image

Yuan Cui, Moran Li, Yuan Gao, Changxin Gao, Fan Wu, Hao Wen, Jiwei Li, Nong Sang

Graphical Models, Volume 127, Article 101179, May 2023. DOI: 10.1016/j.gmod.2023.101179
https://www.sciencedirect.com/science/article/pii/S1524070323000097
Citations: 0
Abstract
Most existing methods for RGB hand pose estimation use root-relative 3D coordinates for supervision. However, such supervision neglects the distance between the camera and the object (i.e., the hand). Camera distance is especially important under a perspective camera because it controls the depth-dependent scaling of the perspective projection. As a result, the same hand pose at different camera distances can be projected into different 2D shapes by the same perspective camera. Neglecting this information leads to ambiguities when recovering 3D poses from 2D images. In this article, we propose a camera projection learning module (CPLM) that uses the scale factor contained in the camera distance to associate the 3D hand pose with its 2D UV coordinates, which facilitates further improving the accuracy of the estimated hand joints. Specifically, following previous work, we use a two-stage RGB-to-2D and 2D-to-3D method to estimate the 3D hand pose and embed a graph convolutional network in the second stage to leverage the information contained in the complex non-Euclidean structure of 2D hand joints. Experimental results demonstrate that our proposed method surpasses state-of-the-art methods on the benchmark dataset RHD and obtains competitive results on the STB and D+O datasets.
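To make the depth-dependent scaling concrete, here is a minimal Python sketch (not the authors' code; the toy joint offsets and camera intrinsics below are invented for illustration) showing that the same root-relative hand pose projects to different 2D UV shapes at different camera distances, which is exactly the ambiguity the camera-distance scale factor is meant to resolve:

import numpy as np

def project_perspective(joints_3d, fx=500.0, fy=500.0, cx=160.0, cy=160.0):
    """Project Nx3 camera-space joints to Nx2 pixel (UV) coordinates."""
    u = fx * joints_3d[:, 0] / joints_3d[:, 2] + cx
    v = fy * joints_3d[:, 1] / joints_3d[:, 2] + cy
    return np.stack([u, v], axis=1)

# A toy root-relative "hand pose": root joint at the origin plus two
# fingertip offsets, in meters (values are hypothetical).
pose_rel = np.array([[0.00,  0.00,  0.00],
                     [0.03, -0.08,  0.02],
                     [-0.02, -0.09, -0.01]])

for root_depth in (0.4, 0.8):  # camera distances in meters
    uv = project_perspective(pose_rel + np.array([0.0, 0.0, root_depth]))
    # The projected joint spread shrinks roughly as 1/depth, so identical
    # root-relative poses yield different 2D shapes at different distances.
    spread = np.linalg.norm(uv - uv[0], axis=1).max()
    print(f"depth {root_depth} m -> max joint spread {spread:.1f} px")

Running this, the projected joint spread roughly halves when the root depth doubles (about 118 px at 0.4 m versus 58 px at 0.8 m with these toy numbers), since perspective projection scales image-plane offsets by fx/z.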
Journal Introduction:
Graphical Models is recognized internationally as a highly rated, top-tier journal and is focused on the creation, geometric processing, animation, and visualization of graphical models and on their applications in engineering, science, culture, and entertainment. GMOD provides its readers with thoroughly reviewed and carefully selected papers that disseminate exciting innovations, that teach rigorous theoretical foundations, that propose robust and efficient solutions, or that describe ambitious systems or applications in a variety of topics.
We invite papers in five categories: research (contributions of novel theoretical or practical approaches or solutions), survey (opinionated views of the state of the art and challenges in a specific topic), system (the architecture and implementation details of an innovative architecture for a complete system that supports model/animation design, acquisition, analysis, visualization, etc.), application (description of a novel application of known techniques and evaluation of its impact), or lecture (an elegant and inspiring perspective on previously published results that clarifies them and teaches them in a new way).
GMOD offers its authors an accelerated review, feedback from experts in the field, immediate online publication of accepted papers, no restriction on color and length (when justified by the content) in the online version, and a broad promotion of published papers. A prestigious group of editors selected from among the premier international researchers in their fields oversees the review process.