MPCTrans：多视角线索感知联合关系表征，通过斯温变换器进行三维手部姿势估计

IF 4 3区综合性期刊 Q2 CHEMISTRY, ANALYTICAL Sensors Pub Date : 2024-10-31 DOI:10.3390/s24217029

Xiangan Wan, Jianping Ju, Jianying Tang, Mingyu Lin, Ning Rao, Deng Chen, Tingting Liu, Jing Li, Fan Bian, Nicholas Xiong

{"title":"MPCTrans：多视角线索感知联合关系表征，通过斯温变换器进行三维手部姿势估计","authors":"Xiangan Wan, Jianping Ju, Jianying Tang, Mingyu Lin, Ning Rao, Deng Chen, Tingting Liu, Jing Li, Fan Bian, Nicholas Xiong","doi":"10.3390/s24217029","DOIUrl":null,"url":null,"abstract":"The objective of 3D hand pose estimation (HPE) based on depth images is to accurately locate and predict keypoints of the hand. However, this task remains challenging because of the variations in hand appearance from different viewpoints and severe occlusions. To effectively address these challenges, this study introduces a novel approach, called the multi-perspective cue-aware joint relationship representation for 3D HPE via the Swin Transformer (MPCTrans, for short). This approach is designed to learn multi-perspective cues and essential information from hand depth images. To achieve this goal, three novel modules are proposed to utilize features from multiple virtual views of the hand, namely, the adaptive virtual multi-viewpoint (AVM), hierarchy feature estimation (HFE), and virtual viewpoint evaluation (VVE) modules. The AVM module adaptively adjusts the angles of the virtual viewpoint and learns the ideal virtual viewpoint to generate informative multiple virtual views. The HFE module estimates hand keypoints through hierarchical feature extraction. The VVE module evaluates virtual viewpoints by using chained high-level functions from the HFE module. Transformer is used as a backbone to extract the long-range semantic joint relationships in hand depth images. Extensive experiments demonstrate that the MPCTrans model achieves state-of-the-art performance on four challenging benchmark datasets.","PeriodicalId":21698,"journal":{"name":"Sensors","volume":"24 21","pages":""},"PeriodicalIF":4.0000,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11548048/pdf/","citationCount":"0","resultStr":"{\"title\":\"MPCTrans: Multi-Perspective Cue-Aware Joint Relationship Representation for 3D Hand Pose Estimation via Swin Transformer.\",\"authors\":\"Xiangan Wan, Jianping Ju, Jianying Tang, Mingyu Lin, Ning Rao, Deng Chen, Tingting Liu, Jing Li, Fan Bian, Nicholas Xiong\",\"doi\":\"10.3390/s24217029\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The objective of 3D hand pose estimation (HPE) based on depth images is to accurately locate and predict keypoints of the hand. However, this task remains challenging because of the variations in hand appearance from different viewpoints and severe occlusions. To effectively address these challenges, this study introduces a novel approach, called the multi-perspective cue-aware joint relationship representation for 3D HPE via the Swin Transformer (MPCTrans, for short). This approach is designed to learn multi-perspective cues and essential information from hand depth images. To achieve this goal, three novel modules are proposed to utilize features from multiple virtual views of the hand, namely, the adaptive virtual multi-viewpoint (AVM), hierarchy feature estimation (HFE), and virtual viewpoint evaluation (VVE) modules. The AVM module adaptively adjusts the angles of the virtual viewpoint and learns the ideal virtual viewpoint to generate informative multiple virtual views. The HFE module estimates hand keypoints through hierarchical feature extraction. The VVE module evaluates virtual viewpoints by using chained high-level functions from the HFE module. Transformer is used as a backbone to extract the long-range semantic joint relationships in hand depth images. Extensive experiments demonstrate that the MPCTrans model achieves state-of-the-art performance on four challenging benchmark datasets.\",\"PeriodicalId\":21698,\"journal\":{\"name\":\"Sensors\",\"volume\":\"24 21\",\"pages\":\"\"},\"PeriodicalIF\":4.0000,\"publicationDate\":\"2024-10-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11548048/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Sensors\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://doi.org/10.3390/s24217029\",\"RegionNum\":3,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"CHEMISTRY, ANALYTICAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Sensors","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.3390/s24217029","RegionNum":3,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"CHEMISTRY, ANALYTICAL","Score":null,"Total":0}

引用次数: 0

摘要

基于深度图像的三维手部姿态估计（HPE）的目标是准确定位和预测手部的关键点。然而，由于不同视角下手部外观的变化以及严重的遮挡，这项任务仍然具有挑战性。为了有效应对这些挑战，本研究引入了一种新方法，即通过 Swin Transformer（简称 MPCTrans）实现三维 HPE 的多视角线索感知联合关系表示法。该方法旨在从手部深度图像中学习多视角线索和基本信息。为了实现这一目标，我们提出了三个新模块来利用手部多个虚拟视角的特征，即自适应虚拟多视角（AVM）、层次特征估计（HFE）和虚拟视角评估（VVE）模块。自适应虚拟多视点（AVM）模块可自适应地调整虚拟视点的角度，并学习理想的虚拟视点，从而生成信息丰富的多个虚拟视图。HFE 模块通过分层特征提取估算手部关键点。VVE 模块通过使用来自 HFE 模块的链式高级函数来评估虚拟视点。变换器作为骨干用于提取手部深度图像中的长距离语义关节关系。大量实验证明，MPCTrans 模型在四个具有挑战性的基准数据集上取得了一流的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

MPCTrans: Multi-Perspective Cue-Aware Joint Relationship Representation for 3D Hand Pose Estimation via Swin Transformer.

The objective of 3D hand pose estimation (HPE) based on depth images is to accurately locate and predict keypoints of the hand. However, this task remains challenging because of the variations in hand appearance from different viewpoints and severe occlusions. To effectively address these challenges, this study introduces a novel approach, called the multi-perspective cue-aware joint relationship representation for 3D HPE via the Swin Transformer (MPCTrans, for short). This approach is designed to learn multi-perspective cues and essential information from hand depth images. To achieve this goal, three novel modules are proposed to utilize features from multiple virtual views of the hand, namely, the adaptive virtual multi-viewpoint (AVM), hierarchy feature estimation (HFE), and virtual viewpoint evaluation (VVE) modules. The AVM module adaptively adjusts the angles of the virtual viewpoint and learns the ideal virtual viewpoint to generate informative multiple virtual views. The HFE module estimates hand keypoints through hierarchical feature extraction. The VVE module evaluates virtual viewpoints by using chained high-level functions from the HFE module. Transformer is used as a backbone to extract the long-range semantic joint relationships in hand depth images. Extensive experiments demonstrate that the MPCTrans model achieves state-of-the-art performance on four challenging benchmark datasets.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Sensors 工程技术-电化学

CiteScore

7.30

自引率

12.80%

发文量

8430

审稿时长

1.7 months

期刊介绍： Sensors (ISSN 1424-8220) provides an advanced forum for the science and technology of sensors and biosensors. It publishes reviews (including comprehensive reviews on the complete sensors products), regular research papers and short notes. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced.