Geometric Consistency-Guaranteed Spatio-Temporal Transformer for Unsupervised Multiview 3-D Pose Estimation

IEEE Transactions on Instrumentation and Measurement · IF 5.6 · Zone 2 (Engineering & Technology) · JCR Q1 (Engineering, Electrical & Electronic) · Pub Date: 2024-09-02 · DOI: 10.1109/TIM.2024.3440376
Kaiwen Dong;Kévin Riou;Jingwen Zhu;Andréas Pastor;Kévin Subrin;Yu Zhou;Xiao Yun;Yanjing Sun;Patrick Le Callet

Abstract

Unsupervised 3-D pose estimation has gained prominence because labeled 3-D training data are difficult to acquire. Despite promising progress, unsupervised approaches still lag behind supervised methods in performance. Two factors impede them: incomplete geometric constraints and inadequate interaction among spatial, temporal, and multiview features. This article introduces an unsupervised pipeline that uses calibrated camera parameters as geometric constraints across views and coordinate spaces, optimizing the model by minimizing the inconsistency between the 2-D input pose and the reprojection of the predicted 3-D pose. The pipeline uses a novel hierarchical cross transformer (HCT), which encodes higher-level information by enabling interactions among hierarchical features that carry different levels of temporal, spatial, and cross-view information. Because it minimizes reliance on human-specific parts, the HCT shows potential for adapting to various pose estimation tasks. To validate this adaptability, we build a connection between human pose estimation and scene pose estimation and introduce the dynamic-keypoints-3-D (DKs-3D) dataset, tailored for 3-D scene pose estimation in robotic manipulation. Experiments on two 3-D human pose estimation datasets show that our method achieves new state-of-the-art performance among weakly supervised and unsupervised approaches. Experiments on DKs-3D confirm the adaptability of our method and set the initial benchmark for unsupervised 2-D-to-3-D scene pose lifting.
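The reprojection-consistency idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the pinhole camera model, the mean L2 error, and the names `project`, `reprojection_loss`, `K`, `cameras`, `pose_3d`, and `poses_2d` are all assumptions made for illustration. It shows only that a 3-D pose consistent with every calibrated view reprojects onto the 2-D inputs with zero error, while a wrong 3-D pose does not.

```python
import numpy as np

def project(points_world, K, R, t):
    """Project (J, 3) world-space joints into one view (assumed pinhole model)."""
    cam = points_world @ R.T + t      # world -> camera coordinates, shape (J, 3)
    uvw = cam @ K.T                   # camera -> homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide -> (J, 2) pixels

def reprojection_loss(pose_3d, poses_2d, cameras):
    """Mean L2 distance between each view's 2-D input and the reprojected 3-D pose."""
    errors = []
    for pose_2d, (K, R, t) in zip(poses_2d, cameras):
        reproj = project(pose_3d, K, R, t)
        errors.append(np.linalg.norm(reproj - pose_2d, axis=1).mean())
    return float(np.mean(errors))

# Hypothetical calibration: shared intrinsics, two views 90 degrees apart.
K = np.array([[1000.0, 0.0, 500.0],
              [0.0, 1000.0, 500.0],
              [0.0, 0.0, 1.0]])
R_front = np.eye(3)
R_side = np.array([[0.0, 0.0, -1.0],
                   [0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0]])
t = np.array([0.0, 0.0, 5.0])         # both cameras sit 5 units from the origin
cameras = [(K, R_front, t), (K, R_side, t)]

# A toy three-joint "pose" and the 2-D observations it produces in each view.
pose_3d = np.array([[0.0, 0.0, 0.0],
                    [0.2, -0.1, 0.3],
                    [-0.3, 0.4, 0.1]])
poses_2d = [project(pose_3d, K, R, t_) for (K, R, t_) in cameras]

consistent = reprojection_loss(pose_3d, poses_2d, cameras)         # correct pose
perturbed = reprojection_loss(pose_3d + 0.1, poses_2d, cameras)    # shifted pose
```

In an unsupervised setting of the kind the abstract describes, a quantity like this would serve as the training signal in place of 3-D labels; the loss actually used in the paper may differ in form and weighting.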
Source journal: IEEE Transactions on Instrumentation and Measurement (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 9.00
Self-citation rate: 23.20%
Articles published: 1294
Review time: 3.9 months
Journal scope: Papers are sought that address innovative solutions to the development and use of electrical and electronic instruments and equipment to measure, monitor and/or record physical phenomena for the purpose of advancing measurement science, methods, functionality and applications. The scope of these papers may encompass: (1) theory, methodology, and practice of measurement; (2) design, development and evaluation of instrumentation and measurement systems and components used in generating, acquiring, conditioning and processing signals; (3) analysis, representation, display, and preservation of the information obtained from a set of measurements; and (4) scientific and technical support to establishment and maintenance of technical standards in the field of Instrumentation and Measurement.