GDRNPP: A Geometry-Guided and Fully Learning-Based Object Pose Estimator

Xingyu Liu, Ruida Zhang, Chenyangguang Zhang, Gu Wang, Jiwen Tang, Zhigang Li, Xiangyang Ji
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 7, pp. 5742-5759, published 2025-03-21. DOI: 10.1109/TPAMI.2025.3553485

Abstract

6D pose estimation of rigid objects is a long-standing and challenging task in computer vision. Recently, the emergence of deep learning has revealed the potential of Convolutional Neural Networks (CNNs) to predict reliable 6D poses. Given that direct pose regression networks currently exhibit suboptimal performance, most methods still resort to traditional techniques to varying degrees. For example, top-performing methods often adopt an indirect strategy, first establishing 2D-3D or 3D-3D correspondences and then applying the RANSAC-based PnP or Kabsch algorithms, with further ICP refinement. Despite the performance gains, the integration of traditional techniques makes these pipelines time-consuming and not end-to-end trainable. Orthogonal to them, this paper introduces a fully learning-based object pose estimator. We first perform an in-depth investigation of both direct and indirect methods and propose a simple yet effective Geometry-guided Direct Regression Network (GDRN) that learns the 6D pose from monocular images in an end-to-end manner. We then introduce a geometry-guided pose refinement module that enhances pose accuracy when extra depth data is available. Guided by the predicted coordinate map, we build an end-to-end differentiable architecture that establishes robust and accurate 3D-3D correspondences between the observed and rendered RGB-D images to refine the pose. Our enhanced pose estimation pipeline, GDRNPP (GDRN Plus Plus), topped the leaderboard of the BOP Challenge for two consecutive years, becoming the first method to surpass all prior approaches that relied on traditional techniques in both accuracy and speed.
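The abstract contrasts learning-based regression with classical geometric solvers such as the Kabsch algorithm, which recovers a rigid pose in closed form from 3D-3D correspondences. As a minimal illustrative sketch of that classical step (not the paper's implementation; function and variable names are our own), the standard SVD-based Kabsch solution in NumPy is:

```python
import numpy as np

def kabsch(P, Q):
    """Least-squares rigid transform (R, t) with Q ≈ R @ P + t,
    given corresponding 3D point sets P, Q of shape (N, 3)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)        # centroids
    H = (P - cp).T @ (Q - cq)                      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t
```

In the indirect pipelines the abstract describes, such a solver is typically wrapped in a RANSAC loop to reject outlier correspondences, which is one source of the runtime cost the paper's end-to-end design avoids.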