Images, normal maps and point clouds fusion decoder for 6D pose estimation

Information Fusion, Vol. 117, Article 102907. Publication date: 2025-01-01. DOI: 10.1016/j.inffus.2024.102907. Impact Factor 15.5; CAS Tier 1 (Computer Science); JCR Q1 (Computer Science, Artificial Intelligence).
Hong-Bo Zhang, Jia-Xin Hong, Jing-Hua Liu, Qing Lei, Ji-Xiang Du
Citations: 0

Abstract

6D pose estimation plays a crucial role in enabling intelligent robots to interact with their environment by understanding 3D scene information. This task is challenging due to factors such as texture-less objects, illumination variations, and scene occlusions. In this work, we present a novel approach that integrates feature fusion from multiple data modalities—specifically, RGB images, normal maps, and point clouds—to enhance the accuracy of 6D pose estimation. Unlike previous methods that rely solely on RGB-D data or focus on either shallow or deep feature fusion, the proposed method uniquely incorporates both shallow and deep feature fusion across heterogeneous modalities, compensating for the information often lost in point clouds. Specifically, the proposed method includes an adaptive feature fusion module designed to improve the communication and fusion of shallow features between RGB images and normal maps. Additionally, a multi-modal fusion decoder is implemented to facilitate cross-modal feature fusion between image and point cloud data. Experimental results demonstrate that the proposed method achieves state-of-the-art performance, with 6D pose estimation accuracy reaching 97.7% on the Linemod dataset, 71.5% on the Occlusion Linemod dataset, and 95.8% on the YCB-Video dataset. These results underline the robustness and effectiveness of the proposed approach in complex environments.
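The paper's implementation details are not available on this page, so the following NumPy fragment is only a minimal sketch of two ideas the abstract names: deriving a normal map that recovers surface orientation often lost in sparse point clouds, and adaptively gating shallow RGB and normal-map features. Every function name and parameter below is hypothetical, not taken from the paper.

```python
import numpy as np

def normals_from_depth(depth):
    """Estimate a per-pixel normal map from a depth image.

    Normal maps encode local surface orientation that sparse point
    clouds can lose; here they come from simple finite differences
    of depth, a common approximation.
    """
    dzdx = np.gradient(depth, axis=1)
    dzdy = np.gradient(depth, axis=0)
    normals = np.dstack((-dzdx, -dzdy, np.ones_like(depth)))
    # Normalize each pixel's normal vector to unit length.
    norm = np.linalg.norm(normals, axis=2, keepdims=True)
    return normals / norm

def adaptive_fuse(rgb_feat, normal_feat, w=2.0, b=0.0):
    """Element-wise gated fusion of two same-shaped feature maps.

    A gate in (0, 1) decides how much of the RGB feature versus the
    normal-map feature to keep; `w` and `b` stand in for parameters
    that a real network would learn.
    """
    gate = 1.0 / (1.0 + np.exp(-(w * (rgb_feat + normal_feat) + b)))
    return gate * rgb_feat + (1.0 - gate) * normal_feat

# Toy example: a planar surface tilted along x.
depth = np.tile(np.linspace(1.0, 2.0, 8), (8, 1))
normals = normals_from_depth(depth)            # shape (8, 8, 3), unit length
fused = adaptive_fuse(np.ones((8, 8)), normals[..., 2])
```

In a full model the gate would be produced by a learned layer over concatenated features rather than a fixed affine map, but the element-wise convex combination is the essential mechanism of gated shallow fusion.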
Source Journal

Information Fusion (Engineering & Technology, Computer Science: Theory & Methods)
CiteScore: 33.20
Self-citation rate: 4.30%
Articles published per year: 161
Review time: 7.9 months
Journal introduction: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.