EPAWFusion: multimodal fusion for 3D object detection based on enhanced points and adaptive weights

IF 1.4 · JCR Q4 (Environmental Sciences) · CAS Tier 4 (Earth Sciences) · Journal of Applied Remote Sensing · Pub Date: 2024-01-01 · DOI: 10.1117/1.jrs.18.017501
Xiang Sun, Shaojing Song, Fan Wu, Tingting Lu, Bohao Li, Zhiqing Miao
{"title":"EPAWFusion: multimodal fusion for 3D object detection based on enhanced points and adaptive weights","authors":"Xiang Sun, Shaojing Song, Fan Wu, Tingting Lu, Bohao Li, Zhiqing Miao","doi":"10.1117/1.jrs.18.017501","DOIUrl":null,"url":null,"abstract":"Fusing LiDAR point cloud and camera image for 3D object detection in autonomous driving has emerged as a captivating research avenue. The core challenge of multimodal fusion is how to seamlessly fuse 3D LiDAR point cloud with 2D camera image. Although current approaches exhibit promising results, they often rely solely on fusion at either the data level, feature level, or object level, and there is still a room for improvement in the utilization of multimodal information. We present an advanced and effective multimodal fusion framework called EPAWFusion for fusing 3D point cloud and 2D camera image at both data level and feature level. EPAWFusion model consists of three key modules: a point enhanced module based on semantic segmentation for data-level fusion, an adaptive weight allocation module for feature-level fusion, and a detector based on 3D sparse convolution. The semantic information of the 2D image is extracted using semantic segmentation, and the calibration matrix is used to establish the point-pixel correspondence. The semantic information and distance information are then attached to the point cloud to achieve data-level fusion. The geometry features of enhanced point cloud are extracted by voxel encoding, and the texture features of image are obtained using a pretrained 2D CNN. Feature-level fusion is achieved via the adaptive weight allocation module. The fused features are fed into a 3D sparse convolution-based detector to obtain the accurate 3D objects. Experiment results demonstrate that EPAWFusion outperforms the baseline network MVXNet on the KITTI dataset for 3D detection of cars, pedestrians, and cyclists by 5.81%, 6.97%, and 3.88%. Additionally, EPAWFusion performs well for single-vehicle-side 3D object detection based on the experimental findings on DAIR-V2X dataset and the inference frame rate of our proposed model reaches 11.1 FPS. The two-layer level fusion of EPAWFusion significantly enhances the performance of multimodal 3D object detection.","PeriodicalId":54879,"journal":{"name":"Journal of Applied Remote Sensing","volume":null,"pages":null},"PeriodicalIF":1.4000,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Applied Remote Sensing","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1117/1.jrs.18.017501","RegionNum":4,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"ENVIRONMENTAL SCIENCES","Score":null,"Total":0}
Citations: 0

Abstract

Fusing LiDAR point clouds and camera images for 3D object detection in autonomous driving has emerged as a captivating research avenue. The core challenge of multimodal fusion is how to seamlessly fuse the 3D LiDAR point cloud with the 2D camera image. Although current approaches exhibit promising results, they often rely on fusion at only the data level, feature level, or object level, and there is still room for improvement in the utilization of multimodal information. We present an advanced and effective multimodal fusion framework called EPAWFusion for fusing 3D point clouds and 2D camera images at both the data level and the feature level. The EPAWFusion model consists of three key modules: a point enhancement module based on semantic segmentation for data-level fusion, an adaptive weight allocation module for feature-level fusion, and a detector based on 3D sparse convolution. The semantic information of the 2D image is extracted using semantic segmentation, and the calibration matrix is used to establish the point-pixel correspondence. The semantic and distance information are then attached to the point cloud to achieve data-level fusion. The geometric features of the enhanced point cloud are extracted by voxel encoding, and the texture features of the image are obtained using a pretrained 2D CNN. Feature-level fusion is achieved via the adaptive weight allocation module, and the fused features are fed into a 3D sparse convolution-based detector to obtain accurate 3D detections. Experimental results demonstrate that EPAWFusion outperforms the baseline network MVXNet on the KITTI dataset for 3D detection of cars, pedestrians, and cyclists by 5.81%, 6.97%, and 3.88%, respectively. EPAWFusion also performs well for single-vehicle-side 3D object detection on the DAIR-V2X dataset, and its inference frame rate reaches 11.1 FPS. The two-level fusion of EPAWFusion significantly enhances the performance of multimodal 3D object detection.
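The data-level fusion step described above hinges on projecting each LiDAR point through the calibration matrices into the image plane, then copying the segmentation scores of the hit pixel (plus the point's distance to the sensor) onto the point. The sketch below illustrates this under KITTI-style calibration conventions; the function name, argument layout, and score format are our own assumptions, not the paper's code.

```python
import numpy as np

def enhance_points(points, seg_scores, P2, R0_rect, Tr_velo_to_cam):
    """Data-level fusion sketch (hypothetical helper, not the paper's API).

    points:          (N, 4) x, y, z, reflectance in the LiDAR frame
    seg_scores:      (H, W, C) per-pixel class scores from a 2D segmentation net
    P2:              (3, 4) camera projection matrix
    R0_rect:         (4, 4) rectification matrix, padded to homogeneous form
    Tr_velo_to_cam:  (4, 4) LiDAR-to-camera extrinsics, padded likewise
    """
    H, W, C = seg_scores.shape
    n = len(points)
    xyz1 = np.hstack([points[:, :3], np.ones((n, 1))])      # homogeneous coordinates
    uvw = P2 @ R0_rect @ Tr_velo_to_cam @ xyz1.T            # (3, N) image-plane coords
    u = np.floor(uvw[0] / uvw[2]).astype(np.int64)          # pixel column per point
    v = np.floor(uvw[1] / uvw[2]).astype(np.int64)          # pixel row per point
    valid = (uvw[2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    sem = np.zeros((n, C), dtype=seg_scores.dtype)
    sem[valid] = seg_scores[v[valid], u[valid]]             # point-pixel correspondence
    dist = np.linalg.norm(points[:, :3], axis=1, keepdims=True)
    return np.hstack([points, sem, dist])                   # (N, 4 + C + 1) enhanced points
```

Points that fall outside the image (or behind the camera) keep zero semantic scores, so the enhanced cloud stays the same size as the input.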
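Voxel encoding then turns the enhanced cloud into a sparse grid that a 3D sparse-convolution backbone can consume. The paper uses a learned voxel encoder; the mean-pooling variant below is only a minimal stand-in to show the bucketing, with the grid parameters chosen as typical KITTI values rather than taken from the paper.

```python
import numpy as np

def voxelize_mean(points, voxel_size=(0.05, 0.05, 0.1),
                  pc_range=(0.0, -40.0, -3.0, 70.4, 40.0, 1.0)):
    """Voxel-encoding sketch: average the features of points in each voxel."""
    xmin, ymin, zmin, xmax, ymax, zmax = pc_range
    mask = ((points[:, 0] >= xmin) & (points[:, 0] < xmax) &
            (points[:, 1] >= ymin) & (points[:, 1] < ymax) &
            (points[:, 2] >= zmin) & (points[:, 2] < zmax))
    pts = points[mask]
    idx = np.floor((pts[:, :3] - np.array([xmin, ymin, zmin])) /
                   np.array(voxel_size)).astype(np.int64)    # (M, 3) voxel coordinates
    coords, inverse = np.unique(idx, axis=0, return_inverse=True)
    feats = np.zeros((len(coords), pts.shape[1]))
    np.add.at(feats, inverse, pts)                           # sum features per voxel
    counts = np.bincount(inverse)[:, None]
    return coords, feats / counts                            # sparse grid: coords + mean features
```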
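For the feature-level stage, the abstract says an adaptive weight allocation module blends the voxel (geometry) features with the CNN (texture) features. A common way to realize such adaptive weighting is a small gating network that predicts per-modality weights summing to one; the PyTorch sketch below shows that pattern as an assumption about the design, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AdaptiveWeightFusion(nn.Module):
    """Feature-level fusion sketch: learn a convex weighting of two modalities."""

    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, 2),                  # one logit per modality
        )

    def forward(self, geom_feat, tex_feat):
        # geom_feat, tex_feat: (B, N, C) point/voxel features and sampled image features
        w = torch.softmax(self.gate(torch.cat([geom_feat, tex_feat], dim=-1)), dim=-1)
        # w: (B, N, 2), per-location weights that sum to 1 across modalities
        return w[..., :1] * geom_feat + w[..., 1:] * tex_feat

# Usage: fused = AdaptiveWeightFusion(128)(geom, tex); fused feeds the sparse-conv detector.
```

Because the weights are predicted per location, the module can lean on image texture where the point cloud is sparse and on geometry where the image is occluded or poorly lit.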
Source journal
Journal of Applied Remote Sensing (Environmental Sciences – Imaging Science & Photographic Technology)
CiteScore: 3.40
Self-citation rate: 11.80%
Articles per year: 194
Review time: 3 months
Journal description: The Journal of Applied Remote Sensing is a peer-reviewed journal that optimizes the communication of concepts, information, and progress among the remote sensing community.
Latest articles in this journal
- Few-shot synthetic aperture radar object detection algorithm based on meta-learning and variational inference
- Object-based strategy for generating high-resolution four-dimensional thermal surface models of buildings based on integration of visible and thermal unmanned aerial vehicle imagery
- Frequent oversights in on-orbit modulation transfer function estimation of optical imager onboard EO satellites
- Comprehensive comparison of different gridded precipitation products over geographic regions of Türkiye
- Monitoring soil moisture in cotton fields with synthetic aperture radar and optical data in arid and semi-arid regions