VirPNet: A Multimodal Virtual Point Generation Network for 3D Object Detection

IF 8.4 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | IEEE Transactions on Multimedia | Pub Date: 2024-06-05 | DOI: 10.1109/TMM.2024.3410117

Lin Wang, Shiliang Sun, Jing Zhao

Journal article. IEEE Transactions on Multimedia, vol. 26, pp. 10597-10609, 2024. IEEE Xplore: https://ieeexplore.ieee.org/document/10549844/

Citations: 0

Abstract

LiDAR and cameras are the most commonly used sensors for perceiving road scenes in autonomous driving. Current methods try to fuse these two complementary sources of information to boost 3D object detection. However, two pressing problems remain for multi-modality 3D object detection. One is detecting objects that are represented by sparse point clouds. The other is the misalignment between sensors caused by their fixed physical locations. Therefore, this paper argues that explicitly fusing information from the two modalities in the presence of this physical misalignment is suboptimal for multi-modality 3D object detection. It presents a novel virtual point generation network, VirPNet, to overcome these multi-modality fusion challenges. On the one hand, VirPNet uses the image to complete sparse object point clouds, which improves final detection accuracy. On the other hand, it detects 3D targets directly from raw point clouds, avoiding the physical misalignment between the LiDAR and camera sensors. Unlike previous point cloud completion methods, VirPNet fully exploits the geometric information of pixels and point clouds. Through a virtual plane, it simplifies 3D point cloud regression into a 2D distance regression problem. Experimental results on the KITTI 3D object detection dataset and the nuScenes dataset demonstrate that the generated virtual points help VirPNet improve detection accuracy.
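The abstract does not define how VirPNet's virtual plane is constructed, so the following is only a generic geometric illustration, not the paper's formulation. It shows why fixing a pixel can reduce a full 3D regression to a scalar one: a point on the camera ray through a known pixel is determined by a single distance. It also shows the extrinsic transform that links LiDAR points to image pixels.

The sketch makes several assumptions. It uses an ideal pinhole camera without lens distortion and a rigid LiDAR-to-camera extrinsic. The names `PinholeCamera`, `lidar_to_camera`, `ray_direction`, and `back_project` are hypothetical. So are the intrinsics and the axis-permutation rotation used in the example.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def lidar_to_camera(point: Vec3, rotation: Sequence[Sequence[float]],
                    translation: Vec3) -> Vec3:
    """Rigid extrinsic transform p_cam = R @ p_lidar + t, with R as 3x3 nested lists.

    Assumption: the extrinsic (R, t) encodes how the two sensors are mounted
    relative to each other; in a real rig t is nonzero.
    """
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


@dataclass
class PinholeCamera:
    """Hypothetical ideal pinhole camera: focal lengths and principal point, no distortion."""
    fx: float
    fy: float
    cx: float
    cy: float

    def project(self, point: Vec3) -> Tuple[float, float]:
        """Project a camera-frame point (z pointing forward) to pixel coordinates (u, v)."""
        x, y, z = point
        if z <= 0:
            raise ValueError("point must lie in front of the camera (z > 0)")
        return (self.fx * x / z + self.cx, self.fy * y / z + self.cy)

    def ray_direction(self, u: float, v: float) -> Vec3:
        """Unit direction of the viewing ray from the camera centre through pixel (u, v)."""
        d = ((u - self.cx) / self.fx, (v - self.cy) / self.fy, 1.0)
        norm = math.sqrt(sum(c * c for c in d))
        return tuple(c / norm for c in d)

    def back_project(self, u: float, v: float, distance: float) -> Vec3:
        """Place a point on the ray through (u, v) at Euclidean `distance` from the camera centre.

        Once the pixel is fixed, this one scalar is the only unknown, so a
        per-pixel predictor could output a distance instead of a full 3D point.
        """
        return tuple(distance * c for c in self.ray_direction(u, v))


if __name__ == "__main__":
    cam = PinholeCamera(fx=700.0, fy=700.0, cx=600.0, cy=180.0)  # hypothetical intrinsics
    # Idealized axis permutation: LiDAR (x fwd, y left, z up) -> camera (x right, y down, z fwd).
    R = [[0, -1, 0], [0, 0, -1], [1, 0, 0]]
    p_cam = lidar_to_camera((20.0, -2.0, -1.0), R, (0.0, 0.0, 0.0))
    u, v = cam.project(p_cam)
    print(p_cam)   # (2.0, 1.0, 20.0)
    print((u, v))  # (670.0, 215.0)
    q = cam.back_project(u, v, math.dist((0.0, 0.0, 0.0), p_cam))
    print(tuple(round(c, 6) for c in q))  # (2.0, 1.0, 20.0)
```

The example uses zero translation to keep the numbers exact. The LiDAR point (20, -2, -1) maps to camera coordinates (2, 1, 20) and projects to pixel (670, 215). Back-projecting that pixel with distance √405 recovers the same point. In a real vehicle the two sensors sit at different positions, so the translation is nonzero. The abstract attributes the cross-sensor misalignment to these fixed physical locations.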




Source journal

IEEE Transactions on Multimedia (Engineering & Technology: Telecommunications)

CiteScore: 11.70
Self-citation rate: 11.00%
Articles published: 576
Review time: 5.5 months

Journal description: IEEE Transactions on Multimedia covers diverse aspects of multimedia technology and applications, including circuits, networking, signal processing, systems, software, and systems integration. Its scope aligns with the fields of interest of its sponsors, providing comprehensive coverage of multimedia research.