Yidong Chen;Guorong Cai;Ziying Song;Zhaoliang Liu;Binghui Zeng;Jonathan Li;Zongyue Wang
{"title":"LVP: Leverage Virtual Points in Multimodal Early Fusion for 3-D Object Detection","authors":"Yidong Chen;Guorong Cai;Ziying Song;Zhaoliang Liu;Binghui Zeng;Jonathan Li;Zongyue Wang","doi":"10.1109/TGRS.2024.3519386","DOIUrl":null,"url":null,"abstract":"Due to the sparsity and occlusion of point clouds, pure point cloud detection has limited effectiveness in detecting such samples. Researchers have been actively exploring the fusion of multimodal data, attempting to address the bottleneck issue based on LiDAR. In particular, virtual points, generated through depth completion from front-view RGB image, offer the potential for better integration with point clouds. Nevertheless, recent approaches fuse these two modalities in the region of interest (RoI), which limits the fusion effectiveness due to the inaccurate RoI region issue in the point cloud’s branch, especially in hard samples. To overcome it and unleash the potential of virtual points, while combining late fusion, we present leverage virtual point (LVP), a high-performance 3-D object detector which LVPs in early fusion to enhance the quality of RoI generation. LVP consists of three early fusion modules: virtual points painting (VPP), virtual points auxiliary (VPA), and virtual points completion (VPC) to achieve point-level fusion and global-level fusion. The integration of these modules effectively improves occlusion handling and improves the detection of distant small objects. In the KITTI benchmark, LVP achieves 85.45% 3-D mAP. As for large dataset nuScenes, we could improve the detection accuracy of large objects by compensating for errors in depth estimation. 
Without whistles and bells, these results establish LVP as an impressive solution for a 3-D outdoor object detection algorithm.","PeriodicalId":13213,"journal":{"name":"IEEE Transactions on Geoscience and Remote Sensing","volume":"63 ","pages":"1-15"},"PeriodicalIF":7.5000,"publicationDate":"2024-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Geoscience and Remote Sensing","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10804692/","RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
Due to the sparsity and occlusion of point clouds, detectors that rely on point clouds alone struggle with sparse or occluded samples. Researchers have therefore been actively exploring multimodal fusion to address this LiDAR-based bottleneck. In particular, virtual points, generated through depth completion from front-view RGB images, offer the potential for tighter integration with point clouds. Nevertheless, recent approaches fuse the two modalities at the region of interest (RoI), which limits fusion effectiveness because the RoIs produced by the point cloud branch are inaccurate, especially for hard samples. To overcome this and unleash the potential of virtual points, while also combining late fusion, we present leverage virtual points (LVP), a high-performance 3-D object detector that leverages virtual points in early fusion to improve the quality of RoI generation. LVP consists of three early fusion modules: virtual points painting (VPP), virtual points auxiliary (VPA), and virtual points completion (VPC), which together achieve point-level and global-level fusion. The integration of these modules improves occlusion handling and the detection of distant small objects. On the KITTI benchmark, LVP achieves 85.45% 3-D mAP. On the larger nuScenes dataset, LVP improves the detection accuracy of large objects by compensating for errors in depth estimation. Without bells and whistles, these results establish LVP as an impressive 3-D outdoor object detection algorithm.
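To make the early-fusion idea concrete, the sketch below shows one common way virtual points are produced and merged with LiDAR: back-project a completed (dense) depth map through the camera intrinsics into 3-D points, then concatenate them with the real LiDAR points, tagging each point's origin. This is a minimal illustration under the standard pinhole camera model, not the paper's actual VPP/VPA/VPC implementation; the function names and the origin-flag channel are hypothetical.

```python
import numpy as np

def backproject_depth(depth, K):
    """Back-project a dense depth map (e.g., from depth completion) into
    3-D camera-frame points -- the 'virtual points'.
    depth: (H, W) metric depth map; K: (3, 3) camera intrinsic matrix."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel coordinates
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]  # X = (u - cx) * Z / fx
    y = (v - K[1, 2]) * z / K[1, 1]  # Y = (v - cy) * Z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]  # keep only pixels with valid (positive) depth

def early_fuse(lidar_pts, virtual_pts):
    """Early fusion by concatenation: append an origin flag so downstream
    modules can treat real (1.0) and virtual (0.0) points differently."""
    real = np.hstack([lidar_pts, np.ones((len(lidar_pts), 1))])
    virt = np.hstack([virtual_pts, np.zeros((len(virtual_pts), 1))])
    return np.vstack([real, virt])
```

In practice the virtual points would first be transformed from the camera frame to the LiDAR frame with the extrinsic calibration, and typically subsampled, since a dense depth map yields far more virtual points than a LiDAR sweep.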
Journal Introduction:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.