VirInteraction: Enhancing Virtual-LiDAR Points Interaction by Using Image Semantics and Density Estimation for 3D Object Detection
Authors: Huming Zhu; Yiyu Xue; Ximiao Dong; Xinyue Cheng
Journal: IEEE Robotics and Automation Letters, vol. 10, no. 2, pp. 1872-1879
Published: 2025-01-06
DOI: 10.1109/LRA.2025.3526568
URL: https://ieeexplore.ieee.org/document/10829633/
Distant object detection is a difficult problem in LiDAR-based 3D object detection. In recent years, fusing LiDAR points with virtual points generated by depth completion has achieved great success in the 3D detection of distant objects. However, inaccurate depth completion introduces considerable noise, which significantly reduces detection accuracy. To reduce this noise and improve the detection accuracy of distant objects, we propose VirInteraction, a semantic-guided Virtual-LiDAR fusion method that enhances the interaction between virtual points and LiDAR points. Specifically, VirInteraction mainly includes three new designs: 1) Foreground-based adaptive Voxel Denoising (FgVD), 2) Semantic neighboring Sampling (Se-Sampling), and 3) Multi-scale Density-aware Cross Attention (MDC-Attention). FgVD uses Kernel Density Estimation (KDE) to adaptively denoise the foreground and background voxels. Se-Sampling completes the shape cues of distant objects using bidirectional sampling based on a self-attention mechanism. We build on these two designs and VirConvNet to develop a more robust VirInterNet as our virtual-point-based backbone. Finally, MDC-Attention aggregates image and point features at the feature level according to the density distribution. Extensive experiments on the KITTI and nuScenes datasets demonstrate the effectiveness of VirInteraction.
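The abstract does not spell out how FgVD applies KDE, so the following is only a rough illustration of the general idea of density-based denoising with separate thresholds for foreground and background. Everything specific here is an assumption rather than the authors' method: the function names (`kde_density`, `kde_filter`, `adaptive_denoise`), the Gaussian kernel, the bandwidth, the ratio thresholds, and the choice to filter raw points instead of voxels.

```python
import math


def kde_density(points, bandwidth):
    """Gaussian kernel density estimate evaluated at every input point."""
    n = len(points)
    if n == 0:
        return []
    dim = len(points[0])
    # Normalisation constant of an isotropic Gaussian kernel in `dim` dimensions.
    norm = n * (math.sqrt(2.0 * math.pi) * bandwidth) ** dim
    densities = []
    for p in points:
        total = 0.0
        for q in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        densities.append(total / norm)
    return densities


def kde_filter(points, ratio, bandwidth=0.5):
    """Keep points whose density is at least `ratio` times the group mean."""
    densities = kde_density(points, bandwidth)
    if not densities:
        return []
    threshold = ratio * sum(densities) / len(densities)
    return [p for p, d in zip(points, densities) if d >= threshold]


def adaptive_denoise(foreground, background, fg_ratio=0.5, bg_ratio=0.8):
    """Hypothetical adaptive rule: lenient on foreground, strict on background.

    The ratios are illustrative assumptions, not values from the paper.
    """
    return kde_filter(foreground, fg_ratio), kde_filter(background, bg_ratio)


# Toy data: a dense cluster plus a sparse pair of points two metres away.
cluster = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.1, 0.1, 0.0)]
sparse_pair = [(2.0, 0.0, 0.0), (2.1, 0.0, 0.0)]
points = cluster + sparse_pair

# The same geometry is passed as both groups to show the effect of the ratios.
fg_kept, bg_kept = adaptive_denoise(points, points)
```

With this toy input, the lenient foreground ratio keeps all six points, while the stricter background ratio discards the sparse pair and keeps only the dense cluster. A practical implementation would more likely use a vectorised KDE from an existing library than this quadratic-time loop.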
About the journal:
The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.