VirInteraction: Enhancing Virtual-LiDAR Points Interaction by Using Image Semantics and Density Estimation for 3D Object Detection

IEEE Robotics and Automation Letters (IF 4.6; Zone 2, Computer Science; JCR Q2, Robotics) | Pub Date: 2025-01-06 | DOI: 10.1109/LRA.2025.3526568

Huming Zhu;Yiyu Xue;Ximiao Dong;Xinyue Cheng

IEEE Robotics and Automation Letters, vol. 10, no. 2, pp. 1872-1879. Publication type: Journal Article. Source: https://ieeexplore.ieee.org/document/10829633/

Citations: 0

Abstract



Detecting distant objects remains difficult in LiDAR-based 3D object detection. In recent years, fusing LiDAR points with virtual points generated by depth completion has achieved great success in the 3D detection of distant objects. However, inaccurate depth completion introduces considerable noise, which significantly reduces detection accuracy. To reduce this noise and improve the detection accuracy of distant objects, we propose VirInteraction, a semantic-guided Virtual-LiDAR fusion method that enhances the interaction between virtual points and LiDAR points. Specifically, VirInteraction includes three new designs: 1) Foreground-based adaptive Voxel Denoising (FgVD), 2) Semantic neighboring Sampling (Se-Sampling), and 3) Multi-scale Density-aware Cross Attention (MDC-Attention). FgVD uses Kernel Density Estimation (KDE) to adaptively denoise the foreground and background voxels. Se-Sampling completes the shape cues of distant objects using bidirectional sampling based on a self-attention mechanism. Building on these two designs and VirConvNet, we develop a more robust VirInterNet as our virtual-point-based backbone. Finally, MDC-Attention aggregates image and point features at the feature level according to the density distribution. Extensive experiments on the KITTI and nuScenes datasets demonstrate the effectiveness of VirInteraction.
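To make the idea of density-based denoising concrete, the following is a minimal sketch and not the paper's FgVD module. It assumes a plain Gaussian kernel, works on raw points rather than voxels, and keeps a fixed fraction of the densest points in each group. The function names and the parameters `bandwidth`, `fg_keep`, and `bg_keep` are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_kde_density(points, bandwidth=0.5):
    """Leave-one-out Gaussian KDE density at each point (O(N^2) sketch)."""
    # Pairwise squared distances between all points, shape (N, N).
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    kernel = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    # Exclude each point's contribution to its own density.
    np.fill_diagonal(kernel, 0.0)
    n, d = points.shape
    norm = (n - 1) * (np.sqrt(2.0 * np.pi) * bandwidth) ** d
    return kernel.sum(axis=1) / norm


def denoise_points(points, is_foreground, fg_keep=0.9, bg_keep=0.5, bandwidth=0.5):
    """Return a boolean mask keeping the densest fraction of each group.

    Assumption: foreground and background use different keep ratios,
    which is only one simple way to make the filtering group-dependent.
    """
    density = gaussian_kde_density(points, bandwidth)
    keep = np.zeros(len(points), dtype=bool)
    for mask, ratio in ((is_foreground, fg_keep), (~is_foreground, bg_keep)):
        idx = np.flatnonzero(mask)
        if idx.size == 0:
            continue
        k = max(1, int(round(ratio * idx.size)))
        # Indices of the k highest-density points within this group.
        top = idx[np.argsort(density[idx])[::-1][:k]]
        keep[top] = True
    return keep
```

In this sketch, an isolated point far from any cluster receives a density near zero and is the first to be dropped, which mirrors the intuition that spurious virtual points produced by inaccurate depth completion tend to be sparse. How the paper selects bandwidths or thresholds is not stated in the abstract.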


Source journal

IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)

CiteScore: 9.60

Self-citation rate: 15.40%

Articles published: 1428

About the journal: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.