CWGA-Net: Center-Weighted Graph Attention Network for 3D Object Detection from Point Clouds

Image and Vision Computing (IF 4.2; CAS Zone 3, Computer Science; JCR Q2, Computer Science, Artificial Intelligence). Published: 2024-10-29. DOI: 10.1016/j.imavis.2024.105314

Jun Shu, Qi Wu, Liang Tan, Xinyi Shu, Fengchun Wan

Volume 152, Article 105314.

Citations: 0

Abstract

The precision of 3D object detection from unevenly distributed outdoor point clouds is critical for autonomous driving perception systems. Current point-based detectors use self-attention and graph convolution to establish contextual relationships between points; however, they often introduce weakly correlated, redundant information, which blurs geometric details and causes false detections. To address this issue, a novel Center-Weighted Graph Attention Network (CWGA-Net) is proposed that fuses geometric and semantic similarities to weight cross-attention scores, thereby capturing precise fine-grained geometric features. CWGA-Net first constructs and encodes local graphs among foreground points, connecting points along both geometric and semantic dimensions. Center-weighted cross-attention then computes the contextual relationships between vertices within each graph, and the geometric and semantic similarities between vertices are fused to weight the attention scores, so that strongly related geometric shape features are extracted. Finally, a cross-feature fusion module deeply fuses high- and low-resolution features to compensate for the information lost during downsampling. Experiments on the KITTI and Waymo datasets show that the network outperforms state-of-the-art point-based single-stage methods in average precision while maintaining good inference speed.
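The abstract describes center-weighted cross-attention only at a high level. The following is a minimal, hypothetical Python sketch under stated assumptions, not the authors' implementation: each local graph is a plain k-nearest-neighbour graph around a center point, geometric similarity is a Gaussian kernel on Euclidean distance, semantic similarity is the cosine similarity of point features, the two are mixed with a fixed weight `alpha`, and learned query/key/value projections are omitted. The function names and the parameters `k`, `sigma`, and `alpha` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity in [-1, 1]; 0.0 if either vector is all zeros."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot(a, b) / (na * nb)


def knn_graph(points: Sequence[Vec], center: int, k: int) -> List[int]:
    """Hypothetical local graph: the k nearest neighbours of points[center]."""
    ranked = sorted(
        (sq_dist(points[center], p), j) for j, p in enumerate(points) if j != center
    )
    return [j for _, j in ranked[:k]]


def center_weighted_attention(
    points: Sequence[Vec],
    feats: Sequence[Vec],
    center: int,
    k: int,
    sigma: float = 1.0,
    alpha: float = 0.5,
) -> Tuple[Vec, List[int], List[float]]:
    """Attend from the center vertex (query) to its graph neighbours (keys/values).

    Assumption: no learned projections; query, keys and values are raw features.
    Requires k >= 1.
    """
    nbrs = knn_graph(points, center, k)
    q = feats[center]
    scale = math.sqrt(len(q))
    # Scaled dot-product logits between the center and each neighbour.
    logits = [dot(q, feats[j]) / scale for j in nbrs]
    m = max(logits)  # subtract the max for numerical stability
    weighted = []
    for j, logit in zip(nbrs, logits):
        # Geometric similarity: Gaussian kernel on distance, in (0, 1].
        geo = math.exp(-sq_dist(points[center], points[j]) / (2.0 * sigma**2))
        # Semantic similarity: cosine similarity mapped from [-1, 1] to [0, 1].
        sem = (cosine(q, feats[j]) + 1.0) / 2.0
        # Fused similarity; the tiny epsilon keeps every weight strictly positive.
        w = alpha * geo + (1.0 - alpha) * sem + 1e-12
        # Scale the unnormalised attention weight by the fused similarity.
        weighted.append(math.exp(logit - m) * w)
    total = sum(weighted)
    attn = [x / total for x in weighted]  # renormalise so weights sum to 1
    # Aggregate neighbour features with the re-weighted attention.
    out = [sum(a * feats[j][c] for a, j in zip(attn, nbrs)) for c in range(len(q))]
    return out, nbrs, attn


if __name__ == "__main__":
    pts = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.2, 0.0], [5.0, 5.0, 5.0]]
    fts = [[1.0, 0.0], [1.0, 0.1], [0.9, 0.2], [-1.0, 0.0]]
    out, nbrs, attn = center_weighted_attention(pts, fts, center=0, k=3)
    for j, a in zip(nbrs, attn):
        print(f"neighbour {j}: weight {a:.4f}")
```

Multiplying the unnormalised attention weights by the fused similarity and renormalising is equivalent to adding the logarithm of that similarity to the logits. Either way, a neighbour that is both distant and semantically dissimilar receives almost no weight, which illustrates the kind of suppression of weakly correlated information the abstract describes.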

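Source journal

Image and Vision Computing (Engineering & Technology; Engineering, Electrical & Electronic)

CiteScore: 8.50
Self-citation rate: 8.50%
Articles published: 143
Review time: 7.8 months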

Journal description: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.