CWGA-Net: Center-Weighted Graph Attention Network for 3D object detection from point clouds

IF 4.2 | CAS Tier 3 (Computer Science) | Q2 (Computer Science, Artificial Intelligence) | Image and Vision Computing | Pub Date: 2024-10-29 | DOI: 10.1016/j.imavis.2024.105314
Jun Shu, Qi Wu, Liang Tan, Xinyi Shu, Fengchun Wan
{"title":"CWGA-Net:用于从点云检测 3D 物体的中心加权图注意力网络","authors":"Jun Shu ,&nbsp;Qi Wu ,&nbsp;Liang Tan ,&nbsp;Xinyi Shu ,&nbsp;Fengchun Wan","doi":"10.1016/j.imavis.2024.105314","DOIUrl":null,"url":null,"abstract":"<div><div>The precision of 3D object detection from unevenly distributed outdoor point clouds is critical in autonomous driving perception systems. Current point-based detectors employ self-attention and graph convolution to establish contextual relationships between point clouds; however, they often introduce weakly correlated redundant information, leading to blurred geometric details and false detections. To address this issue, a novel Center-weighted Graph Attention Network (CWGA-Net) has been proposed to fuse geometric and semantic similarities for weighting cross-attention scores, thereby capturing precise fine-grained geometric features. CWGA-Net initially constructs and encodes local graphs between foreground points, establishing connections between point clouds from geometric and semantic dimensions. Subsequently, center-weighted cross-attention is utilized to compute the contextual relationships between vertices within the graph, and geometric and semantic similarities between vertices are fused to weight attention scores, thereby extracting strongly related geometric shape features. Finally, a cross-feature fusion Module is introduced to deeply fuse high and low-resolution features to compensate for the information loss during downsampling. Experiments conducted on the KITTI and Waymo datasets demonstrate that the network achieves superior detection capabilities, outperforming state-of-the-art point-based single-stage methods in terms of average precision metrics while maintaining good speed.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"152 ","pages":"Article 105314"},"PeriodicalIF":4.2000,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CWGA-Net: Center-Weighted Graph Attention Network for 3D object detection from point clouds\",\"authors\":\"Jun Shu ,&nbsp;Qi Wu ,&nbsp;Liang Tan ,&nbsp;Xinyi Shu ,&nbsp;Fengchun Wan\",\"doi\":\"10.1016/j.imavis.2024.105314\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>The precision of 3D object detection from unevenly distributed outdoor point clouds is critical in autonomous driving perception systems. Current point-based detectors employ self-attention and graph convolution to establish contextual relationships between point clouds; however, they often introduce weakly correlated redundant information, leading to blurred geometric details and false detections. To address this issue, a novel Center-weighted Graph Attention Network (CWGA-Net) has been proposed to fuse geometric and semantic similarities for weighting cross-attention scores, thereby capturing precise fine-grained geometric features. CWGA-Net initially constructs and encodes local graphs between foreground points, establishing connections between point clouds from geometric and semantic dimensions. Subsequently, center-weighted cross-attention is utilized to compute the contextual relationships between vertices within the graph, and geometric and semantic similarities between vertices are fused to weight attention scores, thereby extracting strongly related geometric shape features. 
Finally, a cross-feature fusion Module is introduced to deeply fuse high and low-resolution features to compensate for the information loss during downsampling. Experiments conducted on the KITTI and Waymo datasets demonstrate that the network achieves superior detection capabilities, outperforming state-of-the-art point-based single-stage methods in terms of average precision metrics while maintaining good speed.</div></div>\",\"PeriodicalId\":50374,\"journal\":{\"name\":\"Image and Vision Computing\",\"volume\":\"152 \",\"pages\":\"Article 105314\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2024-10-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Image and Vision Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0262885624004190\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885624004190","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

The precision of 3D object detection from unevenly distributed outdoor point clouds is critical in autonomous driving perception systems. Current point-based detectors employ self-attention and graph convolution to establish contextual relationships between point clouds; however, they often introduce weakly correlated redundant information, leading to blurred geometric details and false detections. To address this issue, a novel Center-Weighted Graph Attention Network (CWGA-Net) is proposed to fuse geometric and semantic similarities for weighting cross-attention scores, thereby capturing precise fine-grained geometric features. CWGA-Net first constructs and encodes local graphs between foreground points, establishing connections between point clouds along geometric and semantic dimensions. Subsequently, center-weighted cross-attention is used to compute the contextual relationships between vertices within the graph, and the geometric and semantic similarities between vertices are fused to weight the attention scores, thereby extracting strongly related geometric shape features. Finally, a cross-feature fusion module is introduced to deeply fuse high- and low-resolution features to compensate for the information loss during downsampling. Experiments conducted on the KITTI and Waymo datasets demonstrate that the network achieves superior detection capabilities, outperforming state-of-the-art point-based single-stage methods in terms of average precision metrics while maintaining good speed.
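For intuition, below is a minimal, hypothetical PyTorch sketch of the center-weighted cross-attention idea described in the abstract: attention logits between a center (foreground) point and its graph neighbors are re-weighted by a fusion of geometric similarity (based on distance) and semantic similarity (cosine similarity of features). The class name, tensor shapes, and the specific similarity and fusion formulas are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of center-weighted cross-attention (not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class CenterWeightedCrossAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # query from the center point
        self.k = nn.Linear(dim, dim)   # keys from its graph neighbors
        self.v = nn.Linear(dim, dim)   # values from its graph neighbors
        self.scale = dim ** -0.5

    def forward(self, center_feat, center_xyz, nbr_feat, nbr_xyz):
        # center_feat: (N, C)    features of N center points
        # center_xyz:  (N, 3)    coordinates of the centers
        # nbr_feat:    (N, K, C) features of K graph neighbors per center
        # nbr_xyz:     (N, K, 3) coordinates of those neighbors
        q = self.q(center_feat).unsqueeze(1)          # (N, 1, C)
        k = self.k(nbr_feat)                          # (N, K, C)
        v = self.v(nbr_feat)                          # (N, K, C)

        # Plain cross-attention logits between each center and its neighbors.
        logits = (q * k).sum(-1) * self.scale         # (N, K)

        # Geometric similarity: closer neighbors get larger weights.
        dist = torch.norm(nbr_xyz - center_xyz.unsqueeze(1), dim=-1)   # (N, K)
        geo_sim = 1.0 / (1.0 + dist)

        # Semantic similarity: cosine similarity of the raw features.
        sem_sim = F.cosine_similarity(
            center_feat.unsqueeze(1), nbr_feat, dim=-1)                # (N, K)

        # Fuse the two similarities and use them to weight the attention
        # scores, suppressing weakly correlated (distant/dissimilar) neighbors.
        weight = geo_sim * (sem_sim + 1.0) / 2.0      # map cosine to [0, 1]
        attn = torch.softmax(logits, dim=-1) * weight
        attn = attn / (attn.sum(-1, keepdim=True) + 1e-6)

        return (attn.unsqueeze(-1) * v).sum(1)        # (N, C) aggregated feature


# Toy usage: 64 center points, 16 graph neighbors each, 128-D features.
layer = CenterWeightedCrossAttention(dim=128)
out = layer(torch.randn(64, 128), torch.randn(64, 3),
            torch.randn(64, 16, 128), torch.randn(64, 16, 3))
print(out.shape)  # torch.Size([64, 128])
```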
Source journal: Image and Vision Computing
Category: Engineering Technology / Engineering: Electronic and Electrical
CiteScore: 8.50
Self-citation rate: 8.50%
Articles published: 143
Review time: 7.8 months
Journal introduction: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.
Latest articles in this journal
CF-SOLT: Real-time and accurate traffic accident detection using correlation filter-based tracking
TransWild: Enhancing 3D interacting hands recovery in the wild with IoU-guided Transformer
Machine learning applications in breast cancer prediction using mammography
Channel and Spatial Enhancement Network for human parsing
Non-negative subspace feature representation for few-shot learning in medical imaging