
Latest Articles in Image and Vision Computing

Corrigendum to “STAFFormer: Spatio-temporal adaptive fusion transformer for efficient 3D human pose estimation” [Journal of Image and Vision Computing volume 149 (2024) 105142]
IF 4.2 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-01 | DOI: 10.1016/j.imavis.2024.105305
Feng Hao, Fujin Zhong, Yunhe Wang, Hong Yu, Jun Hu, Yan Yang
{"title":"Corrigendum to “STAFFormer: Spatio-temporal adaptive fusion transformer for efficient 3D human pose estimation” [Journal of Image and Vision Computing volume 149 (2024) 105142]","authors":"Feng Hao, Fujin Zhong, Yunhe Wang, Hong Yu, Jun Hu, Yan Yang","doi":"10.1016/j.imavis.2024.105305","DOIUrl":"10.1016/j.imavis.2024.105305","url":null,"abstract":"","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"151 ","pages":"Article 105305"},"PeriodicalIF":4.2,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142572187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Corrigendum to “A method of degradation mechanism-based unsupervised remote sensing image super-resolution” [Image and Vision Computing, Vol 148 (2024), 105108]
IF 4.2 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-01 | DOI: 10.1016/j.imavis.2024.105275
Zhikang Zhao, Yongcheng Wang, Ning Zhang, Yuxi Zhang, Zheng Li, Chi Chen
{"title":"Corrigendum to “A method of degradation mechanism-based unsupervised remote sensing image super-resolution” [Image and Vision Computing, Vol 148 (2024), 105108]","authors":"Zhikang Zhao , Yongcheng Wang , Ning Zhang , Yuxi Zhang , Zheng Li , Chi Chen","doi":"10.1016/j.imavis.2024.105275","DOIUrl":"10.1016/j.imavis.2024.105275","url":null,"abstract":"","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"151 ","pages":"Article 105275"},"PeriodicalIF":4.2,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142572188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SVC: Sight view constraint for robust point cloud registration
IF 4.2 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-31 | DOI: 10.1016/j.imavis.2024.105315
Yaojie Zhang, Weijun Wang, Tianlun Huang, Zhiyong Wang, Wei Feng
Partial to Partial Point Cloud Registration (partial PCR) remains a challenging task, particularly when dealing with a low overlap rate. In comparison to the full-to-full registration task, we find that the objective of partial PCR is still not well-defined, indicating no metric can reliably identify the true transformation. We identify this as the most fundamental challenge in partial PCR tasks. In this paper, instead of directly seeking the optimal transformation, we propose a novel and general Sight View Constraint (SVC) to conclusively identify incorrect transformations, thereby enhancing the robustness of existing PCR methods. Extensive experiments validate the effectiveness of SVC on both indoor and outdoor scenes. On the challenging 3DLoMatch dataset, our approach increases the registration recall from 78% to 82%, achieving the state-of-the-art result. This research also highlights the significance of the decision version problem of partial PCR, which has the potential to provide novel insights into the partial PCR problem. Code will be available at: https://github.com/pppyj-m/SVC.
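The core idea above, using an extra constraint to veto incorrect candidate transformations rather than searching directly for the optimal one, can be pictured with a short sketch. The Python/NumPy snippet below is a hypothetical illustration of that pattern only: the veto function is a placeholder (the real SVC criterion is defined in the paper and its repository), and the inlier threshold and helper names are assumptions.

```python
# Hypothetical sketch: filtering candidate rigid transforms with a veto test,
# then keeping the best survivor by inlier ratio. Not the actual SVC method.
import numpy as np
from scipy.spatial import cKDTree

def inlier_ratio(src, tgt, T, tau=0.05):
    """Fraction of transformed source points within tau of some target point."""
    src_h = np.c_[src, np.ones(len(src))]     # homogeneous coordinates, (N, 4)
    warped = (T @ src_h.T).T[:, :3]           # apply the 4x4 rigid transform
    dists, _ = cKDTree(tgt).query(warped)     # nearest-neighbour distances
    return float((dists < tau).mean())

def passes_extra_constraint(src, tgt, T):
    """Placeholder veto standing in for a constraint such as SVC."""
    return True                               # the real criterion lives in the paper/repo

def select_transform(src, tgt, candidates, tau=0.05):
    """Discard candidates that fail the veto, then pick the best inlier ratio."""
    kept = [T for T in candidates if passes_extra_constraint(src, tgt, T)]
    return max(kept, key=lambda T: inlier_ratio(src, tgt, T, tau)) if kept else None
```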
Citations: 0
DeepArUco++: Improved detection of square fiducial markers in challenging lighting conditions
IF 4.2 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-30 | DOI: 10.1016/j.imavis.2024.105313
Rafael Berral-Soler, Rafael Muñoz-Salinas, Rafael Medina-Carnicer, Manuel J. Marín-Jiménez
Fiducial markers are a computer vision tool used for object pose estimation and detection. These markers are highly useful in fields such as industry, medicine and logistics. However, optimal lighting conditions are not always available, and other factors such as blur or sensor noise can affect image quality. Classical computer vision techniques that precisely locate and decode fiducial markers often fail under difficult illumination conditions (e.g. extreme variations of lighting within the same frame). Hence, we propose DeepArUco++, a deep learning-based framework that leverages the robustness of Convolutional Neural Networks to perform marker detection and decoding in challenging lighting conditions. The framework is based on a pipeline using different Neural Network models at each step, namely marker detection, corner refinement and marker decoding. Additionally, we propose a simple method for generating synthetic data for training the different models that compose the proposed pipeline, and we present a second, real-life dataset of ArUco markers in challenging lighting conditions used to evaluate our system. The developed method outperforms other state-of-the-art methods in such tasks and remains competitive even when testing on the datasets used to develop those methods. Code available in GitHub: https://github.com/AVAuco/deeparuco/.
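As a structural illustration only, the three-stage pipeline described above (marker detection, corner refinement, marker decoding) could be wired together as in the sketch below. The stage interfaces, class names, and data shapes are assumptions and do not reflect the actual API of the linked DeepArUco++ repository.

```python
# Hypothetical three-stage fiducial pipeline: detect -> refine corners -> decode.
from dataclasses import dataclass
from typing import Callable, List
import numpy as np

@dataclass
class MarkerDetection:
    corners: np.ndarray   # (4, 2) corner coordinates in image space
    marker_id: int = -1   # filled in by the decoding stage

class FiducialPipeline:
    def __init__(self,
                 detector: Callable[[np.ndarray], List[np.ndarray]],
                 corner_refiner: Callable[[np.ndarray, np.ndarray], np.ndarray],
                 decoder: Callable[[np.ndarray, np.ndarray], int]):
        self.detector = detector              # image -> rough (4, 2) corner sets
        self.corner_refiner = corner_refiner  # (image, corners) -> refined corners
        self.decoder = decoder                # (image, corners) -> marker id

    def __call__(self, image: np.ndarray) -> List[MarkerDetection]:
        detections = []
        for rough_corners in self.detector(image):
            refined = self.corner_refiner(image, rough_corners)
            marker_id = self.decoder(image, refined)
            detections.append(MarkerDetection(corners=refined, marker_id=marker_id))
        return detections
```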
Citations: 0
SAVE: Encoding spatial interactions for vision transformers
IF 4.2 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-30 | DOI: 10.1016/j.imavis.2024.105312
Xiao Ma, Zetian Zhang, Rong Yu, Zexuan Ji, Mingchao Li, Yuhan Zhang, Qiang Chen
Transformers have achieved impressive performance in visual tasks. Position encoding, which equips vectors (elements of input tokens, queries, keys, or values) with sequence specificity, effectively alleviates the lack of permutation relation in transformers. In this work, we first clarify that both position encoding and additional position-specific operations will introduce positional information when participating in self-attention. On this basis, most existing position encoding methods are equivalent to special affine transformations. However, this encoding method lacks the correlation of vector content interaction. We further propose Spatial Aggregation Vector Encoding (SAVE) that employs transition matrices to recombine vectors. We design two simple yet effective modes to merge other vectors, with each one serving as an anchor. The aggregated vectors control spatial contextual connections by establishing two-dimensional relationships. Our SAVE can be plug-and-play in vision transformers, even with other position encoding methods. Comparative results on three image classification datasets show that the proposed SAVE performs comparably to current position encoding methods. Experiments on detection tasks show that the SAVE improves the downstream performance of transformer-based methods. Code is available at https://github.com/maxiao0234/SAVE.
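A minimal, hypothetical sketch of the underlying mechanism, recombining token vectors with a transition matrix defined over the 2D token grid before standard self-attention, is shown below. The learnable matrix, its initialization, and the shapes are illustrative assumptions, not the paper's SAVE module.

```python
# Hypothetical position aggregation: mix tokens with a learnable transition matrix.
import torch
import torch.nn as nn

class SpatialAggregation(nn.Module):
    def __init__(self, grid_h: int, grid_w: int):
        super().__init__()
        n = grid_h * grid_w
        # Learnable (N, N) transition matrix over token positions on the H x W grid.
        self.transition = nn.Parameter(torch.eye(n) + 0.01 * torch.randn(n, n))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, N, channels) tokens flattened from an H x W feature map.
        weights = torch.softmax(self.transition, dim=-1)  # row-stochastic mixing
        return weights @ x                                # recombine over positions

# Usage: aggregate the token sequence before a standard attention block.
tokens = torch.randn(2, 14 * 14, 192)
aggregated = SpatialAggregation(14, 14)(tokens)
```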
Citations: 0
3DPSR: An innovative approach for pose and shape refinement in 3D human meshes from a single 2D image
IF 4.2 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-30 | DOI: 10.1016/j.imavis.2024.105311
Mohit Kushwaha, Jaytrilok Choudhary, Dhirendra Pratap Singh
3D human models are attracting growing interest in the gaming industry, cloth parsing, avatar creation, and many other applications. In these fields, a precise 3D human model with accurate shape and pose is crucial for realistic, high-quality results. We propose 3DPSR (3D Pose and Shape Refinements), an approach that reconstructs precise 3D human meshes with well-aligned pose and shape from a single 2D image. 3DPSR contains two modules, mesh deformation using pose-fitting and mesh deformation using shape-fitting, with the shape-fitting deformation acting as a refinement module. Compared to existing methods, 3DPSR delivers improved MPVE and PA-MPJPE results and more accurate 3D human models. It significantly outperforms state-of-the-art human mesh reconstruction methods on challenging standard datasets such as SURREAL, Human3.6M, and 3DPW across scenarios with complex poses, establishing a new benchmark.
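The pose-fitting/shape-fitting split can be pictured as a two-stage optimization loop like the sketch below. The parametric `body_model`, the 2D reprojection loss, and the optimizer settings are placeholders for illustration, not the paper's actual 3DPSR modules.

```python
# Hypothetical two-stage refinement: fit pose with shape frozen, then fit shape.
import torch

def two_stage_refine(body_model, pose, shape, keypoints_2d, project, steps=100, lr=1e-2):
    pose = pose.detach().clone().requires_grad_(True)
    shape = shape.detach().clone()

    def reprojection_loss():
        joints_3d = body_model(pose, shape)                  # (J, 3) model joints
        return ((project(joints_3d) - keypoints_2d) ** 2).mean()

    opt_pose = torch.optim.Adam([pose], lr=lr)
    for _ in range(steps):                                   # stage 1: pose-fitting
        opt_pose.zero_grad()
        reprojection_loss().backward()
        opt_pose.step()

    pose.requires_grad_(False)
    shape.requires_grad_(True)
    opt_shape = torch.optim.Adam([shape], lr=lr)
    for _ in range(steps):                                   # stage 2: shape refinement
        opt_shape.zero_grad()
        reprojection_loss().backward()
        opt_shape.step()
    return pose.detach(), shape.detach()
```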
Citations: 0
CWGA-Net: Center-Weighted Graph Attention Network for 3D object detection from point clouds
IF 4.2 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-29 | DOI: 10.1016/j.imavis.2024.105314
Jun Shu, Qi Wu, Liang Tan, Xinyi Shu, Fengchun Wan
The precision of 3D object detection from unevenly distributed outdoor point clouds is critical in autonomous driving perception systems. Current point-based detectors employ self-attention and graph convolution to establish contextual relationships between point clouds; however, they often introduce weakly correlated redundant information, leading to blurred geometric details and false detections. To address this issue, a novel Center-weighted Graph Attention Network (CWGA-Net) has been proposed to fuse geometric and semantic similarities for weighting cross-attention scores, thereby capturing precise fine-grained geometric features. CWGA-Net initially constructs and encodes local graphs between foreground points, establishing connections between point clouds from geometric and semantic dimensions. Subsequently, center-weighted cross-attention is utilized to compute the contextual relationships between vertices within the graph, and geometric and semantic similarities between vertices are fused to weight attention scores, thereby extracting strongly related geometric shape features. Finally, a cross-feature fusion Module is introduced to deeply fuse high and low-resolution features to compensate for the information loss during downsampling. Experiments conducted on the KITTI and Waymo datasets demonstrate that the network achieves superior detection capabilities, outperforming state-of-the-art point-based single-stage methods in terms of average precision metrics while maintaining good speed.
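The central idea, weighting attention with fused geometric and semantic similarities, can be sketched as follows. The Gaussian geometric kernel, the additive fusion, and the renormalization step are assumptions made for illustration, not the CWGA-Net formulation.

```python
# Hypothetical similarity-weighted cross-attention over graph vertices.
import torch
import torch.nn.functional as F

def similarity_weighted_attention(q, k, v, xyz_q, xyz_k, feat_q, feat_k, sigma=1.0):
    # q, k, v: (N, d) query/key/value features; xyz_*: (N, 3) coordinates;
    # feat_*: (N, c) semantic features of the graph vertices.
    d = q.shape[-1]
    attn = torch.softmax((q @ k.T) / d ** 0.5, dim=-1)        # raw cross-attention
    geo = torch.exp(-torch.cdist(xyz_q, xyz_k) ** 2 / (2 * sigma ** 2))           # geometric sim.
    sem = F.cosine_similarity(feat_q.unsqueeze(1), feat_k.unsqueeze(0), dim=-1)   # semantic sim.
    fused = (geo + sem).clamp(min=0)                          # fuse the two similarities
    attn = attn * fused                                       # re-weight attention scores
    attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-6)     # renormalize rows
    return attn @ v
```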
Citations: 0
Occlusion-related graph convolutional neural network for multi-object tracking
IF 4.2 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-28 | DOI: 10.1016/j.imavis.2024.105317
Yubo Zhang, Liying Zheng, Qingming Huang
Multi-Object Tracking (MOT) has recently been improved by Graph Convolutional Neural Networks (GCNNs) thanks to their good performance in characterizing interactive features. However, GCNNs assign smaller proportions to node features when a node has more neighbors, which makes it difficult to distinguish objects with similar neighbors, a situation that is common in dense scenes. This paper designs an Occlusion-Related GCNN (OR-GCNN), on which an interactive similarity module is further built. Specifically, the interactive similarity module first uses learnable weights to calculate the edge weights between tracklets and detected objects, balancing appearance cosine similarity and Intersection over Union (IoU). Then, the module determines the proportion of node features with the help of an occlusion weight that comes from a MultiLayer Perceptron (MLP). These occlusion weights, the edge weights, and the node features are then fed to our OR-GCNN to obtain interactive features. Finally, by integrating interactive similarity into a common MOT framework such as BoT-SORT, one obtains a tracker that efficiently alleviates the issues in dense MOT tasks. The experimental results on the MOT16 and MOT17 benchmarks show that our model achieves MOTA scores of 80.6 and 81.1 and HOTA scores of 65.3 and 65.1 on MOT16 and MOT17, respectively, outperforming state-of-the-art trackers including ByteTrack, BoT-SORT, GCNNMatch, GNMOT, and GSM.
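The edge-weight computation described above, a learnable balance between appearance cosine similarity and IoU for tracklet-detection pairs, might look roughly like the sketch below. The sigmoid-bounded balance parameter and the (x1, y1, x2, y2) box format are assumptions, not the paper's implementation.

```python
# Hypothetical tracklet-detection edge weights: learnable mix of appearance and IoU.
import torch
import torch.nn as nn
import torch.nn.functional as F

def box_iou(a, b):
    """Pairwise IoU for boxes in (x1, y1, x2, y2) format; a: (M, 4), b: (N, 4)."""
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    lt = torch.max(a[:, None, :2], b[None, :, :2])
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=-1)
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-6)

class EdgeWeight(nn.Module):
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(0.0))          # learnable balance term

    def forward(self, feat_trk, feat_det, box_trk, box_det):
        app = F.cosine_similarity(feat_trk[:, None], feat_det[None, :], dim=-1)
        iou = box_iou(box_trk, box_det)
        a = torch.sigmoid(self.alpha)                         # keep the balance in (0, 1)
        return a * app + (1 - a) * iou                        # (M, N) edge weights
```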
Citations: 0
A multi-label classification method based on transformer for deepfake detection
IF 4.2 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-28 | DOI: 10.1016/j.imavis.2024.105319
Liwei Deng, Yunlong Zhu, Dexu Zhao, Fei Chen
With the continuous development of hardware and deep learning technologies, existing forgery techniques are capable of more refined facial manipulations, making detection tasks increasingly challenging. Therefore, forgery detection cannot be viewed merely as a traditional binary classification task. To achieve finer forgery detection, we propose a method based on multi-label detection classification capable of identifying the presence of forgery in multiple facial components. Initially, the dataset undergoes preprocessing to meet the requirements of this task. Subsequently, we introduce a Detail-Enhancing Attention Module into the network to amplify subtle forgery traces in shallow feature maps and enhance the network's feature extraction capabilities. Additionally, we employ a Global–Local Transformer Decoder to improve the network's ability to focus on local information. Finally, extensive experiments demonstrate that our approach achieves 92.45% mAP and 90.23% mAUC, enabling precise detection of facial components in images, thus validating the effectiveness of our proposed method.
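A generic multi-label setup of the kind the abstract describes, one sigmoid output per facial component trained with binary cross-entropy, is sketched below. The component list, backbone feature dimension, and decision threshold are assumptions, not the paper's architecture.

```python
# Hypothetical multi-label forgery head: one logit per facial component.
import torch
import torch.nn as nn

COMPONENTS = ["eyes", "nose", "mouth", "whole_face"]    # assumed label set

class MultiLabelForgeryHead(nn.Module):
    def __init__(self, feat_dim: int, num_labels: int = len(COMPONENTS)):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_labels)       # one logit per component

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.fc(features)                        # raw logits, (B, num_labels)

# Training and inference with placeholder backbone features.
features = torch.randn(8, 512)
targets = torch.randint(0, 2, (8, len(COMPONENTS))).float()
head = MultiLabelForgeryHead(512)
loss = nn.BCEWithLogitsLoss()(head(features), targets)
preds = torch.sigmoid(head(features)) > 0.5             # per-component forgery flags
```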
Citations: 0
MinoritySalMix and adaptive semantic weight compensation for long-tailed classification
IF 4.2 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-25 | DOI: 10.1016/j.imavis.2024.105307
Wu Zeng, Zheng-ying Xiao
In real-world datasets, the widespread presence of long-tailed distributions often leads models to become overly biased towards majority-class samples while ignoring minority-class samples. We propose a strategy called MASW (MinoritySalMix and adaptive semantic weight compensation) to alleviate this problem. First, we propose a data augmentation method called MinoritySalMix (minority-saliency-mixing), which uses saliency detection to select salient regions from minority-class samples as cropping regions and pastes them into the same regions of majority-class samples to generate new samples, thereby amplifying images that contain important minority-class regions. Second, to make the label values of the newly generated samples more consistent with their image content, we propose an adaptive semantic compensation factor. This factor provides more label-value compensation for minority samples according to the cropped area, bringing the new label values closer to the content of the generated samples and improving model performance through more accurate labels. Finally, considering that current re-sampling strategies generally lack flexibility in allocating class sampling weights and frequently require manual adjustment, we design an adaptive weight function and incorporate it into the re-sampling strategy to achieve better sampling. Experimental results on three long-tailed datasets show that our method effectively improves model performance and outperforms most advanced long-tailed methods. Furthermore, we extended the MinoritySalMix strategy to three balanced datasets; the results indicate that our method surpasses several advanced data augmentation techniques.
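The saliency-guided mixing step and an area-based label weight can be sketched as below. The box-selection rule, the fixed crop fraction, and the additive compensation term `comp` are illustrative assumptions; the paper defines its own saliency selection and adaptive semantic compensation factor.

```python
# Hypothetical saliency-guided mixing with an area-based, compensated label weight.
import numpy as np

def saliency_box(saliency, frac=0.4):
    """Square box of side frac*min(H, W) centred on the saliency peak."""
    h, w = saliency.shape
    cy, cx = np.unravel_index(np.argmax(saliency), saliency.shape)
    s = int(frac * min(h, w))
    y1, x1 = max(0, cy - s // 2), max(0, cx - s // 2)
    return y1, x1, min(h, y1 + s), min(w, x1 + s)

def minority_sal_mix(maj_img, min_img, min_saliency, maj_label, min_label, comp=0.1):
    """Paste the salient minority crop into the same region of a same-sized majority image."""
    y1, x1, y2, x2 = saliency_box(min_saliency)
    mixed = maj_img.copy()
    mixed[y1:y2, x1:x2] = min_img[y1:y2, x1:x2]
    area = (y2 - y1) * (x2 - x1) / (maj_img.shape[0] * maj_img.shape[1])
    lam = min(1.0, area + comp)                             # extra minority weight (assumed rule)
    mixed_label = (1 - lam) * maj_label + lam * min_label   # labels as one-hot arrays
    return mixed, mixed_label
```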
Citations: 0