
Image and Vision Computing: Latest Publications

MVD-NeRF: Multi-View Deblurring Neural Radiance Fields from Defocused Images
IF 4.2 | Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-17 | DOI: 10.1016/j.imavis.2025.105876
Zhenyu Yin, Xiaohui Wang, Feiqing Zhang, Xiaoqiang Shi, Dan Feng
Neural Radiance Fields (NeRF) have demonstrated exceptional three-dimensional (3D) reconstruction quality by synthesizing novel views from multi-view images. However, NeRF algorithms typically require clear, static images to function effectively, and little attention has been given to suboptimal scenarios involving noise such as reflections and blur. Although blurred images are common in real-world situations, few studies have explored NeRF for handling blur, particularly defocus blur. Correctly simulating the formation of defocus blur is the key to deblurring and helps to accurately synthesize novel views from blurred images. Therefore, this paper proposes Multi-View Deblurring Neural Radiance Fields from Defocused Images (MVD-NeRF), a framework for 3D reconstruction from defocus-blurred images. The framework ensures consistency in 3D geometry and appearance by modeling the formation of defocus blur. MVD-NeRF introduces the Defocus Modeling Approach (DMA), a novel method for simulating defocused scenes. When the view is fixed, DMA assumes that each pixel is rendered by multiple rays emitted from the same light source. Additionally, MVD-NeRF proposes a new Multi-view Panning Algorithm (MPA), which simulates light-source movement through slight shifts of the camera center across different views, thereby generating blur effects similar to those in real photography. Together, DMA and MPA enhance MVD-NeRF's ability to capture intricate scene details. Our experimental results validate that MVD-NeRF achieves significant improvements in Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS). The source code for MVD-NeRF is available at: https://github.com/luckhui0505/MVD-NeRF.
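The abstract does not spell out the blur-formation model, but the thin-lens intuition behind DMA and MPA (a fixed-view pixel gathering several rays aimed at the same in-focus point, with small shifts of the camera center) can be sketched as below. The aperture radius, focus distance, sample count, and the toy renderer are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sample_disk(n, radius, rng):
    """Uniformly sample n 2-D offsets inside a disk of the given radius."""
    r = radius * np.sqrt(rng.uniform(size=n))
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=-1)

def render_defocused_pixel(render_ray, origin, direction, right, up,
                           aperture_radius=0.02, focus_dist=4.0,
                           n_samples=8, rng=None):
    """Thin-lens-style defocus: average several rays that all target the same
    in-focus point but start from slightly shifted camera centers."""
    rng = rng or np.random.default_rng(0)
    focus_point = origin + focus_dist * direction        # the point kept in focus
    colors = []
    for dx, dy in sample_disk(n_samples, aperture_radius, rng):
        o = origin + dx * right + dy * up                # jittered camera center
        d = focus_point - o
        d = d / np.linalg.norm(d)                        # re-aim the ray at the focus point
        colors.append(render_ray(o, d))                  # query the radiance field once
    return np.mean(colors, axis=0)                       # defocused pixel = ray average

# Toy stand-in for a NeRF renderer: the color simply encodes the ray direction.
toy_render = lambda o, d: 0.5 * (d + 1.0)
pixel = render_defocused_pixel(toy_render,
                               origin=np.zeros(3),
                               direction=np.array([0.0, 0.0, 1.0]),
                               right=np.array([1.0, 0.0, 0.0]),
                               up=np.array([0.0, 1.0, 0.0]))
print(pixel)
```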
Citations: 0
Hierarchical texture-aware image inpainting via contextual attention and multi-scale fusion
IF 4.2 | Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-16 | DOI: 10.1016/j.imavis.2025.105875
Runing Li, Jiangyan Dai, Qibing Qin, Chengduan Wang, Yugen Yi, Jianzhong Wang
Image inpainting aims to restore missing regions in images with visually coherent and semantically plausible content. Although deep learning methods have achieved significant progress, current approaches still face challenges in handling large-area image inpainting tasks, often producing blurred textures or structurally inconsistent results. These limitations primarily stem from the insufficient exploitation of long-range dependencies and inadequate texture priors. To address these issues, we propose a novel two-stage image inpainting framework that integrates multi-directional texture priors with contextual information. In the first stage, we extract rich texture features from corrupted images using Gabor filters, which simulate human visual perception. These features are then fused to guide a texture inpainting network, where a Multi-Scale Dense Skip Connection (MSDSC) module is introduced to bridge semantic gaps across different feature levels. In the second stage, we design a hierarchical texture-aware guided image completion network that utilizes the repaired textures as auxiliary guidance. Specifically, a contextual attention module is incorporated to capture long-range spatial dependencies and enhance structural consistency. Extensive experiments conducted on three challenging benchmarks, CelebA-HQ, Places2, and Paris Street View, demonstrate that our method outperforms existing state-of-the-art approaches in both quantitative metrics and visual quality. The proposed framework significantly improves the realism and coherence of inpainting results, particularly for images with large missing regions or complex textures. The code is available at https://github.com/Runing-Lab/HTA2I.git.
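As a rough illustration of the multi-directional texture prior described above, the snippet below builds a small Gabor filter bank with OpenCV and fuses the per-orientation responses into a single texture map. The kernel size, wavelength, and max-based fusion are assumed for illustration; they are not the paper's settings.

```python
import cv2
import numpy as np

def gabor_texture_prior(gray, n_orientations=8, ksize=21,
                        sigma=4.0, lambd=10.0, gamma=0.5):
    """Filter the image with Gabor kernels at several orientations and
    fuse the responses into one multi-directional texture map."""
    responses = []
    for i in range(n_orientations):
        theta = i * np.pi / n_orientations
        kernel = cv2.getGaborKernel((ksize, ksize), sigma, theta,
                                    lambd, gamma, psi=0.0, ktype=cv2.CV_32F)
        kernel /= np.abs(kernel).sum() + 1e-8          # keep responses on a comparable scale
        responses.append(cv2.filter2D(gray, cv2.CV_32F, kernel))
    fused = np.max(np.abs(np.stack(responses, axis=0)), axis=0)
    return fused / (fused.max() + 1e-8)                # normalized texture prior

gray = np.random.rand(128, 128).astype(np.float32)     # stand-in for a corrupted image
prior = gabor_texture_prior(gray)
print(prior.shape, float(prior.min()), float(prior.max()))
```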
Citations: 0
Efficient 6DoF pose estimation for multi-instance objects from a single image
IF 4.2 | Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-16 | DOI: 10.1016/j.imavis.2025.105882
Wen-Nung Lie, Lee Aing
Estimating 6-degrees-of-freedom (6DoF) poses for multiple objects from a single image, while remaining practical for industrial use, is difficult because several metrics, such as accuracy, speed, and complexity, must be traded off. This study adopts a fast bottom-up approach to estimate poses for multi-instance objects in an image simultaneously. We design a convolutional neural network with simple end-to-end training that outputs four feature maps: an error mask, a semantic mask, a center vector map, and a 6D coordinate map (6DCM). In particular, the 6DCM can provide rear-side 3D object point-cloud information that is invisible from the camera's viewpoint. This enriches the shape information about target objects, which is used to construct each instance's 2D-3D correspondences for pose-parameter estimation. Experimental results show that our proposed bottom-up approach is fast, processing a single image containing 7 objects at 25 frames per second, with accuracy competitive with top-down methods.
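The abstract says the 6DCM yields per-pixel object-space coordinates from which each instance's 2D-3D correspondences are built. One common way to turn such correspondences into a pose, assumed here purely for illustration, is RANSAC-based PnP; the sketch below checks the idea on synthetic data.

```python
import cv2
import numpy as np

def pose_from_coordinate_map(obj_coords, mask, K, min_points=6):
    """Recover a 6DoF pose from a per-pixel 3D object-coordinate map.
    obj_coords: (H, W, 3) predicted object-space coordinates (e.g., the visible side of a 6DCM).
    mask: (H, W) boolean instance mask selecting this object's pixels."""
    ys, xs = np.nonzero(mask)
    if len(xs) < min_points:
        return None
    img_pts = np.stack([xs, ys], axis=-1).astype(np.float64)   # 2D pixel positions
    obj_pts = obj_coords[ys, xs].astype(np.float64)            # matching 3D points
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj_pts, img_pts, K, None)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec

# Synthetic check: project a random 3D patch with a known pose and recover it.
K = np.array([[500.0, 0.0, 64.0], [0.0, 500.0, 64.0], [0.0, 0.0, 1.0]])
rng = np.random.default_rng(0)
pts3d = rng.uniform(-0.1, 0.1, (200, 3))
t_gt = np.array([0.0, 0.0, 1.0])
proj, _ = cv2.projectPoints(pts3d, np.zeros(3), t_gt, K, None)
coords = np.zeros((128, 128, 3))
mask = np.zeros((128, 128), bool)
for p3, p2 in zip(pts3d, proj.reshape(-1, 2).astype(int)):
    x, y = p2
    if 0 <= x < 128 and 0 <= y < 128:
        coords[y, x] = p3
        mask[y, x] = True
R, t = pose_from_coordinate_map(coords, mask, K)
print(np.round(t.ravel(), 3))   # should be close to (0, 0, 1)
```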
Citations: 0
HEL-Net: Heterogeneous Ensemble Learning for comprehensive diabetic retinopathy multi-lesion segmentation via Mamba-UNet
IF 4.2 | Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-16 | DOI: 10.1016/j.imavis.2025.105879
Lingyu Wu, Haiying Xia, Shuxiang Song, Yang Lan
Diabetic Retinopathy (DR) is the leading cause of blindness in adults with diabetes. Early automated detection of DR lesions is crucial for preventing vision loss and assisting ophthalmologists in treatment. However, accurately segmenting multiple types of DR lesions poses significant challenges due to their large diversity in size, shape, and location, as well as the conflict in feature modeling between local details and long-range dependencies. To address these issues, we propose a novel Heterogeneous Ensemble Learning Network (HEL-Net) specifically designed for four-lesion segmentation. HEL-Net comprises two ensemble stages: the first stage utilizes Mamba-UNet to generate coarse multi-lesion prediction results, which serve as contextual priors for the second stage, forming a multi-perspective lesion navigation strategy. The second stage employs a heterogeneous structure, integrating specialized networks (Mamba-UNet and U-Net) tailored to different lesion characteristics. Mamba-UNet excels in capturing large lesions by modeling long-range dependencies, while U-Net focuses on small lesions with significant local features. The heterogeneous ensemble framework leverages their complementary strengths to promote comprehensive lesion feature learning. Extensive quantitative and qualitative evaluations on two public datasets (IDRiD and DDR) demonstrate that our HEL-Net achieves competitive performance compared to state-of-the-art methods, achieving an mAUPR of 69.52%, mDice of 67.40%, and mIoU of 51.99% on the IDRiD dataset.
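At the tensor level, the two-stage design described above (coarse multi-lesion maps from stage one reused as contextual priors for a heterogeneous second stage) can be sketched as follows. The tiny convolutional stand-ins for Mamba-UNet and U-Net and the simple averaging fusion are placeholders for illustration only.

```python
import torch
import torch.nn as nn

class TinySeg(nn.Module):
    """Stand-in segmentation backbone (the real branches are Mamba-UNet / U-Net)."""
    def __init__(self, in_ch, n_lesions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, n_lesions, 1))
    def forward(self, x):
        return self.net(x)

class TwoStageEnsemble(nn.Module):
    """Stage 1 produces coarse multi-lesion maps that act as contextual priors;
    stage 2 runs two heterogeneous branches on image + priors and fuses them."""
    def __init__(self, n_lesions=4):
        super().__init__()
        self.stage1 = TinySeg(3, n_lesions)                     # coarse prior network
        self.branch_large = TinySeg(3 + n_lesions, n_lesions)   # long-range-oriented branch
        self.branch_small = TinySeg(3 + n_lesions, n_lesions)   # local-detail-oriented branch
    def forward(self, img):
        prior = torch.sigmoid(self.stage1(img))                 # coarse lesion probabilities
        x = torch.cat([img, prior], dim=1)                      # lesion-navigation input
        out = 0.5 * (self.branch_large(x) + self.branch_small(x))
        return prior, out

model = TwoStageEnsemble()
prior, out = model(torch.randn(1, 3, 64, 64))
print(prior.shape, out.shape)   # (1, 4, 64, 64) each
```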
Citations: 0
PASS: Peer-agreement based sample selection for training with instance dependent noisy labels
IF 4.2 | Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-16 | DOI: 10.1016/j.imavis.2025.105877
Arpit Garg, Cuong Nguyen, Rafael Felix, Thanh-Toan Do, Gustavo Carneiro
Deep learning encounters significant challenges in the form of noisy-label samples, which can cause the overfitting of trained models. A primary challenge in learning with noisy-label (LNL) techniques is their ability to differentiate between hard samples (clean-label samples near the decision boundary) and instance-dependent noisy (IDN) label samples to allow these samples to be treated differently during training. Existing methodologies to identify IDN samples, including the small-loss hypothesis and feature-based selection, have demonstrated limited efficacy, thus impeding their effectiveness in dealing with real-world label noise. We present Peer-Agreement-based Sample Selection (PASS), a novel approach that utilises three classifiers, where a consensus-driven agreement between two models accurately differentiates between clean and noisy-label IDN samples to train the third model. In contrast to current techniques, PASS is specifically designed to address the complexities of IDN, where noise patterns are correlated with instance features. Our approach seamlessly integrates with existing LNL algorithms to enhance the accuracy of detecting both noisy and clean samples. Comprehensive experiments conducted on simulated benchmarks (CIFAR-100 and Red mini-ImageNet) and real-world datasets (Animal-10N, CIFAR-N, Clothing1M, and mini-WebVision) demonstrated that PASS substantially improved the performance of multiple state-of-the-art methods. This technique achieves superior classification accuracy, particularly in scenarios with high noise levels.
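A minimal sketch of the peer-agreement idea, assuming a simple rule in which a sample is kept as clean only when two peer classifiers both predict its given label, and the third model is then trained on the kept set. The logistic-regression peers, disjoint training halves, and exact agreement rule are assumptions made for illustration, not the authors' criterion.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def peer_agreement_split(X, y_noisy, seed=0):
    """Mark a sample as 'clean' when two peer classifiers, trained on
    disjoint halves of the data, both predict its given label."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    half = len(X) // 2
    peers = [LogisticRegression(max_iter=1000).fit(X[part], y_noisy[part])
             for part in (idx[:half], idx[half:])]
    votes = np.stack([p.predict(X) for p in peers])            # (2, N) peer predictions
    clean_mask = np.all(votes == y_noisy[None, :], axis=0)     # both peers agree with the label
    return clean_mask

# Toy data: two Gaussian blobs with 20% of the labels flipped.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
y_noisy = y.copy()
flip = rng.choice(len(y), size=80, replace=False)
y_noisy[flip] = 1 - y_noisy[flip]

clean = peer_agreement_split(X, y_noisy)
# The third model would now be trained only on X[clean], y_noisy[clean].
frac = float((y_noisy[clean] == y[clean]).mean())
print("kept", int(clean.sum()), "of", len(y), "samples; clean fraction in kept set:", round(frac, 3))
```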
Citations: 0
When Mamba meets CNN: A hybrid architecture for skin lesion segmentation
IF 4.2 | Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-16 | DOI: 10.1016/j.imavis.2025.105880
Yun Xiao, Caijuan Shi, Jinghao Jia, Ao Cai, Yinan Zhang, Meiwen Zhang
Skin lesion segmentation, an important tool for computer-aided diagnosis and treatment of skin cancer, has recently been studied extensively with different deep models. Because each single deep model has limitations, hybrid architectures, especially Mamba–CNN based methods, have become a research hotspot. However, the segmentation accuracy of existing Mamba–CNN methods is still limited, especially for lesions of varying sizes and blurry boundaries, and their computational complexity remains high. Therefore, to address these issues and improve skin lesion segmentation performance, we propose a new Mamba–CNN based model, named Feature Fusion and Boundary Awareness Visual Mamba (FFBA-VM). Specifically, the designed Multi-scale Hybrid Attention Interaction (MHAI) module can enhance the multi-scale feature representation with the powerful capability of long-range dependency modeling to obtain rich local and global information. The designed Region Localization and Boundary Enhancement (RLBE) module can effectively explore the local information to alleviate inaccurate skin lesion localization and boundary blurring. The Lightweight Visual State Space (LVSS) module is designed to reduce the model's computational complexity. Extensive experiments are conducted on four datasets, and our model FFBA-VM effectively boosts the segmentation accuracy in terms of multiple evaluation metrics. For example, FFBA-VM achieves mIoU and DSC of 80.28% and 89.06% on the ISIC17 dataset, and reaches mIoU and DSC of 80.47% and 89.17% on the ISIC18 dataset. The experimental results indicate that our proposed FFBA-VM can outperform the existing state-of-the-art methods, validating its effectiveness and practicality for skin lesion segmentation.
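For reference, the two scores quoted above are the Dice similarity coefficient (DSC) and intersection-over-union (IoU, averaged over classes for mIoU); a minimal binary-mask implementation of the standard definitions is given below (not code from the paper).

```python
import numpy as np

def dice_and_iou(pred, target, eps=1e-7):
    """Dice similarity coefficient and IoU for a pair of binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    dice = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, target).sum() + eps)
    return float(dice), float(iou)

pred = np.zeros((64, 64), bool); pred[10:40, 10:40] = True   # predicted lesion
gt = np.zeros((64, 64), bool);   gt[15:45, 15:45] = True     # ground-truth lesion
print(dice_and_iou(pred, gt))    # overlap scores for two shifted squares
```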
Citations: 0
CONXA: A CONvnext and CROSS-attention combination network for Semantic Edge Detection
IF 4.2 | Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-15 | DOI: 10.1016/j.imavis.2025.105867
Gwangsoo Kim, Hyuk-jae Lee, Hyunmin Jung
Semantic Edge Detection (SED) is an advanced edge detection technique that simultaneously detects edges in an image and classifies them according to their semantics. It is expected to be applied in various fields, such as medical imaging, satellite imagery, and smart manufacturing. Although previous research on SED has significantly improved performance, further advancements are still needed. In particular, existing studies have typically focused on specific types of dataset, limiting the broader applicability of SED techniques. Motivated by this, our paper makes three key contributions. First, we propose a novel network for SED, called CONXA. CONXA improves SED accuracy by leveraging the powerful feature extraction of ConvNeXt and the effective feature combination of cross-attention. Second, we introduce a novel loss function, Inverted Dice (I-Dice) loss, which calculates loss based on a sufficient number of non-edge pixels rather than edge pixels. This helps balance false positives and false negatives, enabling more stable training. Third, unlike previous studies that typically use only one type of dataset, we validate our method using two distinct types of datasets commonly used in SED. Experimental results demonstrate that our approach significantly outperforms existing state-of-the-art (SOTA) methods on datasets that define semantics by edge types, and achieves comparable performance to SOTA methods on datasets where semantics are defined by object boundaries. This indicates that our method can be effectively applied across diverse datasets regardless of the semantic characteristics of edges, contributing to the generalization of SED. Code is available at https://github.com/GSKIM13/CONXA/.
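A minimal reading of the Inverted Dice idea above is a Dice loss computed on the complemented edge maps, so the score is driven by the plentiful non-edge pixels rather than the sparse edge pixels; the sketch below follows that reading and is not the paper's exact formulation.

```python
import torch

def inverted_dice_loss(pred, target, eps=1e-6):
    """Dice loss on complemented maps: pred/target are probabilities in [0, 1]
    with 1 = edge, so (1 - x) marks the non-edge pixels, which dominate the image."""
    p, t = 1.0 - pred, 1.0 - target                  # work on non-edge pixels
    inter = (p * t).sum(dim=(-2, -1))
    denom = p.sum(dim=(-2, -1)) + t.sum(dim=(-2, -1))
    dice = (2.0 * inter + eps) / (denom + eps)
    return 1.0 - dice.mean()

pred = torch.rand(2, 4, 64, 64)                      # 4 semantic edge classes
target = (torch.rand(2, 4, 64, 64) > 0.95).float()   # sparse ground-truth edges
print(inverted_dice_loss(pred, target).item())
```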
Citations: 0
SEAGNet: Spatial–Epipolar–Angular–Global feature learning for light field super-resolution
IF 4.2 | Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-13 | DOI: 10.1016/j.imavis.2025.105866
Xingzheng Wang, Haotian Zhang, Yuhang Lin, Yuanbo Huang, Jiahao Lin
In light field (LF) image super-resolution (SR), comprehensive learning of LF information is crucial for accurately recovering image details. Because 4D LF structures are highly complex, current methods usually rely on specialized convolutions and modules to extract different LF characteristics (such as spatial, angular, and EPI features) separately before combining them. However, these methods focus heavily on local LF information and pay insufficient attention to global 4D LF features, which limits further improvement. To overcome this issue, we propose a straightforward yet effective Global Feature Extraction Module (GFEM) that extracts global information from the entire 4D light field and exploits all of these features jointly. We also introduce a Progressive Angular Feature Extractor (PAFE), which gradually expands the feature-extraction region to ensure that features can be extracted at different angles. In addition, we design a Spatial Gated Feed-forward Network (SGFN) to replace the standard feed-forward network in the Transformer, resulting in our new Intra-Transformer architecture, which optimizes feature flow and enhances local detail extraction. Extensive experiments on several public datasets show that our method outperforms currently available approaches.
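The abstract does not detail the SGFN, but one common way to add spatial gating to a Transformer feed-forward block, assumed here only as an illustration, is to split the expanded features and gate one half with a depthwise convolution of the other:

```python
import torch
import torch.nn as nn

class SpatialGatedFFN(nn.Module):
    """Feed-forward block whose hidden features are split into two halves:
    one half is passed through a depthwise convolution and used to gate the
    other, injecting spatial context into the otherwise point-wise FFN."""
    def __init__(self, dim, expansion=2):
        super().__init__()
        hidden = dim * expansion
        self.proj_in = nn.Conv2d(dim, hidden * 2, kernel_size=1)
        self.dw = nn.Conv2d(hidden, hidden, kernel_size=3, padding=1, groups=hidden)
        self.proj_out = nn.Conv2d(hidden, dim, kernel_size=1)
        self.act = nn.GELU()
    def forward(self, x):
        u, v = self.proj_in(x).chunk(2, dim=1)   # content path and gate path
        gate = self.act(self.dw(v))              # spatially-aware gate
        return self.proj_out(u * gate)

block = SpatialGatedFFN(dim=32)
y = block(torch.randn(1, 32, 16, 16))
print(y.shape)   # torch.Size([1, 32, 16, 16])
```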
Citations: 0
ShadowMamba: State-space model with boundary-region selective scan for shadow removal
IF 4.2 | Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-12 | DOI: 10.1016/j.imavis.2025.105872
Xiujin Zhu, Chee-Onn Chow, Joon Huang Chuah
Image shadow removal is a typical low-level vision task, as shadows introduce abrupt local brightness variations that degrade the performance of downstream tasks. Due to the quadratic complexity of Transformers, many existing methods adopt local attention to balance accuracy and efficiency. However, restricting attention to local windows prevents true long-range dependency modeling and limits shadow removal performance. Recently, Mamba has shown strong ability in vision tasks by achieving global modeling with linear complexity. Despite this advantage, existing scanning mechanisms in the Mamba architecture are not suitable for shadow removal because they ignore the semantic continuity within the same region. To address this, a boundary-region selective scanning mechanism is proposed that captures local details while enhancing continuity among semantically related pixels, effectively improving shadow removal performance. In addition, a shadow mask denoising preprocessing method is introduced to improve the accuracy of the scanning mechanism and further enhance the data quality. Based on this, this paper presents ShadowMamba, the first Mamba-based model for shadow removal. Experimental results show that the proposed method outperforms existing mainstream approaches on the AISTD, ISTD, SRD, and WSRD+ datasets, and demonstrates good generalization ability in cross-dataset testing on USR and SBU. Meanwhile, the model also has significant advantages in parameter efficiency and computational complexity. Code is available at: https://github.com/ZHUXIUJINChris/ShadowMamba.
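One plausible realization of a boundary-region selective scan, assumed here for illustration, is to use the (denoised) shadow mask to split pixels into shadow interior, a boundary band, and non-shadow regions, and to concatenate their raster orders so that tokens from the same region stay contiguous in the 1-D sequence fed to the state-space model. The band width and three-way grouping below are assumptions, not the paper's exact scheme.

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def boundary_region_scan_order(mask, band=2):
    """Return a flat index order that scans shadow-interior pixels first,
    then the boundary band around the shadow edge, then non-shadow pixels,
    each group in raster order, so same-region tokens stay contiguous."""
    mask = mask.astype(bool)
    boundary = binary_dilation(mask, iterations=band) & ~binary_erosion(mask, iterations=band)
    interior = mask & ~boundary
    outside = ~mask & ~boundary
    flat_ids = np.arange(mask.size).reshape(mask.shape)
    order = np.concatenate([flat_ids[interior], flat_ids[boundary], flat_ids[outside]])
    return order                          # use this permutation to gather tokens before the SSM

mask = np.zeros((8, 8), bool)
mask[2:6, 2:6] = True                     # a square "shadow" region
order = boundary_region_scan_order(mask, band=1)
print(order[:10], order.shape)            # a permutation of all 64 pixel indices
```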
Citations: 0
LoGA-Attack: Local geometry-aware adversarial attack on 3D point clouds
IF 4.2 | Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-11 | DOI: 10.1016/j.imavis.2025.105871
Jia Yuan, Jun Chen, Chongshou Li, Pedro Alonso, Xinke Li, Tianrui Li
Adversarial attacks on 3D point clouds are increasingly critical for safety-sensitive domains like autonomous driving. Most existing methods ignore local geometric structure, yielding perturbations that harm imperceptibility and geometric consistency. We introduce the Local Geometry-Aware adversarial attack (LoGA-Attack), which exploits topological and geometric cues to craft refined perturbations. A Neighborhood Centrality (NC) score partitions points into contour and flat point sets. Contour points receive gradient-based iterative updates to maximize attack strength, while flat points use an Optimal Neighborhood-based Attack (ONA) that projects gradients onto the most consistent local geometric direction. Experiments on ModelNet40 and ScanObjectNN show higher attack success with lower perceptual distortion, demonstrating superior performance and strong transferability. Our code is available at: https://github.com/yuanjiachn/LoGA-Attack.
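A rough sketch of the two ingredients named above: a neighborhood-centrality-style score that separates contour from flat points (here taken as the distance of each point to the centroid of its k nearest neighbors), and, for flat points, projection of the perturbation onto the dominant local direction (here the first principal component of the neighborhood). Both concrete definitions are assumptions made for illustration, not the paper's exact formulas.

```python
import numpy as np

def knn_indices(points, k):
    """Indices of the k nearest neighbors of every point (brute force)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.argsort(d, axis=1)[:, 1:k + 1]        # drop the point itself

def neighborhood_centrality(points, k=16):
    """Distance from each point to the centroid of its k-NN neighborhood;
    large values suggest contour-like points, small values flat regions."""
    nbrs = points[knn_indices(points, k)]            # (N, k, 3)
    return np.linalg.norm(points - nbrs.mean(axis=1), axis=-1)

def project_to_local_direction(points, grads, k=16):
    """Project each point's gradient onto the dominant direction (first
    principal component) of its local neighborhood within the given subset."""
    nbrs = points[knn_indices(points, k)]
    centered = nbrs - nbrs.mean(axis=1, keepdims=True)
    out = np.empty_like(grads)
    for i, (c, g) in enumerate(zip(centered, grads)):
        _, _, vt = np.linalg.svd(c, full_matrices=False)
        d = vt[0]                                    # most consistent local direction
        out[i] = np.dot(g, d) * d
    return out

pts = np.random.default_rng(0).normal(size=(256, 3)).astype(np.float32)
grads = np.random.default_rng(1).normal(size=(256, 3)).astype(np.float32)
nc = neighborhood_centrality(pts)
contour = nc > np.median(nc)                         # split into contour / flat sets
grads[~contour] = project_to_local_direction(pts[~contour], grads[~contour])
print(int(contour.sum()), "contour points;", int((~contour).sum()), "flat points")
```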
Citations: 0