
Latest Publications: IEEE Transactions on Image Processing (a publication of the IEEE Signal Processing Society)

Self-Adaptive Vision-Language Tracking With Context Prompting
IF 13.7 | Pub Date: 2025-12-08 | DOI: 10.1109/TIP.2025.3635016 | Vol. 34, pp. 8046-8058
Jie Zhao;Xin Chen;Shengming Li;Chunjuan Bo;Dong Wang;Huchuan Lu
Due to the substantial gap between vision and language modalities, along with the mismatch problem between fixed language descriptions and dynamic visual information, existing vision-language tracking methods exhibit performance on par with or slightly worse than vision-only tracking. Effectively exploiting the rich semantics of language to enhance tracking robustness remains an open challenge. To address these issues, we propose a self-adaptive vision-language tracking framework that leverages the pre-trained multi-modal CLIP model to obtain well-aligned visual-language representations. A novel context-aware prompting mechanism is introduced to dynamically adapt linguistic cues based on the evolving visual context during tracking. Specifically, our context prompter extracts dynamic visual features from the current search image and integrates them into the text encoding process, enabling self-updating language embeddings. Furthermore, our framework employs a unified one-stream Transformer architecture, supporting joint training for both vision-only and vision-language tracking scenarios. Our method not only bridges the modality gap but also enhances robustness by allowing language features to evolve with visual context. Extensive experiments on four vision-language tracking benchmarks demonstrate that our method effectively leverages the advantages of language to enhance visual tracking. Our large model achieves 55.0% AUC on $\text{LaSOT}_{\text{EXT}}$ and 69.0% AUC on TNL2K. Additionally, our language-only tracking model achieves performance comparable to that of state-of-the-art vision-only tracking methods on TNL2K. Code is available at https://github.com/zj5559/SAVLT
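
As an illustration of the context-prompting idea described in the abstract, the sketch below (not the authors' released code; module names, dimensions, and the pooling step are assumptions) projects features pooled from the current search image into a few prompt tokens and prepends them to the text token sequence, so the language embedding can adapt to the evolving visual context.

```python
import torch
import torch.nn as nn

class ContextPrompter(nn.Module):
    """Minimal sketch: turn pooled search-image features into prompt tokens
    that are prepended to the CLIP text tokens (names and dims are assumptions)."""

    def __init__(self, vis_dim=768, txt_dim=512, num_prompts=4):
        super().__init__()
        self.num_prompts = num_prompts
        self.proj = nn.Linear(vis_dim, num_prompts * txt_dim)
        self.norm = nn.LayerNorm(txt_dim)

    def forward(self, search_feats, text_tokens):
        # search_feats: (B, N, vis_dim) patch features of the current search image
        # text_tokens:  (B, L, txt_dim) embedded language description
        pooled = search_feats.mean(dim=1)                        # (B, vis_dim)
        prompts = self.proj(pooled)                              # (B, P*txt_dim)
        prompts = prompts.view(-1, self.num_prompts, text_tokens.size(-1))
        prompts = self.norm(prompts)                             # (B, P, txt_dim)
        # Prepend dynamic visual prompts so the text encoder sees the current context.
        return torch.cat([prompts, text_tokens], dim=1)          # (B, P+L, txt_dim)

# toy usage
prompter = ContextPrompter()
out = prompter(torch.randn(2, 196, 768), torch.randn(2, 77, 512))
print(out.shape)  # torch.Size([2, 81, 512])
```
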
Citations: 0
WMRNet: Wavelet Mamba With Reversible Structure for Infrared Small Target Detection
IF 13.7 | Pub Date: 2025-12-05 | DOI: 10.1109/TIP.2025.3637729 | Vol. 34, pp. 8229-8242
Mingjin Zhang;Xiaolong Li;Jie Guo;Yunsong Li;Xinbo Gao
Infrared small target detection (IRSTD) is of great practical significance in many real-world applications, such as maritime rescue and early warning systems, benefiting from the unique and excellent infrared imaging ability in adverse weather and low-light conditions. Nevertheless, segmenting small targets from the background remains a challenge. When the subsampling frequency during image processing does not satisfy the Nyquist criterion, the aliasing effect occurs, which makes it extremely difficult to identify small targets. To address this challenge, we propose a novel Wavelet Mamba with Reversible Structure Network (WMRNet) for infrared small target detection in this paper. Specifically, WMRNet consists of a Discrete Wavelet Mamba (DW-Mamba) module and a Third-order Difference Equation guided Reversible (TDE-Rev) structure. DW-Mamba employs the Discrete Wavelet Transform to decompose images into multiple subbands, integrating this information into the state equations of a state space model. This method minimizes frequency interference while preserving a global perspective, thereby effectively reducing background aliasing. The TDE-Rev aims to suppress edge aliasing effects by refining the target edges, which first processes features with an explicit neural structure derived from the second-order difference equations and then promotes feature interactions through a reversible structure. Extensive experiments on the public IRSTD-1k and SIRST datasets demonstrate that the proposed WMRNet outperforms the state-of-the-art methods.
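
The DW-Mamba module described above decomposes features into wavelet subbands before state-space modeling. The snippet below sketches only the decomposition step with a single-level Haar DWT; the paper's actual wavelet choice, decomposition depth, and the way subbands enter the state equations are not reproduced here.

```python
import torch

def haar_dwt2d(x):
    """Single-level 2D Haar DWT; returns the low-frequency approximation and
    three detail subbands. x: (B, C, H, W) with even H and W. Sketch only."""
    a = x[:, :, 0::2, 0::2]
    b = x[:, :, 0::2, 1::2]
    c = x[:, :, 1::2, 0::2]
    d = x[:, :, 1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # low-frequency approximation
    lh = (a - b + c - d) / 2.0   # detail along one spatial axis
    hl = (a + b - c - d) / 2.0   # detail along the other spatial axis
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, lh, hl, hh

ll, lh, hl, hh = haar_dwt2d(torch.randn(1, 3, 64, 64))
print(ll.shape)  # torch.Size([1, 3, 32, 32])
```
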
Citations: 0
URDM: Hyperspectral Unmixing Regularized by Diffusion Models
IF 13.7 | Pub Date: 2025-12-05 | DOI: 10.1109/TIP.2025.3638151 | Vol. 34, pp. 8072-8085
Min Zhao;Linruize Tang;Jie Chen;Bo Huang
Hyperspectral unmixing aims to decompose the mixed pixels into pure spectra and calculate their corresponding fractional abundances. It holds a critical position in hyperspectral image processing. Traditional model-based unmixing methods use convex optimization to iteratively solve the unmixing problem with hand-crafted regularizers. However, their performance is limited by these manually designed constraints, which may not fully capture the structural information of the data. Recently, deep learning-based unmixing methods have shown remarkable capability for this task. However, they have limited generalizability and lack interpretability. In this paper, we propose a novel hyperspectral unmixing method regularized by a diffusion model (URDM) to overcome these shortcomings. Our method leverages the advantages of both conventional optimization algorithms and deep generative models. Specifically, we formulate the unmixing objective function from a variational perspective and integrate it into a diffusion sampling process to introduce generative priors from a denoising diffusion probabilistic model (DDPM). Since the original objective function is challenging to optimize, we introduce a splitting-based strategy to decouple it into simpler subproblems. Extensive experimental results on both synthetic and real datasets demonstrate the efficiency and superior performance of our proposed method.
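
The splitting strategy mentioned in the abstract can be pictured as a toy alternating loop: a closed-form data-fidelity update on the abundances followed by a prior step that, in URDM, would be carried out by the diffusion model. The sketch below is a loose illustration under that assumption (the placeholder denoiser, sum-to-one projection, and penalty weight are assumptions, not the authors' formulation).

```python
import numpy as np

def unmix_split(Y, E, denoise, rho=1.0, iters=20):
    """Sketch of a splitting-based unmixing loop (not the authors' URDM code).
    Y: (bands, pixels) mixed spectra; E: (bands, R) endmember signatures.
    'denoise' stands in for the diffusion-model prior applied to abundances."""
    R, N = E.shape[1], Y.shape[1]
    A = np.full((R, N), 1.0 / R)          # abundance estimates
    Z = A.copy()                          # auxiliary (prior) variable
    EtE, EtY = E.T @ E, E.T @ Y
    for _ in range(iters):
        # data-fidelity subproblem: (E^T E + rho I) A = E^T Y + rho Z
        A = np.linalg.solve(EtE + rho * np.eye(R), EtY + rho * Z)
        A = np.clip(A, 0.0, None)
        A /= np.maximum(A.sum(axis=0, keepdims=True), 1e-8)   # sum-to-one
        # prior subproblem: handled by the (diffusion) denoiser in URDM
        Z = denoise(A)
    return A

# toy usage with an identity "denoiser" as a placeholder for the diffusion prior
Y = np.random.rand(50, 100); E = np.random.rand(50, 3)
A = unmix_split(Y, E, denoise=lambda a: a)
print(A.shape)  # (3, 100)
```
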
Citations: 0
A Perception CNN for Facial Expression Recognition
IF 13.7 | Pub Date: 2025-12-05 | DOI: 10.1109/TIP.2025.3637715 | Vol. 34, pp. 8101-8113
Chunwei Tian;Jingyuan Xie;Lingjun Li;Wangmeng Zuo;Yanning Zhang;David Zhang
Convolutional neural networks (CNNs) can automatically learn data patterns to express face images for facial expression recognition (FER). However, they may ignore the effect of facial segmentation on FER. In this paper, we propose a perception CNN for FER, referred to as PCNN. Firstly, PCNN uses five parallel networks to simultaneously learn local facial features of the eyes, cheeks and mouth, sensitively capturing the subtle changes relevant to FER. Secondly, we utilize a multi-domain interaction mechanism to register and fuse local sense-organ features with global facial structural features to better express face images for FER. Finally, we design a two-phase loss function that constrains the accuracy of the obtained sense information and the reconstructed face images, guaranteeing the performance of PCNN in FER. Experimental results show that our PCNN achieves superior results on several lab and real-world FER benchmarks: CK+, JAFFE, FER2013, FERPlus, RAF-DB and the Occlusion and Pose Variant Dataset. Its code is available at https://github.com/hellloxiaotian/PCNN
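
A minimal sketch of the parallel-branch idea only: one small branch per facial region with a fused classifier. The region cropping, the multi-domain interaction mechanism, and the two-phase loss of PCNN are not reproduced here, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class RegionBranch(nn.Module):
    """Tiny per-region feature extractor (sizes are illustrative)."""
    def __init__(self, out_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, out_dim))

    def forward(self, x):
        return self.net(x)

class ToyPerceptionNet(nn.Module):
    """Five parallel branches (e.g. two eyes, two cheeks, mouth) fused for
    expression classification; a sketch of the structure only."""
    def __init__(self, num_regions=5, num_classes=7):
        super().__init__()
        self.branches = nn.ModuleList([RegionBranch() for _ in range(num_regions)])
        self.classifier = nn.Linear(64 * num_regions, num_classes)

    def forward(self, region_crops):
        # region_crops: list of (B, 3, h, w) tensors, one per facial region
        feats = [b(x) for b, x in zip(self.branches, region_crops)]
        return self.classifier(torch.cat(feats, dim=1))

model = ToyPerceptionNet()
crops = [torch.randn(2, 3, 32, 32) for _ in range(5)]
print(model(crops).shape)  # torch.Size([2, 7])
```
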
Citations: 0
TransDiff: Unsupervised Non-Line-of-Sight Imaging With Aperture-Limited Relay Surfaces
IF 13.7 | Pub Date: 2025-12-04 | DOI: 10.1109/TIP.2025.3637694 | Vol. 34, pp. 8018-8031
Xingyu Cui;Huanjing Yue;Shida Sun;Yue Li;Yusen Hou;Zhiwei Xiong;Jingyu Yang
Non-line-of-sight (NLOS) imaging aims to reconstruct scenes hidden from direct view and has broad applications in robotic vision, rescue operations, autonomous driving, and remote sensing. However, most existing methods rely on densely sampled transients from large, continuous relay surfaces, which limits their practicality in real-world scenarios with aperture constraints. To address this limitation, we propose an unsupervised zero-shot framework tailored for confocal NLOS imaging with aperture-limited relay surfaces. Our method leverages latent diffusion models to recover fully-sampled transients from undersampled versions by enforcing measurement consistency during the sampling process. To further improve recovered transient quality, we introduce a progressive recovery strategy that incrementally recovers missing transient values, effectively mitigating the impact of severe aperture limitations. In addition, to suppress error propagation during recovery, we develop a backpropagation-based error correction reconstruction algorithm that refines intermediate recovered transients by enforcing sparsity regularization in the voxel domain, enabling high-fidelity final reconstructions. Extensive experiments on both simulated and real-world datasets validate the robustness and generalization capability of our method across diverse aperture-limited relay surfaces. Notably, our method follows a zero-shot paradigm, requiring only a single pretraining stage without paired data or pattern-specific retraining, which makes it a more practical and generalizable framework for NLOS imaging.
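
Measurement consistency during diffusion sampling is often enforced by overwriting the current estimate at acquired positions with the measurements at each reverse step; the snippet below illustrates only that generic idea (TransDiff itself operates in a latent diffusion setting with a progressive recovery strategy, which is not reproduced here).

```python
import torch

def enforce_measurement_consistency(x_hat, measurements, mask):
    """Generic data-consistency step, applied once per reverse diffusion step.
    x_hat:        current estimate of the fully-sampled transient volume
    measurements: acquired transients, valid where mask == 1
    mask:         1 at scanned positions on the aperture-limited relay surface"""
    return mask * measurements + (1.0 - mask) * x_hat

# toy shapes: (batch, y, x, time bins)
x_hat = torch.rand(1, 64, 64, 128)
mask = (torch.rand(1, 64, 64, 1) > 0.7).float()   # sparse relay-surface samples
meas = torch.rand(1, 64, 64, 128) * mask
x_next = enforce_measurement_consistency(x_hat, meas, mask)
print(x_next.shape)  # torch.Size([1, 64, 64, 128])
```
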
Citations: 0
Better Image Filter for Pansharpening
IF 13.7 | Pub Date: 2025-12-04 | DOI: 10.1109/TIP.2025.3637675 | Vol. 34, pp. 8171-8184
Anjing Guo;Renwei Dian;Nan Wang;Shutao Li
The modulation transfer function tailored image filter (MTF-TIF) has long been regarded as the optimal filter for multispectral image pansharpening. It excels at simulating the camera’s frequency response, thereby capturing finer image details and significantly improving pansharpening performance. However, we are skeptical about whether the pre-measured MTF is sufficient to describe the characteristics of the actually acquired panchromatic image (PAN) and multispectral image (MSI). For example, any image resampling operation in geometric correction or image registration inevitably changes the sharpness of the acquired PAN and MSI, and the processed images no longer conform to the camera’s MTF. Further, following the Wald protocol, using the MTF-TIF to downsample images and construct training data in deep learning (DL) methods does not satisfy the generalization consistency between training and testing. To prove our point, we propose a pair of symmetric DL-based frameworks in this paper to find better image filters suitable for both traditional and DL pansharpening methods. We embed two learnable filters into the frameworks to simulate the optimal image filter, namely an anisotropic Gaussian image filter and an arbitrary image filter. Further, the proposed frameworks can capture subtle offsets between images and maintain the smoothness of the global deformation field. Extensive experiments on various satellite datasets demonstrate that the proposed frameworks can find better image filters than MTF-TIFs, achieving better pansharpening performance with stronger generalization ability.
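
As a sketch of one of the two learnable filters named in the abstract, the module below builds an anisotropic Gaussian kernel from learnable scales and a rotation angle and applies it per channel; the parameterization and kernel size are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnisotropicGaussianFilter(nn.Module):
    """Learnable anisotropic Gaussian blur (sketch; parameterization assumed)."""
    def __init__(self, ksize=9):
        super().__init__()
        self.ksize = ksize
        self.log_sigma = nn.Parameter(torch.zeros(2))   # sigma_x, sigma_y (log space)
        self.theta = nn.Parameter(torch.zeros(1))        # rotation angle

    def kernel(self):
        k = self.ksize
        ax = torch.arange(k, dtype=torch.float32) - (k - 1) / 2.0
        yy, xx = torch.meshgrid(ax, ax, indexing="ij")
        sx, sy = torch.exp(self.log_sigma[0]), torch.exp(self.log_sigma[1])
        c, s = torch.cos(self.theta), torch.sin(self.theta)
        xr = c * xx + s * yy                 # rotated coordinates
        yr = -s * xx + c * yy
        g = torch.exp(-0.5 * ((xr / sx) ** 2 + (yr / sy) ** 2))
        return g / g.sum()

    def forward(self, x):                    # x: (B, C, H, W)
        k = self.kernel().unsqueeze(0).unsqueeze(0).repeat(x.size(1), 1, 1, 1)
        return F.conv2d(x, k, padding=self.ksize // 2, groups=x.size(1))

blur = AnisotropicGaussianFilter()
print(blur(torch.randn(1, 4, 64, 64)).shape)  # torch.Size([1, 4, 64, 64])
```
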
Citations: 0
InfoARD: Enhancing Adversarial Robustness Distillation With Attack-Strength Adaptation and Mutual-Information Maximization
IF 13.7 | Pub Date: 2025-12-04 | DOI: 10.1109/TIP.2025.3637689 | Vol. 35, pp. 276-289
Ruihan Liu;Jieyi Cai;Yishu Liu;Sudong Cai;Bingzhi Chen;Yulan Guo;Mohammed Bennamoun
Adversarial distillation (AD) aims to mitigate deep neural networks’ inherent vulnerability to adversarial attacks, thereby providing robust protection for compact models through teacher-student interactions. Despite advancements, existing AD studies still suffer from insufficient robustness due to the limitations of fixed attack strength and attention region shifts. To address these challenges, we propose a strength-adaptive Info-maximizing Adversarial Robustness Distillation paradigm, namely “InfoARD”, which strategically incorporates the Attack-Strength Adaptation (ASA) and Mutual-Information Maximization (MIM) to enhance adversarial robustness against adversarial attacks and perturbations. Unlike previous adversarial training (AT) methods that utilize fixed attack strength, the ASA mechanism is designed to capture smoother and generalized classification boundaries by dynamically tailoring the attack strength based on the characteristics of individual instances. Benefiting from mutual information constraints, our MIM strategy ensures the student model effectively learns from various levels of feature representations and attention patterns, thereby deepening the student model’s understanding of the teacher model’s decision-making processes. Furthermore, a comprehensive multi-granularity distillation is conducted to capture knowledge across multiple dimensions, enabling a more effective transfer of knowledge from the teacher model to the student model. Note that our InfoARD can be seamlessly integrated into existing AD frameworks, further boosting the adversarial robustness of deep learning models. Extensive experiments on various challenging datasets consistently demonstrate the effectiveness and robustness of our InfoARD, surpassing previous state-of-the-art methods.
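
The attack-strength adaptation idea can be illustrated by giving each sample its own perturbation budget inside a standard PGD loop. The sketch below shows only those mechanics with a hypothetical adaptation rule; InfoARD's actual per-instance criterion is not reproduced.

```python
import torch
import torch.nn.functional as F

def pgd_with_per_sample_eps(model, x, y, eps, alpha=2/255, steps=10):
    """PGD where each sample carries its own attack strength eps of shape (B,).
    Sketch of the mechanics only; the adaptation rule is not from the paper."""
    eps = eps.view(-1, 1, 1, 1)
    x_adv = (x + torch.empty_like(x).uniform_(-1, 1) * eps).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # project back into the per-sample eps-ball and the valid pixel range
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

# toy example: weaker attacks for samples the model already misclassifies
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
with torch.no_grad():
    correct = model(x).argmax(1).eq(y).float()
eps = (4 + 4 * correct) / 255          # one possible (hypothetical) adaptation rule
x_adv = pgd_with_per_sample_eps(model, x, y, eps)
```
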
Citations: 0
Motion Attention-Guided Relational Reasoning for Weakly Supervised Group Activity Recognition
IF 13.7 | Pub Date: 2025-12-03 | DOI: 10.1109/TIP.2025.3636094 | Vol. 34, pp. 8159-8170
Yihao Zheng;Zhuming Wang;Lifang Wu;Liang Wang;Chang Wen Chen
Existing attention-based label-free weakly supervised group activity recognition methods can automatically learn tokens related to the actors, but they have difficulty generating sufficiently diverse token embeddings. To address these issues, we automatically obtain the grayscale motion mask of all moving objects based on motion direction rather than motion amplitude. A Motion-Guided Mask Generator (MGMG) module is proposed to estimate the attention region mask under the supervision of the grayscale motion mask. MGMG involves four parts. A correlation layer measures the relative displacement between two adjacent feature maps. A cosine attention mechanism is designed to reduce the module’s sensitivity to feature amplitude changes. A mask generator is built to produce the attention region mask, and a specifically designed activation function is used to refine the attention region mask and to enhance its focus on actor motion regions. We also customize a normalized relative error loss function for the MGMG module. This loss addresses the value-range mismatch between the estimated attention mask and the grayscale motion mask. Furthermore, a Motion Attention-Guided Relational Reasoning (MAGRR) framework is presented for the weakly supervised condition. It uses the MGMG module to estimate the attention regions automatically, and a Spatial-temporal Aggregation Stack (SAS) module to activate the attention regions of the features at the spatial level and transform them into multiple tokens, whose temporal dependencies and interrelationships are further captured by the attention mechanism. MAGRR is evaluated on the Collective Activity dataset and the Collective Activity Extension dataset, achieving state-of-the-art performance, and obtains competitive performance on the Volleyball and NBA datasets.
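
The cosine attention mechanism mentioned for MGMG can be sketched as attention over L2-normalized queries and keys, so the scores depend on feature direction rather than amplitude; the fixed scale factor and the placement inside MGMG are assumptions.

```python
import torch
import torch.nn.functional as F

def cosine_attention(q, k, v, scale=10.0):
    """Attention with L2-normalized queries and keys (sketch of the idea only).
    q, k, v: (B, N, D) token features."""
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    attn = torch.softmax(scale * q @ k.transpose(-2, -1), dim=-1)   # (B, N, N)
    return attn @ v

q = torch.randn(2, 49, 64); k = torch.randn(2, 49, 64); v = torch.randn(2, 49, 64)
print(cosine_attention(q, k, v).shape)  # torch.Size([2, 49, 64])
```
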
Citations: 0
Heterospectral Structure Compensation Sampling for Hyperspectral Fusion Computational Imaging
IF 13.7 | Pub Date: 2025-12-03 | DOI: 10.1109/TIP.2025.3636676 | Vol. 34, pp. 7930-7942
Jinyang Liu;Shutao Li;Heng Yang;Renwei Dian;Yuanye Liu
Existing hyperspectral fusion computational imaging methods primarily rely on using high-resolution multispectral images (HRMSI) to provide spatial details for low-resolution hyperspectral images (LRHSI), thereby enabling the reconstruction of hyperspectral images. However, these methods are often limited by the low spectral resolution of the HRMSI, making the sampled tensors unable to provide effective information for the LRHSI in a finer spectral range. To achieve more accurate computational imaging results, we propose a Heterospectral Structure Compensation Sampling (HSC-sampling) mechanism. Unlike traditional spatial sampling methods, which directly calculate the interpolation between adjacent pixels, this mechanism analyzes the structural complementarity among different bands in LRHSI. It utilizes the information from other bands to compensate for the missing details in the current band. Additionally, a novel Multi-phase Mixed Modeling (M2M) approach is designed, expanding the model’s analytical capabilities into multiple phases to accommodate the high-dimensional nature of HSI data. Specifically, it extracts fusion features from three phases and organizes the generated features along with the input features into a multi-variate mixed cube based on phase relationships, thereby capturing feature correlations across different phases. Based on the HSC-sampling mechanism and the M2M approach, we construct a Merging Residual Concatenation (MRC) hyperspectral fusion computational imaging network. Compared to other state-of-the-art methods, this network achieves significant improvements in fusion performance across multiple datasets. Moreover, the effectiveness of the HSC-sampling mechanism has been demonstrated in various hyperspectral imaging tasks. Code is available at: https://github.com/1318133/HSC-Sampling
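
As a very loose illustration of compensating one band with structure borrowed from its most correlated bands, the snippet below mixes high-frequency detail from the top-k correlated bands into each band; the actual HSC-sampling operator, weighting, and detail extraction in the paper are more involved, so everything here is an assumption.

```python
import torch
import torch.nn.functional as F

def cross_band_compensation(lrhsi, top_k=3):
    """Toy cross-band structure compensation (not the paper's HSC-sampling).
    lrhsi: (C, H, W) low-resolution hyperspectral image."""
    C, H, W = lrhsi.shape
    flat = lrhsi.reshape(C, -1)
    flat = (flat - flat.mean(1, keepdim=True)) / (flat.std(1, keepdim=True) + 1e-6)
    corr = flat @ flat.T / flat.shape[1]                 # (C, C) band correlations
    corr.fill_diagonal_(-1.0)                            # exclude the band itself
    smooth = F.avg_pool2d(lrhsi.unsqueeze(0), 3, stride=1, padding=1).squeeze(0)
    detail = lrhsi - smooth                              # per-band structural detail
    out = lrhsi.clone()
    for b in range(C):
        w, idx = corr[b].topk(top_k)                     # most correlated bands
        w = torch.softmax(w, dim=0)
        out[b] = lrhsi[b] + (w.view(-1, 1, 1) * detail[idx]).sum(0)
    return out

print(cross_band_compensation(torch.rand(31, 32, 32)).shape)  # torch.Size([31, 32, 32])
```
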
Citations: 0
Robust Face Recognition via Adaptive Mining and Margining of Noise and Hard Samples
IF 13.7 | Pub Date: 2025-12-03 | DOI: 10.1109/TIP.2025.3634979 | Vol. 34, pp. 8114-8129
Yang Xin;Xiang Zhong;Yu Zhou;Jianmin Jiang
At present, deep face recognition models trained on millions of images are confronted with the challenge that such large-scale datasets are often corrupted with noise and mislabeled identities, yet most deep models are primarily designed for clean datasets. In this paper, we propose a robust deep face recognition model that integrates the strength of margin-based learning models with the strength of mining-based approaches to effectively mitigate the impact of noise during training. By monitoring the recognition performance at the batch level to provide optimization-oriented feedback, we introduce a noise-adaptive mining strategy to dynamically adjust the emphasis balance between hard and noise samples, enabling direct training on noisy datasets without the requirement of pre-training. With a novel anti-noise loss function, learning is empowered for direct and robust training on noisy datasets while its effectiveness on clean datasets is still preserved, sustaining effective mining of both clean and noisy samples whilst weakening the learning intensity on noisy samples. Extensive experiments reveal that: (i) our proposed model achieves competitive performance in comparison with representative existing SoTA models when trained with clean datasets; (ii) when trained with both real-world and synthesized noisy datasets, our proposed model significantly outperforms the existing models, especially when the synthesized datasets are corrupted with both close-set and open-set noises; (iii) while the existing deep models suffer from an average performance drop of around 20% over noise-corrupted large-scale datasets, our proposed model still delivers accuracy rates of more than 95%. Our source codes are publicly available on GitHub.
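
The batch-level, noise-adaptive mining described in the abstract can be pictured as re-weighting samples by how far their target-class similarity falls below the batch statistics; the thresholds and weights below are hypothetical and do not reproduce the paper's anti-noise loss.

```python
import torch

def noise_adaptive_weights(target_cos):
    """Sketch of batch-level noise-adaptive mining (illustrative only).
    target_cos: (B,) cosine similarity between each embedding and its labeled
    class center. Samples far below the batch statistics are treated as likely
    label noise and down-weighted; moderately low ones are treated as hard
    samples and up-weighted."""
    mu, sigma = target_cos.mean(), target_cos.std()
    hard_thr = mu - 0.5 * sigma                  # batch-adaptive hard-sample band
    noise_thr = mu - 1.5 * sigma                 # batch-adaptive noise threshold
    w = torch.ones_like(target_cos)
    w[target_cos < hard_thr] = 1.2               # emphasize hard samples
    w[target_cos < noise_thr] = 0.2              # suppress probable label noise
    return w

cos = torch.tensor([0.55, 0.60, 0.40, 0.05, 0.58, 0.62])
print(noise_adaptive_weights(cos))               # the 0.05 sample is down-weighted
```
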
Citations: 0