Latest articles in IEEE transactions on image processing : a publication of the IEEE Signal Processing Society

Positional Encoding Image Prior.
IF 13.7 Pub Date : 2026-02-06 DOI: 10.1109/TIP.2026.3653206
Nimrod Shabtay, Eli Schwartz, Raja Giryes

In Deep Image Prior (DIP), a Convolutional Neural Network (CNN) is fitted to map a latent space to a degraded (e.g., noisy) image but in the process learns to reconstruct the clean image. This phenomenon is attributed to the CNN's internal image prior. We revisit the DIP framework, examining it from the perspective of a neural implicit representation. Motivated by this perspective, we replace the random latent with Fourier-Features (Positional Encoding). We empirically demonstrate that the convolution layers in DIP can be replaced with simple pixel-level MLPs thanks to the properties of the Fourier features. We also prove that they are equivalent in the case of linear networks. We name our scheme "Positional Encoding Image Prior" (PIP) and show that it performs very similarly to DIP on various image-reconstruction tasks with far fewer parameters. Furthermore, we demonstrate that PIP can be easily extended to videos, an area where methods based on image priors and certain INR approaches face challenges with stability. Code and additional examples for all tasks, including videos, are available on the project page nimrodshabtay.github.io/PIP.
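
As a rough illustration of the idea described above (not the authors' code), the sketch below encodes pixel coordinates with random Fourier features and fits a small pixel-wise MLP to a noisy target, in the spirit of PIP. The image size, frequency scale, layer widths, and iteration count are illustrative assumptions, and PyTorch is assumed as the framework.

```python
# Minimal PIP-style sketch: Fourier-feature coordinates + pixel-wise MLP,
# fitted to a noisy image; early stopping acts as the implicit prior.
import math
import torch
import torch.nn as nn

def fourier_features(coords: torch.Tensor, n_freqs: int = 128, sigma: float = 6.0) -> torch.Tensor:
    """Map (N, 2) pixel coordinates in [0, 1] to (N, 2 * n_freqs) Fourier features."""
    B = torch.randn(coords.shape[-1], n_freqs) * sigma        # random frequency matrix (fixed once)
    proj = 2.0 * math.pi * coords @ B                         # (N, n_freqs)
    return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)

class PixelMLP(nn.Module):
    """Per-pixel MLP: Fourier features in, RGB out (stands in for DIP's conv U-Net)."""
    def __init__(self, in_dim: int, hidden: int = 256, out_dim: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim), nn.Sigmoid(),
        )
    def forward(self, x):
        return self.net(x)

H, W = 64, 64
noisy = torch.rand(H * W, 3)                                  # stand-in for a noisy image
ys, xs = torch.meshgrid(torch.linspace(0, 1, H), torch.linspace(0, 1, W), indexing="ij")
coords = torch.stack([ys.flatten(), xs.flatten()], dim=-1)    # (H*W, 2)
feats = fourier_features(coords)                              # fixed encoding, no random latent

model = PixelMLP(feats.shape[-1])
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):                                       # stop early so noise is not overfitted
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(feats), noisy)
    loss.backward()
    opt.step()
denoised = model(feats).detach().reshape(H, W, 3)
```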

Citations: 0
High-Confident Block Diagonal Analysis for Multi-View Palmprint Recognition in Unrestrained Environment
IF 13.7 Pub Date : 2026-02-04 DOI: 10.1109/TIP.2026.3659325
Shuping Zhao;Lunke Fei;Tingting Chai;Jie Wen;Bob Zhang;Jinrong Cui
Unrestrained palmprint recognition refers to an identity authentication technology that performs personal authentication based on palmprint images captured in uncontrolled environments, e.g., from smartphone cameras, surveillance footage, or near-infrared scenarios. However, unrestrained palmprint recognition faces significant challenges due to the variability in image quality, lighting conditions, and hand poses present in such settings. We observe that many existing methods utilize the subspace structure as a prior, for which the block diagonal property of the data has been proved. In this paper, we consider a unified learning model that guarantees a consensus block diagonal property across all views, named high-confident block diagonal analysis for multi-view palmprint recognition (HCBDA_MPR). In particular, we propose a multi-view block diagonal regularizer that guides all views to learn a consensus block diagonal structure. In this manner, the main discriminative features of each view are preserved while a strict block diagonal structure is learned across all views. Experimental results on a number of real-world unrestrained palmprint databases demonstrate the superiority of the proposed method, which obtains the highest recognition accuracies in comparison with other state-of-the-art methods.
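
The abstract does not give the regularizer's exact form; the sketch below is only a plausible toy version of a consensus block-diagonal penalty, in which each view's affinity matrix is pushed to vanish outside blocks defined by a shared label structure. The affinity shapes, label layout, and optimization loop are assumptions for illustration.

```python
# Toy consensus block-diagonal regularizer over several per-view affinity matrices.
import torch

def block_mask(labels: torch.Tensor) -> torch.Tensor:
    """1 inside blocks of same-label samples, 0 elsewhere."""
    return (labels[:, None] == labels[None, :]).float()

def consensus_block_diag_loss(affinities: list[torch.Tensor], labels: torch.Tensor) -> torch.Tensor:
    """Sum over views of the energy of affinity entries that fall outside the shared blocks."""
    off_block = 1.0 - block_mask(labels)
    return sum(((A.abs() * off_block) ** 2).sum() for A in affinities)

torch.manual_seed(0)
labels = torch.tensor([0, 0, 1, 1, 2, 2])                            # toy consensus block structure
views = [torch.randn(6, 6, requires_grad=True) for _ in range(3)]    # one affinity matrix per view
opt = torch.optim.Adam(views, lr=0.1)
for _ in range(100):
    opt.zero_grad()
    loss = consensus_block_diag_loss(views, labels)
    loss.backward()
    opt.step()
# After optimization, the off-block entries of every view's affinity shrink toward zero,
# so all views agree on the same block-diagonal pattern.
```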
Citations: 0
Accurate Industrial Anomaly Detection and Localization Using Weakly-Supervised Residual Transformers
IF 13.7 Pub Date : 2026-02-04 DOI: 10.1109/TIP.2026.3659337
Hanxi Li;Jingqi Wu;Deyin Liu;Lin Yuanbo Wu;Hao Chen;Chunhua Shen
Recent advancements in industrial anomaly detection (AD) have demonstrated that incorporating a small number of anomalous samples during training can significantly enhance accuracy. However, this improvement often comes at the cost of extensive annotation efforts, which are impractical for many real-world applications. In this paper, we introduce a novel framework, “Weakly-supervised RESidual Transformer” (WeakREST), designed to achieve high anomaly detection accuracy while minimizing the reliance on manual annotations. First, we reformulate the pixel-wise anomaly localization task into a block-wise classification problem. Second, we introduce a residual-based feature representation called “Positional Fast Anomaly Residuals” (PosFAR) which captures anomalous patterns more effectively. To leverage this feature, we adapt the Swin Transformer for enhanced anomaly detection and localization. Additionally, we propose a weak annotation approach utilizing bounding boxes and image tags to define anomalous regions. This approach establishes a semi-supervised learning context that reduces the dependency on precise pixel-level labels. To further improve the learning process, we develop a novel ResMixMatch algorithm, capable of handling the interplay between weak labels and residual-based representations. On the benchmark dataset MVTec-AD, our method achieves an Average Precision (AP) of 83.0%, surpassing the previous best result of 82.7% in the unsupervised setting. In the supervised AD setting, WeakREST attains an AP of 87.6%, outperforming the previous best of 86.0%. Notably, even when using weaker annotations such as bounding boxes, WeakREST exceeds the performance of leading methods relying on pixel-wise supervision, achieving an AP of 87.1% compared to the prior best of 86.0% on MVTec-AD. This superior performance is consistently replicated across other well-established AD datasets, including MVTec 3D, KSDD2 and Real-IAD. Code is available at: https://github.com/BeJane/Semi_REST
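
As a small illustration of the block-wise reformulation with weak box annotations (an assumed interface, not the WeakREST code), the helper below marks every block that overlaps a bounding box as anomalous; the block size and toy box coordinates are made up.

```python
# Convert weak bounding-box annotations into block-wise classification labels.
import torch

def boxes_to_block_labels(boxes, img_hw, block: int = 16) -> torch.Tensor:
    """boxes: list of (x1, y1, x2, y2); returns an (H//block, W//block) grid of {0, 1} labels."""
    H, W = img_hw
    labels = torch.zeros(H // block, W // block)
    for x1, y1, x2, y2 in boxes:
        bx1, by1 = x1 // block, y1 // block              # first block touched by the box
        bx2, by2 = (x2 - 1) // block, (y2 - 1) // block   # last block touched by the box
        labels[by1:by2 + 1, bx1:bx2 + 1] = 1.0            # every overlapping block is "anomalous"
    return labels

weak_boxes = [(40, 8, 90, 60)]                            # one weak box annotation (toy values)
block_labels = boxes_to_block_labels(weak_boxes, img_hw=(256, 256), block=16)
print(int(block_labels.sum().item()), "anomalous blocks out of", block_labels.numel())
```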
Citations: 0
Improving Unsupervised Ultrasonic Image Anomaly Detection via Frequency-Spatial Feature Filtering and Gaussian Mixture Modeling
IF 13.7 Pub Date : 2026-02-04 DOI: 10.1109/TIP.2026.3659292
Wenjing Zhang;Ke Lu;Jinbao Wang;Hao Liang;Can Gao;Jian Xue
Ultrasonic image anomaly detection faces significant challenges due to limited labeled data, strong structural and random noise, and highly diverse defect manifestations. To overcome these obstacles, we introduce UltraChip, a new large-scale C-scan benchmark containing about 8,000 real-world images from various chip packaging types, each meticulously annotated with pixel-level masks for cracks, holes, and layers. Building on this resource, we present FSGM-Net, a fully unsupervised framework tailored for anomaly detection. FSGM-Net leverages an adaptive Frequency-Spatial feature filtering mechanism: a learnable FFT-Spatial patch filter first suppresses noise and dynamically assigns normality weights to Vision Transformer (ViT) patch features. Subsequently, an Adaptive Gaussian Mixture Model (Ada-GMM) captures the distribution of normal features and guides a deep–shallow multi-scale interaction decoder for accurate, pixel-level anomaly inference. In addition, we propose a filter loss that enforces encoder–filter consistency and entropy-based sparse gating, together with a distributional loss that encourages both feature reconstruction and confident Gaussian mixture modeling. Extensive experiments demonstrate that FSGM-Net not only achieves state-of-the-art results on UltraChip but also exhibits superior cross-domain generalization to MVTec-AD and VisA, while supporting real-time inference on a single GPU. Together, the dataset and framework advance robust, annotation-free ultrasonic NDT in practical applications. The UltraChip dataset can be obtained via https://iiplab.net/ultrachip/
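
A much-simplified sketch of two of the ingredients named above, under assumed shapes and with scikit-learn's GaussianMixture standing in for the paper's Ada-GMM: a frequency-domain low-pass on a patch-feature map, and a Gaussian mixture fitted on normal features whose negative log-likelihood serves as a per-patch anomaly score.

```python
# Frequency filtering of a feature map + GMM-based anomaly scoring (illustrative only).
import torch
from sklearn.mixture import GaussianMixture

def lowpass_filter(feat: torch.Tensor, keep: float = 0.5) -> torch.Tensor:
    """feat: (C, H, W). Zero out the highest spatial frequencies, keep the central `keep` fraction."""
    C, H, W = feat.shape
    spec = torch.fft.fftshift(torch.fft.fft2(feat), dim=(-2, -1))
    mask = torch.zeros(H, W)
    h, w = int(H * keep / 2), int(W * keep / 2)
    mask[H // 2 - h:H // 2 + h, W // 2 - w:W // 2 + w] = 1.0
    return torch.fft.ifft2(torch.fft.ifftshift(spec * mask, dim=(-2, -1))).real

# Toy "normal" training features and one filtered test feature map (C channels on an HxW patch grid).
C, H, W = 8, 14, 14
normal = torch.randn(200, C)                                    # patch features from normal images
test_map = lowpass_filter(torch.randn(C, H, W))                 # noise-suppressed test feature map

gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
gmm.fit(normal.numpy())                                         # model the distribution of normal features
scores = -gmm.score_samples(test_map.reshape(C, -1).T.numpy())  # negative log-likelihood per patch
anomaly_map = scores.reshape(H, W)                              # higher value = more anomalous patch
```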
Citations: 0
Attack-Augmented Mixing-Contrastive Skeletal Representation Learning
IF 13.7 Pub Date : 2026-02-04 DOI: 10.1109/TIP.2026.3659331
Binqian Xu;Xiangbo Shu;Jiachao Zhang;Rui Yan;Guo-Sen Xie
Contrastive learning facilitates the acquisition of informative skeleton representations for unsupervised action recognition by leveraging effective positive and negative sample pairs. However, most existing methods construct these pairs through weak or strong data augmentations, which typically rely on random appearance alterations of skeletons. While such augmentations are somewhat effective, they introduce semantic variations only indirectly and face two inherent limitations. First, simply modifying the appearance of skeletons often fails to reflect meaningful semantic variations. Second, random perturbations can unintentionally blur the boundary between positive and negative pairs, weakening the contrastive objective. To address these challenges, we propose an attack-driven augmentation framework that explicitly introduces semantic-level perturbations. This approach facilitates the generation of hard positives while guiding the model to mine more informative hard negatives. Building on this idea, we present Attack-Augmented Mixing-Contrastive Skeletal Representation Learning (A2MC), a novel framework that focuses on contrasting hard positive and hard negative samples for more robust representation learning. Within A2MC, we design an Attack-Augmentation (Att-Aug) module that integrates both targeted (attack-based) and untargeted (augmentation-based) perturbations to generate informative hard positive samples. In parallel, we propose the Positive-Negative Mixer (PNM), which blends hard positive and negative features to synthesize challenging hard negatives. These are then used to update a mixed memory bank for more effective contrastive learning. Comprehensive evaluations across three public benchmarks demonstrate that our approach, termed A2MC, achieves performance on par with or exceeding existing state-of-the-art methods.
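
The following is only a schematic rendering of the two mechanisms described above, not the A2MC implementation: a one-step gradient attack perturbs the input so its embedding drifts away from the anchor (a hard positive), and positive and negative features are linearly mixed into harder negatives. The toy encoder, skeleton dimensionality, and mixing coefficient are assumptions.

```python
# Attack-style hard positives and mixed hard negatives for contrastive learning (toy sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(75, 128), nn.ReLU(), nn.Linear(128, 128))  # toy skeleton encoder

def attack_positive(x: torch.Tensor, eps: float = 0.01) -> torch.Tensor:
    """One-step gradient attack: nudge x so its embedding's cosine similarity to the anchor drops."""
    x_adv = x.clone().detach().requires_grad_(True)
    anchor = encoder(x).detach()
    sim = F.cosine_similarity(encoder(x_adv), anchor, dim=-1).mean()
    sim.backward()
    return (x_adv - eps * x_adv.grad.sign()).detach()             # small semantic-level perturbation

def mix_hard_negatives(pos: torch.Tensor, neg: torch.Tensor, lam: float = 0.5) -> torch.Tensor:
    """Positive-Negative mixing: blend features so the negatives sit closer to the positives."""
    return lam * pos + (1.0 - lam) * neg

x = torch.randn(32, 75)                                           # a batch of flattened skeletons
hard_pos = encoder(attack_positive(x))                            # embeddings of attacked positives
hard_neg = mix_hard_negatives(hard_pos.detach(), encoder(torch.randn(32, 75)).detach())
```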
Citations: 0
Complementary Mixture-of-Experts and Complementary Cross-Attention for Single Image Reflection Separation in the Wild
IF 13.7 Pub Date : 2026-02-04 DOI: 10.1109/TIP.2026.3659334
Jonghyuk Park;Jae-Young Sim
Single Image Reflection Separation (SIRS) aims to reconstruct both the transmitted and reflected images from a single image that contains a superimposition of both, captured through a glass-like reflective surface. Recent learning-based methods of SIRS have significantly improved performance on typical images with mild reflection artifacts; however, they often struggle with diverse images containing challenging reflections captured in the wild. In this paper, we propose a universal SIRS framework based on a flexible dual-stream architecture, capable of handling diverse reflection artifacts. Specifically, we incorporate a Mixture-of-Experts mechanism that dynamically assigns specialized experts to image patches based on spatially heterogeneous reflection characteristics. The assigned experts then cooperate to extract complementary features between the transmission and reflection streams in an adaptive manner. In addition, we leverage the multi-head attention mechanism of Transformers to simultaneously exploit both high and low cross-correlations, which are then complementarily used to facilitate adaptive inter-stream feature interactions. Experimental results evaluated on diverse real-world datasets demonstrate that the proposed method significantly outperforms existing state-of-the-art methods qualitatively and quantitatively.
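
As a hedged sketch of patch-level expert routing (the paper's exact gating and dual-stream wiring are not specified here), the module below scores each patch feature with a small gate and combines the outputs of several expert MLPs per patch; dimensions and expert count are illustrative.

```python
# Patch-level Mixture-of-Experts routing over image-patch features (toy dimensions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchMoE(nn.Module):
    def __init__(self, dim: int = 64, n_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (B, N, D) patch features, e.g., from the transmission stream.
        weights = F.softmax(self.gate(patches), dim=-1)                     # (B, N, E) routing weights
        outputs = torch.stack([e(patches) for e in self.experts], dim=-1)   # (B, N, D, E)
        return (outputs * weights.unsqueeze(2)).sum(dim=-1)                 # per-patch expert mixture

moe = PatchMoE()
patch_feats = torch.randn(2, 196, 64)        # 2 images, 14x14 patches, feature dim 64
out = moe(patch_feats)                        # same shape, with spatially adaptive expert weighting
```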
Citations: 0
AvatarMakeup: Realistic Makeup Transfer for 3D Animatable Head Avatars
IF 13.7 Pub Date : 2026-02-03 DOI: 10.1109/TIP.2026.3657896
Yiming Zhong;Xiaolin Zhang;Ligang Liu;Yao Zhao;Yunchao Wei
Similar to facial beautification in real life, 3D virtual avatars require personalized customization to enhance their visual appeal, yet this area remains insufficiently explored. Although current 3D Gaussian editing methods can be adapted for facial makeup purposes, these methods fail to meet the fundamental requirements for achieving realistic makeup effects: 1) ensuring a consistent appearance during drivable expressions; 2) preserving the identity throughout the makeup process; and 3) enabling precise control over fine details. To address these, we propose a specialized 3D makeup method named AvatarMakeup, leveraging a pretrained diffusion model to transfer makeup patterns from a single reference photo of any individual. We adopt a coarse-to-fine idea to first maintain the consistent appearance and identity, and then to refine the details. In particular, the diffusion model is employed to generate makeup images as supervision. Due to the uncertainties in diffusion process, the generated images are inconsistent across different viewpoints and expressions. Therefore, we propose a Coherent Duplication method to coarsely apply makeup to the target while ensuring consistency across dynamic and multi-view effects. Coherent Duplication optimizes a global UV map by recoding the averaged facial attributes among the generated makeup images. By querying the global UV map, it easily synthesizes coherent makeup guidance from arbitrary views and expressions to optimize the target avatar. Given the coarse makeup avatar, we further enhance the makeup by incorporating a Refinement Module into the diffusion model to achieve high makeup quality. Experiments demonstrate that AvatarMakeup achieves state-of-the-art makeup transfer quality and consistency throughout animation.
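
The snippet below is a loose, assumed-interface illustration of the Coherent Duplication idea: per-view makeup colors are scattered into a shared UV texture through each pixel's UV coordinates and then averaged, so that any view or expression can re-query a consistent result. The UV resolution and the toy per-view samples are placeholders.

```python
# Averaging per-view colors into a shared UV texture via scatter-add (toy data).
import torch

def accumulate_uv(uv: torch.Tensor, colors: torch.Tensor, res: int = 256):
    """uv: (N, 2) coordinates in [0, 1]; colors: (N, 3). Returns summed texture and per-texel counts."""
    idx = (uv.clamp(0, 1) * (res - 1)).long()
    flat = idx[:, 1] * res + idx[:, 0]                                # texel index per pixel sample
    tex_sum = torch.zeros(res * res, 3).index_add_(0, flat, colors)
    count = torch.zeros(res * res).index_add_(0, flat, torch.ones(len(flat)))
    return tex_sum, count

views = [(torch.rand(5000, 2), torch.rand(5000, 3)) for _ in range(4)]  # toy (uv, color) pairs per view
tex_sum = torch.zeros(256 * 256, 3)
count = torch.zeros(256 * 256)
for uv, col in views:                                                 # accumulate every generated view
    s, c = accumulate_uv(uv, col)
    tex_sum += s
    count += c
uv_map = (tex_sum / count.clamp(min=1).unsqueeze(-1)).reshape(256, 256, 3)
# Querying uv_map by UV coordinates gives view- and expression-consistent makeup guidance.
```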
Citations: 0
MambaFedCD: Spatial–Spectral–Temporal Collaborative Mamba-Based Active Federated Hyperspectral Change Detection
IF 13.7 Pub Date : 2026-02-03 DOI: 10.1109/TIP.2026.3658212
Jiahui Qu;Jingyu Zhao;Wenqian Dong;Lijian Zhang;Yunsong Li
Hyperspectral image (HSI) change detection is a technique that can identify the changes occurring between bitemporal HSIs covering the same geographic area. The field of change detection has witnessed the proposal and successful implementation of numerous methods. However, a majority of these approaches adhere to the centralized learning paradigm, which requires data transmission to a central server for training. The sensitivity of remote sensing data generally prohibits their sharing across different clients. Furthermore, manual labeling is costly in practice. In this paper, we propose a spatial-spectral-temporal collaborative Mamba-based active federated hyperspectral change detection (MambaFedCD) framework, which utilizes the limited labeled samples from multiple clients to achieve change detection while ensuring the data privacy of each client. Specifically, there are three key characteristics: 1) a spatial-spectral-temporal collaborative Mamba-based change detection (S²TMamba) model is proposed to efficiently synergize the temporal and global spatial-spectral information of the bitemporal HSIs for change detection; 2) a difference feature diversity correction-based model aggregation (DFDCMA) strategy is devised to incorporate the diversity of difference features for rational allocation of weight factors among clients and to facilitate effective aggregation of the global model; 3) we propose a multi-decision federated active learning (MDFAL) strategy that selects both error-prone and valuable samples for model training to alleviate the burden of sample labeling. Comprehensive experiments conducted on commonly utilized datasets demonstrate that the proposed method outperforms other state-of-the-art methods. The code is available at https://github.com/Jiahuiqu/MambaFedCD
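
A minimal sketch of the server-side step in the spirit of DFDCMA (the diversity scores here are placeholder numbers, not the paper's measure): clients share only parameters, never data, and the server averages their state dicts with normalized per-client weights.

```python
# Weighted federated aggregation of client models (FedAvg-style, toy weights).
import copy
import torch
import torch.nn as nn

def aggregate(client_models: list[nn.Module], weights: list[float]) -> nn.Module:
    """Weighted average of client state dicts; weights are normalized to sum to 1."""
    w = torch.tensor(weights, dtype=torch.float32)
    w = w / w.sum()
    global_model = copy.deepcopy(client_models[0])
    avg_state = {k: torch.zeros_like(v, dtype=torch.float32)
                 for k, v in global_model.state_dict().items()}
    for wi, m in zip(w, client_models):
        for k, v in m.state_dict().items():
            avg_state[k] += wi * v.float()                     # accumulate weighted parameters
    global_model.load_state_dict(avg_state)
    return global_model

clients = [nn.Linear(16, 2) for _ in range(3)]                 # stand-ins for per-client CD models
diversity_scores = [0.2, 0.5, 0.3]                             # assumed difference-feature diversity weights
global_model = aggregate(clients, diversity_scores)            # broadcast back to clients next round
```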
Citations: 0
Incorporating Uncertainty-Guided and Top-k Codebook Matching for Real-World Blind Image Super-Resolution
IF 13.7 Pub Date : 2026-02-03 DOI: 10.1109/TIP.2026.3653547
Weilei Wen;Tianyi Zhang;Qianqian Zhao;Zhaohui Zheng;Chunle Guo;Xiuli Shao;Chongyi Li
Recent advancements in codebook-based real image super-resolution (SR) have shown promising results in real-world applications. The core idea involves matching high-quality image features from a codebook based on low-resolution (LR) image features. However, existing methods face two major challenges: inaccurate feature matching with the codebook and poor texture detail reconstruction. To address these issues, we propose a novel Uncertainty-Guided and Top-k Codebook Matching SR (UGTSR) framework, which incorporates three key components: 1) an uncertainty learning mechanism that guides the model to focus on texture-rich regions, 2) a Top-k feature matching strategy that enhances feature matching accuracy by fusing multiple candidate features, and 3) an Align-Attention module that enhances the alignment of information between LR and HR features. Experimental results demonstrate significant improvements in texture realism and reconstruction fidelity compared to existing methods. The source code can be found at https://github.com/wwlCape/UGTSR-main
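
As an illustration of the Top-k matching strategy (toy shapes; the softmax fusion rule is an assumption rather than the paper's exact formulation), each low-resolution feature is matched to its k nearest codebook entries, which are then fused with distance-based weights instead of committing to a single nearest code.

```python
# Top-k codebook matching: fuse the k nearest high-quality codes per LR feature.
import torch
import torch.nn.functional as F

def topk_codebook_match(feats: torch.Tensor, codebook: torch.Tensor, k: int = 4) -> torch.Tensor:
    """feats: (N, D) LR features; codebook: (K, D) learned high-quality codes. Returns fused (N, D)."""
    dists = torch.cdist(feats, codebook)                      # (N, K) pairwise L2 distances
    vals, idx = dists.topk(k, dim=-1, largest=False)          # k nearest codes per feature
    weights = F.softmax(-vals, dim=-1)                        # closer codes receive larger weights
    selected = codebook[idx]                                  # (N, k, D) candidate codes
    return (weights.unsqueeze(-1) * selected).sum(dim=1)      # weighted fusion of the candidates

codebook = torch.randn(1024, 256)                             # 1024 codes of dimension 256 (toy)
lr_feats = torch.randn(4096, 256)                             # features of a 64x64 LR feature map
hq_feats = topk_codebook_match(lr_feats, codebook, k=4)       # fed to a decoder for HR synthesis
```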
Citations: 0
SuperCL: Superpixel Guided Contrastive Learning for Medical Image Segmentation Pre-Training
IF 13.7 Pub Date : 2026-02-03 DOI: 10.1109/TIP.2026.3657233
Shuang Zeng;Lei Zhu;Xinliang Zhang;Hangzhou He;Yanye Lu
Medical image segmentation is a critical yet challenging task, primarily due to the difficulty of obtaining extensive datasets of high-quality, expert-annotated images. Contrastive learning presents a potential but still imperfect solution, because most existing methods focus on extracting instance-level or pixel-to-pixel representations and ignore the shared characteristics of similar pixel groups within an image. Moreover, when generating contrastive pairs, most state-of-the-art methods rely mainly on manually set thresholds, which requires a large number of gradient experiments and lacks efficiency and generalization. To address these issues, we propose a novel contrastive learning approach named SuperCL for medical image segmentation pre-training. Specifically, SuperCL exploits the structural prior and pixel correlation of images by introducing two novel contrastive pair generation strategies: Intra-image Local Contrastive Pairs (ILCP) Generation and Inter-image Global Contrastive Pairs (IGCP) Generation. Since superpixel clustering aligns well with the concept of contrastive pair generation, we utilize the superpixel map to generate pseudo masks for both ILCP and IGCP to guide supervised contrastive learning. We also propose two modules, Average SuperPixel Feature Map Generation (ASP) and Connected Components Label Generation (CCL), to better exploit the prior structural information for IGCP. Finally, experiments on 8 medical image datasets indicate that SuperCL outperforms 12 existing methods; for example, it produces more precise predictions in the visualizations and DSC scores 3.15%, 5.44%, and 7.89% higher than the previous best results on MMWHS, CHAOS, and Spleen with 10% annotations. Our code is released at https://github.com/stevezs315/SuperCL
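
A compact sketch of the superpixel-guided pooling behind ASP (assumed shapes, with scikit-image's SLIC standing in for the superpixel map): per-pixel encoder features are averaged within each superpixel, and the resulting superpixel features become the units for building contrastive pairs.

```python
# Superpixel-guided average feature pooling for contrastive pre-training (toy features).
import numpy as np
import torch
from skimage.segmentation import slic

image = np.random.rand(64, 64, 3)                            # stand-in for a medical image slice
segments = slic(image, n_segments=50, compactness=10)        # (64, 64) superpixel label map
labels = torch.from_numpy(segments.astype(np.int64)).reshape(-1)
labels = labels - labels.min()                               # make labels start at 0

feats = torch.randn(64 * 64, 32)                             # per-pixel features from an encoder
n_sp = int(labels.max().item()) + 1
sums = torch.zeros(n_sp, 32).index_add_(0, labels, feats)    # sum features within each superpixel
counts = torch.zeros(n_sp).index_add_(0, labels, torch.ones(len(labels)))
sp_feats = sums / counts.clamp(min=1).unsqueeze(-1)          # average superpixel features (ASP-like)

# Pixels in the same superpixel can serve as intra-image positives, while superpixel
# means from different images can form inter-image pairs for the contrastive loss.
sp_feats = torch.nn.functional.normalize(sp_feats, dim=-1)
sim = sp_feats @ sp_feats.T                                  # similarity matrix for a contrastive loss
```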
Citations: 0