
Latest publications in IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society

Complementary Mixture-of-Experts and Complementary Cross-Attention for Single Image Reflection Separation in the Wild
IF 13.7 Pub Date: 2026-02-04 DOI: 10.1109/TIP.2026.3659334
Jonghyuk Park;Jae-Young Sim
Single Image Reflection Separation (SIRS) aims to reconstruct both the transmitted and reflected images from a single image that contains a superimposition of both, captured through a glass-like reflective surface. Recent learning-based methods of SIRS have significantly improved performance on typical images with mild reflection artifacts; however, they often struggle with diverse images containing challenging reflections captured in the wild. In this paper, we propose a universal SIRS framework based on a flexible dual-stream architecture, capable of handling diverse reflection artifacts. Specifically, we incorporate a Mixture-of-Experts mechanism that dynamically assigns specialized experts to image patches based on spatially heterogeneous reflection characteristics. The assigned experts then cooperate to extract complementary features between the transmission and reflection streams in an adaptive manner. In addition, we leverage the multi-head attention mechanism of Transformers to simultaneously exploit both high and low cross-correlations, which are then complementarily used to facilitate adaptive inter-stream feature interactions. Experimental results evaluated on diverse real-world datasets demonstrate that the proposed method significantly outperforms existing state-of-the-art methods qualitatively and quantitatively.
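As a rough illustration of the patch-wise expert routing described above, the following minimal PyTorch sketch assigns image-patch tokens to specialized experts with a learned gate and mixes their outputs; the PatchMoE name, layer sizes, and softmax mixture are illustrative assumptions, not the authors' architecture.

import torch
import torch.nn as nn

class PatchMoE(nn.Module):
    def __init__(self, dim=64, num_experts=4):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(num_experts)])
        self.gate = nn.Linear(dim, num_experts)   # per-patch routing scores

    def forward(self, patches):                   # patches: (B, N, dim) tokens
        weights = self.gate(patches).softmax(dim=-1)                    # (B, N, E)
        outs = torch.stack([e(patches) for e in self.experts], dim=-1)  # (B, N, dim, E)
        return (outs * weights.unsqueeze(2)).sum(dim=-1)                # weighted mixture

x = torch.randn(2, 196, 64)            # e.g. 14x14 patches with 64-d features
print(PatchMoE()(x).shape)             # torch.Size([2, 196, 64])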
Vol. 35, pp. 1607-1620.
Citations: 0
AvatarMakeup: Realistic Makeup Transfer for 3D Animatable Head Avatars
IF 13.7 Pub Date: 2026-02-03 DOI: 10.1109/TIP.2026.3657896
Yiming Zhong;Xiaolin Zhang;Ligang Liu;Yao Zhao;Yunchao Wei
Similar to facial beautification in real life, 3D virtual avatars require personalized customization to enhance their visual appeal, yet this area remains insufficiently explored. Although current 3D Gaussian editing methods can be adapted for facial makeup purposes, these methods fail to meet the fundamental requirements for achieving realistic makeup effects: 1) ensuring a consistent appearance during drivable expressions; 2) preserving the identity throughout the makeup process; and 3) enabling precise control over fine details. To address these, we propose a specialized 3D makeup method named AvatarMakeup, leveraging a pretrained diffusion model to transfer makeup patterns from a single reference photo of any individual. We adopt a coarse-to-fine idea to first maintain the consistent appearance and identity, and then to refine the details. In particular, the diffusion model is employed to generate makeup images as supervision. Due to the uncertainties in diffusion process, the generated images are inconsistent across different viewpoints and expressions. Therefore, we propose a Coherent Duplication method to coarsely apply makeup to the target while ensuring consistency across dynamic and multi-view effects. Coherent Duplication optimizes a global UV map by recoding the averaged facial attributes among the generated makeup images. By querying the global UV map, it easily synthesizes coherent makeup guidance from arbitrary views and expressions to optimize the target avatar. Given the coarse makeup avatar, we further enhance the makeup by incorporating a Refinement Module into the diffusion model to achieve high makeup quality. Experiments demonstrate that AvatarMakeup achieves state-of-the-art makeup transfer quality and consistency throughout animation.
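The view-consistency idea of averaging generated appearances into one shared texture space can be sketched as below; this is only a generic UV-pooling illustration (the accumulate_uv helper, resolution, and given per-sample UV coordinates are assumptions), not the paper's Coherent Duplication procedure.

import torch

def accumulate_uv(colors, uv, res=256):
    """colors: (N, 3) attributes sampled from generated views; uv: (N, 2) in [0, 1]."""
    idx = (uv.clamp(0, 1) * (res - 1)).long()
    flat = idx[:, 1] * res + idx[:, 0]                  # texel index per sample
    acc = torch.zeros(res * res, 3).index_add_(0, flat, colors)
    cnt = torch.zeros(res * res).index_add_(0, flat, torch.ones(len(flat)))
    return (acc / cnt.clamp(min=1).unsqueeze(1)).view(res, res, 3)

colors, uv = torch.rand(10000, 3), torch.rand(10000, 2)
print(accumulate_uv(colors, uv).shape)                  # torch.Size([256, 256, 3])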
Vol. 35, pp. 1436-1447.
Citations: 0
MambaFedCD: Spatial–Spectral–Temporal Collaborative Mamba-Based Active Federated Hyperspectral Change Detection
IF 13.7 Pub Date: 2026-02-03 DOI: 10.1109/TIP.2026.3658212
Jiahui Qu;Jingyu Zhao;Wenqian Dong;Lijian Zhang;Yunsong Li
Hyperspectral image (HSI) change detection is a technique that can identify the changes occurring between the bitemporal HSIs covering the same geographic area. The field of change detection has witnessed the proposal and successful implementation of numerous methods. However, a majority of these approaches adhere to the centralized learning paradigm, which requires data transmission to a central server for training. The sensitivity of remote sensing data generally prohibits their sharing across different clients. Furthermore, manual labeling is a costly effort in practice. In this paper, we propose a spatial-spectral-temporal collaborative Mamba-based active federated hyperspectral change detection (MambaFedCD) framework, which utilizes the limited labeled samples from multiple clients to achieve change detection while ensuring the data privacy of each client. Specifically, there are three key characteristics: 1) a spatial-spectral-temporal collaborative Mamba-based change detection ($\text{S}^{2}\text{TMamba}$) model is proposed to efficiently synergize the temporal and global spatial-spectral information of the bitemporal HSIs for change detection; 2) a difference feature diversity correction-based model aggregation (DFDCMA) strategy is devised to incorporate the diversity of difference features for rational allocation of weight factors among clients and to facilitate effective aggregation of the global model; 3) we propose a multi-decision federated active learning (MDFAL) strategy that selects both error-prone and valuable samples for model training to alleviate the burden of sample labeling. Comprehensive experiments conducted on commonly utilized datasets demonstrate that the proposed method outperforms other state-of-the-art methods. The code is available at https://github.com/Jiahuiqu/MambaFedCD
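The server-side weighted model aggregation can be sketched generically as score-weighted federated averaging; the placeholder diversity scores and softmax weighting below are assumptions for illustration and do not reproduce the paper's DFDCMA rule.

import torch
import torch.nn as nn

def aggregate(client_states, scores):
    """client_states: list of state_dicts; scores: one weight factor per client."""
    w = torch.softmax(torch.tensor(scores, dtype=torch.float32), dim=0)
    return {k: sum(wi * sd[k].float() for wi, sd in zip(w, client_states))
            for k in client_states[0]}

clients = [nn.Linear(8, 2) for _ in range(3)]           # stand-ins for client models
diversity = [0.4, 1.2, 0.7]                             # placeholder per-client scores
global_state = aggregate([c.state_dict() for c in clients], diversity)
server = nn.Linear(8, 2)
server.load_state_dict(global_state)                    # global model for the next round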
Vol. 35, pp. 1478-1492.
Citations: 0
Incorporating Uncertainty-Guided and Top-k Codebook Matching for Real-World Blind Image Super-Resolution
IF 13.7 Pub Date: 2026-02-03 DOI: 10.1109/TIP.2026.3653547
Weilei Wen;Tianyi Zhang;Qianqian Zhao;Zhaohui Zheng;Chunle Guo;Xiuli Shao;Chongyi Li
Recent advancements in codebook-based real image super-resolution (SR) have shown promising results in real-world applications. The core idea involves matching high-quality image features from a codebook based on low-resolution (LR) image features. However, existing methods face two major challenges: inaccurate feature matching with the codebook and poor texture detail reconstruction. To address these issues, we propose a novel Uncertainty-Guided and Top-k Codebook Matching SR (UGTSR) framework, which incorporates three key components: 1) an uncertainty learning mechanism that guides the model to focus on texture-rich regions, 2) a Top-k feature matching strategy that enhances feature matching accuracy by fusing multiple candidate features, and 3) an Align-Attention module that enhances the alignment of information between LR and HR features. Experimental results demonstrate significant improvements in texture realism and reconstruction fidelity compared to existing methods. The source code can be found at https://github.com/wwlCape/UGTSR-main
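The top-k matching idea can be sketched as follows: each low-resolution feature retrieves its k nearest codebook entries and fuses them with distance-based weights; the softmax-over-negative-distance fusion and tensor sizes are assumptions for illustration rather than the exact UGTSR strategy.

import torch

def topk_lookup(feats, codebook, k=4):
    """feats: (N, d) LR features; codebook: (V, d) learned code vectors."""
    dists = torch.cdist(feats, codebook)                # (N, V) pairwise distances
    vals, idx = dists.topk(k, dim=1, largest=False)     # k nearest codes per feature
    weights = torch.softmax(-vals, dim=1)               # closer codes weigh more
    candidates = codebook[idx]                          # (N, k, d)
    return (weights.unsqueeze(-1) * candidates).sum(dim=1)   # fused (N, d) feature

feats, codebook = torch.randn(8, 64), torch.randn(1024, 64)
print(topk_lookup(feats, codebook).shape)               # torch.Size([8, 64])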
Vol. 35, pp. 1535-1550.
Citations: 0
SuperCL: Superpixel Guided Contrastive Learning for Medical Image Segmentation Pre-Training
IF 13.7 Pub Date: 2026-02-03 DOI: 10.1109/TIP.2026.3657233
Shuang Zeng;Lei Zhu;Xinliang Zhang;Hangzhou He;Yanye Lu
Medical image segmentation is a critical yet challenging task, primarily due to the difficulty of obtaining extensive datasets of high-quality, expert-annotated images. Contrastive learning presents a potential but still problematic solution to this issue, because most existing methods focus on extracting instance-level or pixel-to-pixel representations and ignore the characteristics of similar pixel groups within an image. Moreover, when generating contrastive pairs, most SOTA methods mainly rely on manually set thresholds, which requires a large number of gradient experiments and lacks efficiency and generalization. To address these issues, we propose a novel contrastive learning approach named SuperCL for medical image segmentation pre-training. Specifically, our SuperCL exploits the structural prior and pixel correlation of images by introducing two novel contrastive pair generation strategies: Intra-image Local Contrastive Pairs (ILCP) Generation and Inter-image Global Contrastive Pairs (IGCP) Generation. Considering that superpixel clustering aligns well with the concept of contrastive pair generation, we utilize the superpixel map to generate pseudo masks for both ILCP and IGCP to guide supervised contrastive learning. Moreover, we propose two modules named Average SuperPixel Feature Map Generation (ASP) and Connected Components Label Generation (CCL) to better exploit the prior structural information for IGCP. Finally, experiments on 8 medical image datasets indicate that our SuperCL outperforms 12 existing methods: it achieves superior performance, with more precise predictions in the visualized results and DSC scores 3.15%, 5.44%, and 7.89% higher than the previous best results on MMWHS, CHAOS, and Spleen with 10% annotations. Our code is released at https://github.com/stevezs315/SuperCL
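The superpixel-guided grouping step can be sketched as below: SLIC labels define intra-image pixel groups and a prototype feature is pooled per superpixel, so pixels of the same group can later act as positives in a contrastive loss (omitted here); feature sizes and SLIC parameters are illustrative assumptions.

import numpy as np
import torch
from skimage.segmentation import slic

img = np.random.rand(64, 64, 3)                     # stand-in for an image slice
labels = slic(img, n_segments=50, compactness=10)   # (64, 64) superpixel ids
feats = torch.randn(64, 64, 32)                     # stand-in per-pixel features

ids = torch.from_numpy(labels).long().view(-1)
num = int(ids.max()) + 1
pooled = torch.zeros(num, 32).index_add_(0, ids, feats.view(-1, 32))
counts = torch.bincount(ids, minlength=num).clamp(min=1)
prototypes = pooled / counts.unsqueeze(1)           # one prototype per superpixel
print(prototypes.shape)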
Vol. 35, pp. 1636-1651.
Citations: 0
Frequency-Decomposed Interaction Network for Stereo Image Restoration
IF 13.7 Pub Date: 2026-02-02 DOI: 10.1109/TIP.2026.3658219
Xianmin Tian;Jin Xie;Ronghua Xu;Jing Nie;Jiale Cao;Yanwei Pang;Xuelong Li
Stereo image restoration under adverse conditions, such as low light, rain, and low resolution, requires effective exploitation of cross-view complementary information to recover degraded visual content. In monocular image restoration, frequency decomposition has proven effective, where high-frequency components aid in recovering fine textures and reducing blur, while low-frequency components facilitate noise suppression and illumination correction. However, existing stereo restoration methods have yet to explore cross-view interactions through frequency decomposition, which is a promising direction for enhancing restoration quality. To address this, we propose a frequency-aware framework comprising a Frequency Decomposition Module (FDM), Detail Interaction Module (DIM), Structural Interaction Module (SIM), and Adaptive Fusion Module (AFM). FDM employs learnable filters to decompose the image into high- and low-frequency components. DIM enhances the high-frequency branch by capturing local detail cues through deformable convolution. SIM processes the low-frequency branch by modeling global structural correlations via a cross-view row-wise attention mechanism. Finally, AFM adaptively fuses the complementary frequency-specific information to generate high-quality restored images. Extensive experiments demonstrate the efficacy and generalizability of our framework across three diverse stereo restoration tasks, where it achieves state-of-the-art performance in low-light enhancement and rain removal, alongside highly competitive results in super-resolution. Our code is available at https://github.com/C2022J/FDIN
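The frequency split itself can be illustrated with a fixed blur filter: the low-pass output carries illumination and coarse structure, and the residual carries edges and texture. FDM learns its filters; the depthwise box blur and kernel size below are stand-in assumptions.

import torch
import torch.nn.functional as F

def freq_split(x, ksize=7):
    """x: (B, C, H, W) -> (low, high) with x = low + high."""
    pad = ksize // 2
    kernel = torch.ones(x.shape[1], 1, ksize, ksize) / (ksize * ksize)
    padded = F.pad(x, [pad] * 4, mode="reflect")
    low = F.conv2d(padded, kernel, groups=x.shape[1])   # depthwise low-pass
    return low, x - low

x = torch.randn(1, 3, 64, 64)
low, high = freq_split(x)
print(torch.allclose(low + high, x))                    # True by construction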
Vol. 35, pp. 1462-1477.
Citations: 0
MDA-MAA: A Collaborative Augmentation Approach for Generalizing Cross-Domain Retrieval
IF 13.7 Pub Date: 2026-02-02 DOI: 10.1109/TIP.2026.3658223
Ming Jin;Richang Hong
In video-text cross-domain retrieval tasks, the generalization ability of the retrieval models is key to improving their performance and is crucial for enhancing their practical applicability. However, existing retrieval models exhibit significant deficiencies in cross-domain generalization. On one hand, models tend to overfit specific training domain data, resulting in poor cross-domain matching and significantly reduced retrieval accuracy when dealing with data from different, new, or mixed domains. On the other hand, although data augmentation is a vital strategy for enhancing model generalization, most existing methods focus on unimodal augmentation and fail to fully exploit the multimodal correlations between video and text. As a result, the augmented data lack semantic diversity, which further limits the model’s ability to understand and perform in complex cross-domain scenarios. To address these challenges, this paper proposes an innovative collaborative augmentation approach named MDA-MAA, which includes two core modules: the Masked Attention Augmentation (MAA) module and the Multimodal Diffusion Augmentation (MDA) module. The MAA module applies masking to the original video frame features and uses an attention mechanism to predict the masked features, effectively reducing overfitting to training data and enhancing model generalization. The MDA module generates subtitles from video frames and uses the LLaMA model to infer comprehensive video captions. These captions, combined with the original video frames, are integrated into a diffusion model for joint learning, ultimately generating semantically enriched augmented video frames. This process leverages the multimodal relationship between video and text to increase the diversity of the training data distribution. Experimental results demonstrate that this collaborative augmentation method significantly improves the performance of video-text cross-domain retrieval models, validating its effectiveness in enhancing model generalization.
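The masking-and-prediction idea behind MAA can be sketched as follows: a fraction of frame tokens is replaced with a learned mask token and a self-attention layer reconstructs features from the visible context; the mask ratio, layer sizes, and the MaskedAttnAug name are illustrative assumptions.

import torch
import torch.nn as nn

class MaskedAttnAug(nn.Module):
    def __init__(self, dim=128, ratio=0.3):
        super().__init__()
        self.ratio = ratio
        self.mask_token = nn.Parameter(torch.zeros(dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, tokens):                      # tokens: (B, T, dim) frame features
        B, T, D = tokens.shape
        mask = torch.rand(B, T, device=tokens.device) < self.ratio
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand(B, T, D), tokens)
        pred, _ = self.attn(x, x, x)                # predict masked positions from context
        return pred, mask

tokens = torch.randn(2, 16, 128)                    # 16 frame features per clip
pred, mask = MaskedAttnAug()(tokens)
print(pred.shape, round(mask.float().mean().item(), 2))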
Vol. 35, pp. 1595-1606.
Citations: 0
Padé Neurons for Efficient Neural Models
IF 13.7 Pub Date: 2026-01-30 DOI: 10.1109/TIP.2026.3653202
Onur Keleş;A. Murat Tekalp
Neural networks commonly employ the McCulloch-Pitts neuron model, which is a linear model followed by a point-wise non-linear activation. Various researchers have already advanced inherently non-linear neuron models, such as quadratic neurons, generalized operational neurons, generative neurons, and super neurons, which offer stronger non-linearity compared to point-wise activation functions. In this paper, we introduce a novel and better non-linear neuron model called Padé neurons ($\mathrm{\textit{Paon}}$s), inspired by Padé approximants. $\mathrm{\textit{Paon}}$s offer several advantages, such as diversity of non-linearity, since each $\mathrm{\textit{Paon}}$ learns a different non-linear function of its inputs, and layer efficiency, since $\mathrm{\textit{Paon}}$s provide stronger non-linearity in far fewer layers compared to piecewise linear approximation. Furthermore, $\mathrm{\textit{Paon}}$s include all previously proposed neuron models as special cases, thus any neuron model in any network can be replaced by $\mathrm{\textit{Paon}}$s. We note that there has been a proposal to employ the Padé approximation as a generalized point-wise activation function, which is fundamentally different from our model. To validate the efficacy of $\mathrm{\textit{Paon}}$s, in our experiments, we replace classic neurons in some well-known neural image super-resolution, compression, and classification models based on the ResNet architecture with $\mathrm{\textit{Paon}}$s. Our comprehensive experimental results and analyses demonstrate that neural models built with $\mathrm{\textit{Paon}}$s provide performance better than or equal to that of their classic counterparts with a smaller number of layers. The PyTorch implementation code for $\mathrm{\textit{Paon}}$ is open-sourced at https://github.com/onur-keles/Paon
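A Padé-style unit can be sketched as a ratio of two learned polynomials of a linear pre-activation, which is what gives each neuron its own non-linearity instead of a fixed point-wise activation; the polynomial degrees, the absolute value keeping the denominator positive, and the PadeUnit name are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn as nn

class PadeUnit(nn.Module):
    def __init__(self, in_dim, out_dim, p=3, q=2):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)
        self.a = nn.Parameter(torch.randn(p + 1, out_dim) * 0.1)   # numerator coefficients
        self.b = nn.Parameter(torch.randn(q, out_dim) * 0.1)       # denominator coefficients

    def forward(self, x):
        z = self.lin(x)                                            # linear pre-activation
        num = sum(self.a[k] * z ** k for k in range(self.a.shape[0]))
        den = 1 + sum((self.b[k] * z ** (k + 1)).abs() for k in range(self.b.shape[0]))
        return num / den                                           # ratio of polynomials

x = torch.randn(4, 16)
print(PadeUnit(16, 8)(x).shape)                                    # torch.Size([4, 8])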
Vol. 35, pp. 1508-1520.
Citations: 0
Disentangle to Fuse: Toward Content Preservation and Cross-Modality Consistency for Multi-Modality Image Fusion
IF 13.7 Pub Date: 2026-01-30 DOI: 10.1109/TIP.2026.3657183
Xinran Qin;Yuning Cui;Shangquan Sun;Ruoyu Chen;Wenqi Ren;Alois Knoll;Xiaochun Cao
Multi-modal image fusion (MMIF) aims to integrate complementary information from heterogeneous sensor modalities. However, substantial cross-modality discrepancies hinder joint scene representation and lead to semantic degradation in the fused output. To address this limitation, we propose C2MFuse, a novel framework designed to preserve content while ensuring cross-modality consistency. To the best of our knowledge, this is the first MMIF approach to explicitly disentangle style and content representations across modalities for image fusion. C2MFuse introduces a content-preserving style normalization mechanism that suppresses modality-specific variations while maintaining the underlying scene structure. The normalized features are then progressively aggregated to enhance fine-grained details and improve content completeness. In light of the lack of ground truth and the inherent ambiguity of the fused distribution, we further align the fused representation with a well-defined source modality, thereby enhancing semantic consistency and reducing distributional uncertainty. Additionally, we introduce an adaptive consistency loss with learnable transformation, which provides dynamic, modality-aware supervision by enforcing global consistency across heterogeneous inputs. Extensive experiments on five datasets across three representative MMIF tasks demonstrate that C2MFuse achieves efficient and high-quality fusion, surpasses existing methods, and generalizes effectively to downstream visual applications.
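The style-suppression step can be illustrated with instance normalization: per-channel mean and variance (a common proxy for modality-specific "style") are removed from each stream before fusion; the naive average fusion below is an assumption for illustration and not the C2MFuse pipeline.

import torch

def style_normalize(feat, eps=1e-5):                 # feat: (B, C, H, W)
    mu = feat.mean(dim=(2, 3), keepdim=True)
    sigma = feat.std(dim=(2, 3), keepdim=True)
    return (feat - mu) / (sigma + eps)               # content kept, style statistics removed

ir, vis = torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64)
fused = 0.5 * (style_normalize(ir) + style_normalize(vis))   # naive fusion of the two streams
print(fused.shape)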
Vol. 35, pp. 1756-1770.
Citations: 0
IHDCP: Single Image Dehazing Using Inverted Haze Density Correction Prior
IF 13.7 Pub Date: 2026-01-29 DOI: 10.1109/TIP.2026.3657636
Yun Liu;Tao Li;Chunping Tan;Wenqi Ren;Cosmin Ancuti;Weisi Lin
Image dehazing, a crucial task in low-level vision, supports numerous practical applications, such as autonomous driving, remote sensing, and surveillance. This paper proposes IHDCP, a novel Inverted Haze Density Correction Prior for efficient single image dehazing. It is observed that the medium transmission can be effectively modeled from the inverted haze density map using correction functions with various gamma coefficients. Based on this observation, a pixel-wise gamma correction coefficient is introduced to formulate the transmission as a function of the inverted haze density map. To estimate the transmission, IHDCP is first incorporated into the classic atmospheric scattering model (ASM), leading to a transcendental equation that is subsequently simplified to a quadratic form with a single unknown parameter using the Taylor expansion. Then, boundary constraints are designed to estimate this model parameter, and the gamma correction coefficient map is derived via the Vieta theorem. Finally, the haze-free result is recovered through ASM inversion. Experimental results on diverse synthetic and real-world datasets verify that our algorithm not only provides visually appealing dehazing performance with high computational efficiency, but also outperforms several state-of-the-art dehazing approaches in both subjective and objective evaluations. Moreover, our IHDCP generalizes well to various types of degraded scenes. Our code is available at https://github.com/TaoLi-TL/IHDCP.
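The final ASM inversion step can be sketched as below: given a transmission map t, the haze-free image is J = (I - A) / max(t, t0) + A. Here the transmission comes from a gamma-corrected inverted haze-density proxy (the per-pixel minimum channel) with one global gamma and a fixed atmospheric light, whereas IHDCP estimates a pixel-wise gamma coefficient and derives its parameters via boundary constraints; the constants below are illustrative assumptions.

import numpy as np

def dehaze_asm(img, A=0.95, gamma=1.5, t0=0.1):
    """img: (H, W, 3) hazy image, float in [0, 1]."""
    density = img.min(axis=2)                        # crude haze-density proxy
    t = np.clip((1.0 - density) ** gamma, t0, 1.0)   # inverted density -> transmission
    J = (img - A) / t[..., None] + A                 # invert I = J*t + A*(1 - t)
    return np.clip(J, 0.0, 1.0)

hazy = np.random.rand(64, 64, 3).astype(np.float32)
print(dehaze_asm(hazy).shape)                        # (64, 64, 3)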
Vol. 35, pp. 1448-1461.
Citations: 0