
IEEE Transactions on Image Processing: Latest Publications

SRS: Siamese Reconstruction-Segmentation Network based on Dynamic-Parameter Convolution
IF 10.6 · Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-09-19 · DOI: 10.1109/tip.2025.3607624
Bingkun Nian, Fenghe Tang, Jianrui Ding, Jie Yang, Zhonglong Zheng, Shaohua Kevin Zhou, Wei Liu
Citations: 0
Gradient and Structure Consistency in Multimodal Emotion Recognition.
IF 10.6 · Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-09-18 · DOI: 10.1109/tip.2025.3608664
QingHongYa Shi, Mang Ye, Wenke Huang, Bo Du, Xiaofen Zong
Multimodal emotion recognition is a task that integrates text, visual, and audio data to holistically infer an individual's emotional state. Existing research predominantly focuses on exploiting modality-specific cues for joint learning, often ignoring the differences between modalities under common-goal learning. Due to multimodal heterogeneity, common-goal learning inadvertently introduces optimization biases and interaction noise. To address the above challenges, we propose a novel approach named Gradient and Structure Consistency (GSCon). Our strategy operates at both the overall and individual levels, addressing balanced optimization and effective interaction, respectively. At the overall level, to prevent one modality from suppressing the optimization of the others, we construct a balanced gradient direction that aligns each modality's optimization direction, ensuring unbiased convergence. Simultaneously, at the individual level, to avoid the interaction noise caused by multimodal alignment, we align the spatial structure of samples across modalities. The spatial structure of the samples does not vary with modality heterogeneity, enabling effective inter-modal interaction. Extensive experiments on multimodal emotion recognition and multimodal intention understanding datasets demonstrate the effectiveness of the proposed method. Code is available at https://github.com/ShiQingHongYa/GSCon.
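To make the two levels concrete, the sketch below (PyTorch; the function names, the cosine-similarity structure, and the gradient-normalization scheme are illustrative assumptions, not the authors' released code) pairs a structure-consistency loss that matches intra-batch similarity matrices across modalities with a balanced gradient direction built from normalized per-modality gradients.

```python
import torch
import torch.nn.functional as F

def structure_consistency_loss(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Individual level: align the intra-batch similarity structure of two modalities.

    feat_a, feat_b: (B, D) embeddings of the same B samples in two modalities.
    Comparing pairwise-similarity matrices instead of raw features makes the
    loss insensitive to modality-specific feature scales.
    """
    a = F.normalize(feat_a, dim=-1)
    b = F.normalize(feat_b, dim=-1)
    return F.mse_loss(a @ a.t(), b @ b.t())

def balanced_gradient(modality_losses, shared_params):
    """Overall level: average *normalized* per-modality gradients on the shared
    parameters so that no single modality dominates the update direction."""
    directions = []
    for loss in modality_losses:
        grads = torch.autograd.grad(loss, shared_params, retain_graph=True)
        flat = torch.cat([g.reshape(-1) for g in grads])
        directions.append(flat / (flat.norm() + 1e-12))
    return torch.stack(directions).mean(dim=0)  # write back into params' .grad manually
```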
Citations: 0
Semantic-Driven Global-Local Fusion Transformer for Image Super-Resolution.
IF 10.6 · Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-09-18 · DOI: 10.1109/tip.2025.3609106
Kaibing Zhang, Zhouwei Cheng, Xin He, Jie Li, Xinbo Gao
Image Super-Resolution (SR) has seen remarkable progress with the emergence of transformer-based architectures. However, due to the high computational cost, many existing transformer-based SR methods limit their attention to local windows, which hinders their ability to model long-range dependencies and global structures. To address these challenges, we propose a novel SR framework named Semantic-Driven Global-Local Fusion Transformer (SGLFT). The proposed model enhances the receptive field by combining a Hybrid Window Transformer (HWT) and a Scalable Transformer Module (STM) to jointly capture local textures and global context. To further strengthen the semantic consistency of reconstruction, we introduce a Semantic Extraction Module (SEM) that distills high-level semantic priors from the input. These semantic cues are adaptively integrated with visual features through an Adaptive Feature Fusion Semantic Integration Module (AFFSIM). Extensive experiments on standard benchmarks demonstrate the effectiveness of SGLFT in producing visually faithful and structurally consistent SR results. The code will be available at https://github.com/kbzhang0505/SGLFT.
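A minimal reading of the adaptive semantic fusion idea, assuming PyTorch: a semantic vector gates the blend between local-window and global features. The module name, tensor shapes, and the sigmoid gate are illustrative assumptions; the paper's AFFSIM may differ in detail.

```python
import torch
import torch.nn as nn

class AdaptiveSemanticFusion(nn.Module):
    """Blend local-window and global features, conditioned on a semantic vector
    (e.g. pooled from a semantic extraction branch)."""

    def __init__(self, channels: int, sem_dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(sem_dim, channels), nn.Sigmoid())
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, local_feat, global_feat, sem_vec):
        # local_feat, global_feat: (B, C, H, W); sem_vec: (B, S)
        g = self.gate(sem_vec).unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1) channel gate
        fused = g * local_feat + (1.0 - g) * global_feat     # semantic-driven blend
        return self.proj(fused)

# usage sketch:
# fusion = AdaptiveSemanticFusion(channels=64, sem_dim=256)
# out = fusion(local_feat, global_feat, sem_vec)
```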
Citations: 0
URFusion: Unsupervised Unified Degradation-Robust Image Fusion Network.
IF 10.6 · Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-09-16 · DOI: 10.1109/tip.2025.3607628
Han Xu, Xunpeng Yi, Chen Lu, Guangcan Liu, Jiayi Ma
When dealing with low-quality source images, existing image fusion methods either fail to handle degradations or are restricted to specific degradations. This study proposes an unsupervised unified degradation-robust image fusion network, termed URFusion, in which various types of degradations can be uniformly eliminated during the fusion process, leading to high-quality fused images. URFusion is composed of three core modules: intrinsic content extraction, intrinsic content fusion, and appearance representation learning and assignment. It first extracts degradation-free intrinsic content features from images affected by various degradations. These content features then provide feature-level rather than image-level fusion constraints for optimizing the fusion network, effectively eliminating degradation residues and reliance on ground truth. Finally, URFusion learns the appearance representation of images and assigns the statistical appearance representation of high-quality images to the content-fused result, producing the final high-quality fused image. Extensive experiments on multi-exposure image fusion and multi-modal image fusion tasks demonstrate the advantages of URFusion in fusion performance and suppression of multiple types of degradations. The code is available at https://github.com/hanna-xu/URFusion.
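One common way to "assign" appearance statistics to content features is an AdaIN-style re-normalization; the hedged sketch below (PyTorch) illustrates that reading of the appearance-assignment step and is not necessarily the exact mechanism used in URFusion.

```python
import torch

def assign_appearance(content: torch.Tensor, appearance: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Re-statistic content features with target appearance statistics (AdaIN-style).

    content:    (B, C, H, W) content-fused, degradation-free features
    appearance: (B, C, H, W) features whose channel-wise mean/std encode the
                desired high-quality appearance
    """
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    a_mean = appearance.mean(dim=(2, 3), keepdim=True)
    a_std = appearance.std(dim=(2, 3), keepdim=True) + eps
    return (content - c_mean) / c_std * a_std + a_mean
```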
Citations: 0
Harmonized Domain Enabled Alternate Search for Infrared and Visible Image Alignment.
IF 10.6 · Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-09-16 · DOI: 10.1109/tip.2025.3607585
Zhiying Jiang, Zengxi Zhang, Jinyuan Liu
Infrared and visible image alignment is essential to fusion and multi-modal perception applications. It addresses discrepancies in position and scale caused by spectral properties and environmental variations, ensuring precise pixel correspondence and spatial consistency. Existing manual calibration requires regular maintenance and exhibits poor portability, challenging the adaptability of multi-modal applications in dynamic environments. In this paper, we propose a harmonized-representation-based infrared and visible image alignment method that achieves both high accuracy and scene adaptability. Specifically, to bridge the disparity between multi-modal images, we develop an invertible translation process that establishes a harmonized representation domain effectively encapsulating the feature intensity and distribution of both infrared and visible modalities. Building on this, we design a hierarchical framework to correct deformations inferred from the harmonized domain in a coarse-to-fine manner. Our framework leverages advanced perception capabilities alongside residual estimation to enable accurate regression of sparse offsets, while an alternate correlation search mechanism ensures precise correspondence matching. Furthermore, we present the first misaligned infrared and visible image benchmark with available ground truth for evaluation. Extensive experiments validate the effectiveness of the proposed method against state-of-the-art methods, advancing the subsequent applications further.
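The correlation-search step can be pictured as scoring a small window of candidate displacements between features from the harmonized domain; the sketch below (PyTorch, with illustrative names and a fixed search radius) builds such a local cost volume, whose per-pixel argmax gives a discrete offset that a coarse-to-fine scheme could then refine.

```python
import torch
import torch.nn.functional as F

def local_correlation(feat_a: torch.Tensor, feat_b: torch.Tensor, radius: int = 3) -> torch.Tensor:
    """Score candidate displacements of feat_b within a (2r+1)x(2r+1) window
    around every position of feat_a.

    feat_a, feat_b: (B, C, H, W) features mapped into the harmonized domain.
    Returns (B, (2r+1)**2, H, W); argmax over dim=1 gives the best discrete
    offset per pixel.
    """
    b, c, h, w = feat_a.shape
    padded = F.pad(feat_b, [radius] * 4)
    scores = []
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            shifted = padded[:, :, dy:dy + h, dx:dx + w]
            scores.append((feat_a * shifted).sum(dim=1, keepdim=True) / c ** 0.5)
    return torch.cat(scores, dim=1)
```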
Citations: 0
Source-Free Object Detection with Detection Transformer.
IF 10.6 · Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-09-16 · DOI: 10.1109/tip.2025.3607621
Huizai Yao, Sicheng Zhao, Shuo Lu, Hui Chen, Yangyang Li, Guoping Liu, Tengfei Xing, Chenggang Yan, Jianhua Tao, Guiguang Ding
Source-Free Object Detection (SFOD) enables knowledge transfer from a source domain to an unsupervised target domain for object detection without access to source data. Most existing SFOD approaches are either confined to conventional object detection (OD) models like Faster R-CNN or designed as general solutions without tailored adaptations for novel OD architectures, especially Detection Transformer (DETR). In this paper, we introduce Feature Reweighting ANd Contrastive Learning NetworK (FRANCK), a novel SFOD framework specifically designed to perform query-centric feature enhancement for DETRs. FRANCK comprises four key components: (1) an Objectness Score-based Sample Reweighting (OSSR) module that computes attention-based objectness scores on multi-scale encoder feature maps, reweighting the detection loss to emphasize less-recognized regions; (2) a Contrastive Learning with Matching-based Memory Bank (CMMB) module that integrates multi-level features into memory banks, enhancing class-wise contrastive learning; (3) an Uncertainty-weighted Query-fused Feature Distillation (UQFD) module that improves feature distillation through prediction quality reweighting and query feature fusion; and (4) an improved self-training pipeline with a Dynamic Teacher Updating Interval (DTUI) that optimizes pseudo-label quality. By leveraging these components, FRANCK effectively adapts a source-pretrained DETR model to a target domain with enhanced robustness and generalization. Extensive experiments on several widely used benchmarks demonstrate that our method achieves state-of-the-art performance, highlighting its effectiveness and compatibility with DETR-based SFOD models.
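As an illustration of the objectness-score-based reweighting in (1), the snippet below (PyTorch; the per-query loss granularity, the focal-style exponent, and the normalization are assumptions, not FRANCK's exact formulation) up-weights poorly recognized queries so they contribute more to the adapted detection loss.

```python
import torch

def objectness_reweighted_loss(per_query_loss: torch.Tensor,
                               objectness: torch.Tensor,
                               gamma: float = 1.0) -> torch.Tensor:
    """Emphasize less-recognized regions in the self-training detection loss.

    per_query_loss: (N,) unreduced losses, e.g. one per matched DETR query
    objectness:     (N,) scores in [0, 1]; high means the region is already
                    recognized well by the adapted model
    """
    weights = (1.0 - objectness).clamp(min=0.0) ** gamma           # focal-style emphasis
    weights = weights / (weights.sum() + 1e-12) * weights.numel()  # keep loss scale stable
    return (weights.detach() * per_query_loss).mean()
```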
Citations: 0
UMCFuse: A Unified Multiple Complex Scenes Infrared and Visible Image Fusion Framework.
IF 10.6 · Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-09-16 · DOI: 10.1109/tip.2025.3607623
Xilai Li, Xiaosong Li, Tianshu Tan, Huafeng Li, Tao Ye
Infrared and visible image fusion has emerged as a prominent research area in computer vision. However, little attention has been paid to complex-scene fusion, leading to sub-optimal results under interference. To fill this gap, we propose a unified framework for infrared and visible image fusion in complex scenes, termed UMCFuse. Specifically, we classify the pixels of visible images according to the degree of scattering in light transmission, allowing us to separate fine details from overall intensity. Maintaining a balance between interference removal and detail preservation is essential for the generalization capacity of the proposed method; we therefore propose an adaptive denoising strategy for the fusion of detail layers. Meanwhile, we fuse the energy features from different modalities by analyzing them from multiple directions. Extensive fusion experiments on real and synthetic complex-scene datasets, covering adverse weather, noise, blur, overexposure, and fire, as well as downstream tasks including semantic segmentation, object detection, salient object detection, and depth estimation, consistently indicate the superiority of the proposed method over recent representative methods. Our code is available at https://github.com/ixilai/UMCFuse.
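A hedged two-scale sketch of the detail/intensity separation and detail-layer denoising (PyTorch): base layers from a box filter are averaged, while detail layers are soft-thresholded and fused by a max-magnitude rule. The filter size, threshold, and fusion rules are illustrative stand-ins for the paper's adaptive strategy, not its actual pipeline.

```python
import torch
import torch.nn.functional as F

def two_scale_fuse(ir: torch.Tensor, vis: torch.Tensor, ksize: int = 11, thresh: float = 0.01) -> torch.Tensor:
    """Fuse registered IR/visible images after a base/detail decomposition.

    ir, vis: (B, 1, H, W) images in [0, 1]. Base layers (overall intensity) are
    averaged; detail layers are soft-thresholded (crude noise suppression) and
    fused with a max-magnitude rule.
    """
    pad = ksize // 2

    def blur(x):  # box-filter base layer
        return F.avg_pool2d(F.pad(x, [pad] * 4, mode="reflect"), ksize, stride=1)

    def shrink(d):  # soft-threshold small detail coefficients
        return torch.sign(d) * torch.clamp(d.abs() - thresh, min=0.0)

    base_ir, base_vis = blur(ir), blur(vis)
    det_ir, det_vis = shrink(ir - base_ir), shrink(vis - base_vis)
    base = 0.5 * (base_ir + base_vis)
    detail = torch.where(det_ir.abs() >= det_vis.abs(), det_ir, det_vis)
    return (base + detail).clamp(0.0, 1.0)
```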
Citations: 0
HOPE: Enhanced Position Image Priors via High-Order Implicit Representations.
IF 10.6 · Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-09-16 · DOI: 10.1109/tip.2025.3607582
Yang Chen, Ruituo Wu, Junhui Hou, Ce Zhu, Yipeng Liu
Deep Image Prior (DIP) has shown that networks with stochastic initialization and custom architectures can effectively address inverse imaging challenges. Despite its potential, DIP requires significant computational resources, whereas the lighter Implicit Neural Positional Image Prior (PIP) often yields overly smooth solutions due to exacerbated spectral bias. Research on lightweight, high-performance solutions for inverse imaging remains limited. This paper proposes a novel framework, Enhanced Positional Image Priors through High-Order Implicit Representations (HOPE), incorporating high-order interactions between layers within a conventional cascade structure. This approach reduces the spectral bias commonly seen in PIP, enhancing the model's ability to capture both low- and high-frequency components for optimal inverse problem performance. We theoretically demonstrate that HOPE's expanded representational space, narrower convergence range, and improved Neural Tangent Kernel (NTK) diagonal properties enable more precise frequency representations than PIP. Comprehensive experiments across tasks such as signal representation (audio, image, volume) and inverse image processing (denoising, super-resolution, CT reconstruction, inpainting) confirm that HOPE establishes new benchmarks for recovery quality and training efficiency.
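The following coordinate-network sketch (PyTorch) shows one way to realize high-order interactions in a cascade: each hidden layer is multiplied by a fresh projection of the positional encoding, so the representable frequency content grows with depth, counteracting spectral bias. The layer sizes, random Fourier encoding, and multiplicative form are assumptions for illustration, not HOPE's exact architecture.

```python
import torch
import torch.nn as nn

class HighOrderINR(nn.Module):
    """Coordinate MLP with multiplicative interactions between layers: each
    hidden state is modulated by a projection of the positional encoding, so
    the effective polynomial order of the representation grows layer by layer."""

    def __init__(self, in_dim=2, hidden=256, out_dim=3, layers=4, n_freq=64):
        super().__init__()
        self.register_buffer("freqs", torch.randn(in_dim, n_freq) * 10.0)
        enc_dim = 2 * n_freq
        self.inp = nn.Linear(enc_dim, hidden)
        self.mods = nn.ModuleList(nn.Linear(enc_dim, hidden) for _ in range(layers))
        self.hids = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(layers))
        self.out = nn.Linear(hidden, out_dim)

    def encode(self, x):                      # random Fourier positional encoding
        proj = x @ self.freqs
        return torch.cat([proj.sin(), proj.cos()], dim=-1)

    def forward(self, coords):                # coords: (N, in_dim) in [-1, 1]
        e = self.encode(coords)
        h = self.inp(e)
        for mod, hid in zip(self.mods, self.hids):
            h = torch.relu(hid(h)) * mod(e)   # high-order (multiplicative) interaction
        return self.out(h)                    # e.g. RGB at each queried coordinate
```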
Citations: 0
Spatio-Temporal Evolutionary Graph Learning for Brain Network Analysis using Medical Imaging.
IF 10.6 · Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-09-16 · DOI: 10.1109/tip.2025.3607633
Shengrong Li, Qi Zhu, Chunwei Tian, Li Zhang, Bo Shen, Chuhang Zheng, Daoqiang Zhang, Wei Shao
Dynamic functional brain networks (DFBNs) can flexibly describe the time-varying topological connectivity patterns of the brain and show great potential in brain disease diagnosis. However, most existing DFBN analysis methods focus on capturing dynamic interactions at the brain-region level, ignoring the spatio-temporal topological evolution across time windows. Moreover, they struggle to suppress interfering connections in DFBNs, which diminishes their capacity to discern the intrinsic structures intimately linked to brain disorders. To address these issues, we propose a topological evolution graph learning model to capture disease-related spatio-temporal topological features in DFBNs. Specifically, we first take the hubness of adjacent DFBNs as the source and target domains in turn, and then use the Wasserstein distance (WD) and Gromov-Wasserstein distance (GWD) to capture the brain's evolution law at the node and edge levels, respectively. Furthermore, we introduce the principle of relevant information to guide the topology evolution graph to learn the structures that are most relevant to brain diseases while carrying the least redundant information between adjacent DFBNs. On this basis, we develop a high-order spatio-temporal model with multi-hop graph convolution to collaboratively extract long-range spatial and temporal dependencies from the topological evolution graph. Extensive experiments show that the proposed method outperforms the current state-of-the-art methods, and can effectively reveal the information evolution mechanism between brain regions across windows.
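For the node-level term, the Wasserstein distance between hubness profiles of adjacent windows has a simple closed form in one dimension; the NumPy sketch below (toy data, equal sample sizes assumed) computes it from sorted order statistics and is only meant to make the quantity concrete, not to reproduce the paper's formulation.

```python
import numpy as np

def wasserstein_1d(p: np.ndarray, q: np.ndarray) -> float:
    """Empirical 1-D Wasserstein distance between two equal-size samples,
    e.g. hubness profiles of DFBNs in adjacent time windows."""
    p, q = np.sort(p), np.sort(q)
    assert p.shape == q.shape, "this sketch assumes equal sample sizes"
    return float(np.mean(np.abs(p - q)))

# toy example: hubness of 90 brain regions in two adjacent windows
rng = np.random.default_rng(0)
hub_t, hub_t1 = rng.random(90), rng.random(90)
print(wasserstein_1d(hub_t, hub_t1))
```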
Citations: 0
Semi-supervised Text-based Person Search
IF 10.6 · Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-09-16 · DOI: 10.1109/tip.2025.3607637
Daming Gao, Yang Bai, Min Cao, Hao Dou, Mang Ye, Min Zhang
Citations: 0