Gradient and Structure Consistency in Multimodal Emotion Recognition
Pub Date: 2025-09-18 DOI: 10.1109/tip.2025.3608664
QingHongYa Shi, Mang Ye, Wenke Huang, Bo Du, Xiaofen Zong
Multimodal emotion recognition integrates text, visual, and audio data to holistically infer an individual's emotional state. Existing research predominantly focuses on exploiting modality-specific cues for joint learning, often ignoring the differences between modalities under a common learning objective. Due to multimodal heterogeneity, such common-objective learning inadvertently introduces optimization biases and interaction noise. To address these challenges, we propose a novel approach named Gradient and Structure Consistency (GSCon). Our strategy operates at both the overall and individual levels to address balanced optimization and effective interaction, respectively. At the overall level, to prevent one modality from suppressing the optimization of the others, we construct a balanced gradient direction that aligns each modality's optimization direction, ensuring unbiased convergence. Simultaneously, at the individual level, to avoid the interaction noise caused by multimodal alignment, we align the spatial structure of samples across modalities. Because the spatial structure of samples is unaffected by modality heterogeneity, this enables effective inter-modal interaction. Extensive experiments on multimodal emotion recognition and multimodal intention understanding datasets demonstrate the effectiveness of the proposed method. Code is available at https://github.com/ShiQingHongYa/GSCon.
Semantic-Driven Global-Local Fusion Transformer for Image Super-Resolution
Pub Date: 2025-09-18 DOI: 10.1109/tip.2025.3609106
Kaibing Zhang, Zhouwei Cheng, Xin He, Jie Li, Xinbo Gao
Image Super-Resolution (SR) has seen remarkable progress with the emergence of transformer-based architectures. However, due to the high computational cost, many existing transformer-based SR methods limit their attention to local windows, which hinders their ability to model long-range dependencies and global structures. To address these challenges, we propose a novel SR framework named Semantic-Driven Global-Local Fusion Transformer (SGLFT). The proposed model enhances the receptive field by combining a Hybrid Window Transformer (HWT) and a Scalable Transformer Module (STM) to jointly capture local textures and global context. To further strengthen the semantic consistency of reconstruction, we introduce a Semantic Extraction Module (SEM) that distills high-level semantic priors from the input. These semantic cues are adaptively integrated with visual features through an Adaptive Feature Fusion Semantic Integration Module (AFFSIM). Extensive experiments on standard benchmarks demonstrate the effectiveness of SGLFT in producing visually faithful and structurally consistent SR results. The code will be available at https://github.com/kbzhang0505/SGLFT.
URFusion: Unsupervised Unified Degradation-Robust Image Fusion Network
Pub Date: 2025-09-16 DOI: 10.1109/tip.2025.3607628
Han Xu, Xunpeng Yi, Chen Lu, Guangcan Liu, Jiayi Ma
When dealing with low-quality source images, existing image fusion methods either fail to handle degradations or are restricted to specific degradations. This study proposes an unsupervised unified degradation-robust image fusion network, termed URFusion, in which various types of degradations are uniformly eliminated during the fusion process, leading to high-quality fused images. URFusion is composed of three core modules: intrinsic content extraction, intrinsic content fusion, and appearance representation learning and assignment. It first extracts degradation-free intrinsic content features from images affected by various degradations. These content features then provide feature-level rather than image-level fusion constraints for optimizing the fusion network, effectively eliminating degradation residues and the reliance on ground truth. Finally, URFusion learns the appearance representation of images and assigns the statistical appearance representation of high-quality images to the content-fused result, producing the final high-quality fused image. Extensive experiments on multi-exposure image fusion and multi-modal image fusion tasks demonstrate the advantages of URFusion in fusion performance and in suppressing multiple types of degradations. The code is available at https://github.com/hanna-xu/URFusion.
{"title":"URFusion: Unsupervised Unified Degradation-Robust Image Fusion Network.","authors":"Han Xu,Xunpeng Yi,Chen Lu,Guangcan Liu,Jiayi Ma","doi":"10.1109/tip.2025.3607628","DOIUrl":"https://doi.org/10.1109/tip.2025.3607628","url":null,"abstract":"When dealing with low-quality source images, existing image fusion methods either fail to handle degradations or are restricted to specific degradations. This study proposes an unsupervised unified degradation-robust image fusion network, termed as URFusion, in which various types of degradations can be uniformly eliminated during the fusion process, leading to high-quality fused images. URFusion is composed of three core modules: intrinsic content extraction, intrinsic content fusion, and appearance representation learning and assignment. It first extracts degradation-free intrinsic content features from images affected by various degradations. These content features then provide feature-level rather than image-level fusion constraints for optimizing the fusion network, effectively eliminating degradation residues and reliance on ground truth. Finally, URFusion learns the appearance representation of images and assign the statistical appearance representation of high-quality images to the content-fused result, producing the final high-quality fused image. Extensive experiments on multi-exposure image fusion and multi-modal image fusion tasks demonstrate the advantages of URFusion in fusion performance and suppression of multiple types of degradations. The code is available at https://github.com/hanna-xu/URFusion.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"17 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145071877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Harmonized Domain Enabled Alternate Search for Infrared and Visible Image Alignment
Pub Date: 2025-09-16 DOI: 10.1109/tip.2025.3607585
Zhiying Jiang, Zengxi Zhang, Jinyuan Liu
Infrared and visible image alignment is critical to fusion and multi-modal perception applications. It addresses discrepancies in position and scale caused by spectral properties and environmental variations, ensuring precise pixel correspondence and spatial consistency. Existing manual calibration requires regular maintenance and exhibits poor portability, which limits the adaptability of multi-modal applications in dynamic environments. In this paper, we propose a harmonized-representation-based infrared and visible image alignment method that achieves both high accuracy and scene adaptability. Specifically, to handle the disparity between multi-modal images, we develop an invertible translation process to establish a harmonized representation domain that effectively encapsulates the feature intensity and distribution of both infrared and visible modalities. Building on this, we design a hierarchical framework to correct deformations inferred from the harmonized domain in a coarse-to-fine manner. Our framework leverages advanced perception capabilities alongside residual estimation to enable accurate regression of sparse offsets, while an alternate correlation search mechanism ensures precise correspondence matching. Furthermore, we propose the first misaligned infrared and visible image benchmark with available ground truth for evaluation. Extensive experiments validate the effectiveness of the proposed method against state-of-the-art approaches, further advancing subsequent applications.
{"title":"Harmonized Domain Enabled Alternate Search for Infrared and Visible Image Alignment.","authors":"Zhiying Jiang,Zengxi Zhang,Jinyuan Liu","doi":"10.1109/tip.2025.3607585","DOIUrl":"https://doi.org/10.1109/tip.2025.3607585","url":null,"abstract":"Infrared and visible image alignment is essential and critical to the fusion and multi-modal perception applications. It addresses discrepancies in position and scale caused by spectral properties and environmental variations, ensuring precise pixel correspondence and spatial consistency. Existing manual calibration requires regular maintenance and exhibits poor portability, challenging the adaptability of multi-modal application in dynamic environments. In this paper, we propose a harmonized representation based infrared and visible image alignment, achieving both high accuracy and scene adaptability. Specifically, with regard to the disparity between multi-modal images, we develop an invertible translation process to establish a harmonized representation domain that effectively encapsulates the feature intensity and distribution of both infrared and visible modalities. Building on this, we design a hierarchical framework to correct deformations inferred from the harmonized domain in a coarse-to-fine manner. Our framework leverages advanced perception capabilities alongside residual estimation to enable accurate regression of sparse offsets, while an alternate correlation search mechanism ensures precise correspondence matching. Furthermore, we propose the first ground truth available misaligned infrared and visible image benchmark for evaluation. Extensive experiments validate the effectiveness of the proposed method against the state-of-the-arts, advancing the subsequent applications further.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"50 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145071828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Source-Free Object Detection with Detection Transformer
Pub Date: 2025-09-16 DOI: 10.1109/tip.2025.3607621
Huizai Yao, Sicheng Zhao, Shuo Lu, Hui Chen, Yangyang Li, Guoping Liu, Tengfei Xing, Chenggang Yan, Jianhua Tao, Guiguang Ding
Source-Free Object Detection (SFOD) enables knowledge transfer from a source domain to an unsupervised target domain for object detection without access to source data. Most existing SFOD approaches are either confined to conventional object detection (OD) models like Faster R-CNN or designed as general solutions without tailored adaptations for novel OD architectures, especially Detection Transformer (DETR). In this paper, we introduce Feature Reweighting ANd Contrastive Learning NetworK (FRANCK), a novel SFOD framework specifically designed to perform query-centric feature enhancement for DETRs. FRANCK comprises four key components: (1) an Objectness Score-based Sample Reweighting (OSSR) module that computes attention-based objectness scores on multi-scale encoder feature maps, reweighting the detection loss to emphasize less-recognized regions; (2) a Contrastive Learning with Matching-based Memory Bank (CMMB) module that integrates multi-level features into memory banks, enhancing class-wise contrastive learning; (3) an Uncertainty-weighted Query-fused Feature Distillation (UQFD) module that improves feature distillation through prediction quality reweighting and query feature fusion; and (4) an improved self-training pipeline with a Dynamic Teacher Updating Interval (DTUI) that optimizes pseudo-label quality. By leveraging these components, FRANCK effectively adapts a source-pretrained DETR model to a target domain with enhanced robustness and generalization. Extensive experiments on several widely used benchmarks demonstrate that our method achieves state-of-the-art performance, highlighting its effectiveness and compatibility with DETR-based SFOD models.
UMCFuse: A Unified Multiple Complex Scenes Infrared and Visible Image Fusion Framework
Pub Date: 2025-09-16 DOI: 10.1109/tip.2025.3607623
Xilai Li, Xiaosong Li, Tianshu Tan, Huafeng Li, Tao Ye
Infrared and visible image fusion has emerged as a prominent research area in computer vision. However, little attention has been paid to fusion in complex scenes, leading to sub-optimal results under interference. To fill this gap, we propose a unified framework for infrared and visible image fusion in complex scenes, termed UMCFuse. Specifically, we classify the pixels of visible images according to the degree of scattering during light transmission, allowing us to separate fine details from overall intensity. Maintaining a balance between interference removal and detail preservation is essential for the generalization capacity of the proposed method; therefore, we propose an adaptive denoising strategy for the fusion of detail layers. Meanwhile, we fuse the energy features from different modalities by analyzing them from multiple directions. Extensive fusion experiments on real and synthetic complex-scene datasets, covering adverse weather, noise, blur, overexposure, and fire, as well as downstream tasks including semantic segmentation, object detection, salient object detection, and depth estimation, consistently indicate the superiority of the proposed method over recent representative methods. Our code is available at https://github.com/ixilai/UMCFuse.
{"title":"UMCFuse: A Unified Multiple Complex Scenes Infrared and Visible Image Fusion Framework.","authors":"Xilai Li,Xiaosong Li,Tianshu Tan,Huafeng Li,Tao Ye","doi":"10.1109/tip.2025.3607623","DOIUrl":"https://doi.org/10.1109/tip.2025.3607623","url":null,"abstract":"Infrared and visible image fusion has emerged as a prominent research area in computer vision. However, little attention has been paid to complex scenes fusion, leading to sub-optimal results under interference. To fill this gap, we propose a unified framework for infrared and visible images fusion in complex scenes, termed UMCFuse. Specifically, we classify the pixels of visible images from the degree of scattering of light transmission, allowing us to separate fine details from overall intensity. Maintaining a balance between interference removal and detail preservation is essential for the generalization capacity of the proposed method. Therefore, we propose an adaptive denoising strategy for the fusion of detail layers. Meanwhile, we fuse the energy features from different modalities by analyzing them from multiple directions. Extensive fusion experiments on real and synthetic complex scenes datasets cover adverse weather conditions, noise, blur, overexposure, fire, as well as downstream tasks including semantic segmentation, object detection, salient object detection, and depth estimation, consistently indicate the superiority of the proposed method compared with the recent representative methods. Our code is available at https://github.com/ixilai/UMCFuse.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"64 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145071901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HOPE: Enhanced Position Image Priors via High-Order Implicit Representations
Pub Date: 2025-09-16 DOI: 10.1109/tip.2025.3607582
Yang Chen, Ruituo Wu, Junhui Hou, Ce Zhu, Yipeng Liu
Deep Image Prior (DIP) has shown that networks with stochastic initialization and custom architectures can effectively address inverse imaging challenges. Despite its potential, DIP requires significant computational resources, whereas the lighter Implicit Neural Positional Image Prior (PIP) often yields overly smooth solutions due to exacerbated spectral bias. Research on lightweight, high-performance solutions for inverse imaging remains limited. This paper proposes a novel framework, Enhanced Positional Image Priors through High-Order Implicit Representations (HOPE), incorporating high-order interactions between layers within a conventional cascade structure. This approach reduces the spectral bias commonly seen in PIP, enhancing the model's ability to capture both low- and high-frequency components for optimal inverse problem performance. We theoretically demonstrate that HOPE's expanded representational space, narrower convergence range, and improved Neural Tangent Kernel (NTK) diagonal properties enable more precise frequency representations than PIP. Comprehensive experiments across tasks such as signal representation (audio, image, volume) and inverse image processing (denoising, super-resolution, CT reconstruction, inpainting) confirm that HOPE establishes new benchmarks for recovery quality and training efficiency.
{"title":"HOPE: Enhanced Position Image Priors via High-Order Implicit Representations.","authors":"Yang Chen,Ruituo Wu,Junhui Hou,Ce Zhu,Yipeng Liu","doi":"10.1109/tip.2025.3607582","DOIUrl":"https://doi.org/10.1109/tip.2025.3607582","url":null,"abstract":"Deep Image Prior (DIP) has shown that networks with stochastic initialization and custom architectures can effectively address inverse imaging challenges. Despite its potential, DIP requires significant computational resources, whereas the lighter Implicit Neural Positional Image Prior (PIP) often yields overly smooth solutions due to exacerbated spectral bias. Research on lightweight, high-performance solutions for inverse imaging remains limited. This paper proposes a novel framework, Enhanced Positional Image Priors through High-Order Implicit Representations (HOPE), incorporating high-order interactions between layers within a conventional cascade structure. This approach reduces the spectral bias commonly seen in PIP, enhancing the model's ability to capture both low- and high-frequency components for optimal inverse problem performance. We theoretically demonstrate that HOPE's expanded representational space, narrower convergence range, and improved Neural Tangent Kernel (NTK) diagonal properties enable more precise frequency representations than PIP. Comprehensive experiments across tasks such as signal representation (audio, image, volume) and inverse image processing (denoising, super-resolution, CT reconstruction, inpainting) confirm that HOPE establishes new benchmarks for recovery quality and training efficiency.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"24 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145071830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spatio-Temporal Evolutionary Graph Learning for Brain Network Analysis using Medical Imaging
Pub Date: 2025-09-16 DOI: 10.1109/tip.2025.3607633
Shengrong Li, Qi Zhu, Chunwei Tian, Li Zhang, Bo Shen, Chuhang Zheng, Daoqiang Zhang, Wei Shao
Dynamic functional brain networks (DFBNs) can flexibly describe the time-varying topological connectivity patterns of the brain and show great potential in brain disease diagnosis. However, most existing DFBN analysis methods focus on capturing dynamic interactions at the brain-region level, ignoring the spatio-temporal topological evolution across time windows. Moreover, they struggle to suppress interfering connections in DFBNs, which diminishes their capacity to discern the intrinsic structures that are intimately linked to brain disorders. To address these issues, we propose a topological evolution graph learning model to capture disease-related spatio-temporal topological features in DFBNs. Specifically, we first take the hubness of adjacent DFBNs as the source domain and the target domain in turn, and then use the Wasserstein distance (WD) and the Gromov-Wasserstein distance (GWD) to capture the brain's evolution law at the node and edge levels, respectively. Furthermore, we introduce the principle of relevant information to guide the topological evolution graph to learn the structures that are most relevant to brain diseases yet carry the least redundant information between adjacent DFBNs. On this basis, we develop a high-order spatio-temporal model with multi-hop graph convolution to collaboratively extract long-range spatial and temporal dependencies from the topological evolution graph. Extensive experiments show that the proposed method outperforms current state-of-the-art methods and can effectively reveal the information evolution mechanism between brain regions across windows.
Semi-supervised Text-based Person Search
Pub Date: 2025-09-16 DOI: 10.1109/tip.2025.3607637
Daming Gao, Yang Bai, Min Cao, Hao Dou, Mang Ye, Min Zhang
{"title":"Semi-supervised Text-based Person Search","authors":"Daming Gao, Yang Bai, Min Cao, Hao Dou, Mang Ye, Min Zhang","doi":"10.1109/tip.2025.3607637","DOIUrl":"https://doi.org/10.1109/tip.2025.3607637","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"37 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145072837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}