
Information Fusion: Latest Publications

Retrieval-Augmented Dialogue Knowledge Aggregation for expressive conversational speech synthesis
IF 14.7 CAS Tier 1, Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-18 DOI: 10.1016/j.inffus.2025.102948
Rui Liu, Zhenqi Jia, Feilong Bao, Haizhou Li
Conversational speech synthesis (CSS) aims to take the current dialogue (CD) history as a reference to synthesize expressive speech that aligns with the conversational style. Unlike CD, stored dialogue (SD) contains preserved dialogue fragments from earlier stages of user–agent interaction, which include style expression knowledge relevant to scenarios similar to those in CD. This knowledge plays a significant role in enabling the agent to synthesize expressive conversational speech that generates empathetic feedback, yet prior research has overlooked it. To address this issue, we propose a novel Retrieval-Augmented Dialogue Knowledge Aggregation scheme for expressive CSS, termed RADKA-CSS, which includes three main components: (1) To effectively retrieve dialogues from SD that are similar to the CD in both semantics and style, we first build a stored dialogue semantic-style database (SDSSD) containing text and audio samples, and then design a multi-attribute retrieval scheme that matches the dialogue semantic and style vectors of the CD against the stored vectors in the SDSSD to retrieve the most similar dialogues. (2) To effectively utilize the style knowledge from CD and SD, we adopt a multi-granularity graph structure to encode the dialogue and introduce a multi-source style knowledge aggregation mechanism. (3) Finally, the aggregated style knowledge is fed into the speech synthesizer to help the agent synthesize expressive speech that aligns with the conversational style. We conducted a comprehensive and in-depth experiment on the DailyTalk dataset, a benchmark for the CSS task. Both objective and subjective evaluations demonstrate that RADKA-CSS outperforms baseline models in expressiveness rendering. Code and audio samples can be found at: https://github.com/Coder-jzq/RADKA-CSS.
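A minimal sketch of the multi-attribute retrieval idea described in the abstract, assuming cosine similarity over precomputed semantic and style embeddings; the function names, the weighting parameter `alpha`, and the database layout are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def retrieve_similar_dialogues(cd_sem, cd_sty, sdssd, top_k=3, alpha=0.5):
    """Rank stored dialogues by a weighted mix of semantic and style similarity.

    cd_sem, cd_sty : 1-D vectors for the current dialogue (CD).
    sdssd          : list of (dialogue_id, sem_vec, sty_vec) entries.
    alpha          : assumed weight between semantic and style similarity.
    """
    scored = []
    for did, sem, sty in sdssd:
        score = alpha * cosine(cd_sem, sem) + (1 - alpha) * cosine(cd_sty, sty)
        scored.append((score, did))
    scored.sort(reverse=True)
    return [did for _, did in scored[:top_k]]

# toy usage with random 8-dimensional embeddings
rng = np.random.default_rng(0)
db = [(f"dlg_{i}", rng.normal(size=8), rng.normal(size=8)) for i in range(100)]
print(retrieve_similar_dialogues(rng.normal(size=8), rng.normal(size=8), db))
```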
{"title":"Retrieval-Augmented Dialogue Knowledge Aggregation for expressive conversational speech synthesis","authors":"Rui Liu ,&nbsp;Zhenqi Jia ,&nbsp;Feilong Bao ,&nbsp;Haizhou Li","doi":"10.1016/j.inffus.2025.102948","DOIUrl":"10.1016/j.inffus.2025.102948","url":null,"abstract":"<div><div>Conversational speech synthesis (CSS) aims to take the current dialogue (CD) history as a reference to synthesize expressive speech that aligns with the conversational style. Unlike CD, stored dialogue (SD) contains preserved dialogue fragments from earlier stages of user–agent interaction, which include style expression knowledge relevant to scenarios similar to those in CD. Note that this knowledge plays a significant role in enabling the agent to synthesize expressive conversational speech that generates empathetic feedback. However, prior research has overlooked this aspect. To address this issue, we propose a novel <strong>R</strong>etrieval-<strong>A</strong>ugmented <strong>D</strong>ialogue <strong>K</strong>nowledge <strong>A</strong>ggregation scheme for expressive CSS, termed <strong>RADKA-CSS</strong>, which includes three main components: (1) To effectively retrieve dialogues from SD that are similar to CD in terms of both semantic and style. First, we build a stored dialogue semantic-style database (SDSSD) which includes the text and audio samples. Then, we design a multi-attribute retrieval scheme to match the dialogue semantic and style vectors of the CD with the stored dialogue semantic and style vectors in the SDSSD, retrieving the most similar dialogues. (2) To effectively utilize the style knowledge from CD and SD, we propose adopting the multi-granularity graph structure to encode the dialogue and introducing a multi-source style knowledge aggregation mechanism. (3) Finally, the aggregated style knowledge are fed into the speech synthesizer to help the agent synthesize expressive speech that aligns with the conversational style. We conducted a comprehensive and in-depth experiment based on the DailyTalk dataset, which is a benchmarking dataset for the CSS task. Both objective and subjective evaluations demonstrate that RADKA-CSS outperforms baseline models in expressiveness rendering. Code and audio samples can be found at: <span><span>https://github.com/Coder-jzq/RADKA-CSS</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"118 ","pages":"Article 102948"},"PeriodicalIF":14.7,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Seeing helps hearing: A multi-modal dataset and a mamba-based dual branch parallel network for auditory attention decoding
IF 14.7 CAS Tier 1, Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-18 DOI: 10.1016/j.inffus.2025.102946
Cunhang Fan, Hongyu Zhang, Qinke Ni, Jingjing Zhang, Jianhua Tao, Jian Zhou, Jiangyan Yi, Zhao Lv, Xiaopei Wu
EEG-based auditory attention decoding (AAD) aims to identify the attended speaker from the listener's EEG signals. Existing datasets mainly focus on auditory stimuli, ignoring real-world multi-modal inputs. To address this, a new multi-modal AAD dataset (MM-AAD) is constructed, the first such dataset to include audio–visual stimuli. Additionally, prior studies mostly extract single-domain features, neglecting complementary temporal- and frequency-domain information, which can perform well in the within-trial setting but poorly in the cross-trial setting. Therefore, a framework called the Mamba-based dual branch parallel network (M-DBPNet) is proposed, effectively fusing temporal- and frequency-domain features. By incorporating Mamba, temporal features in time-series signals are extracted more effectively. Experimental results show that Mamba enhances decoding performance in the within-trial setting with fewer parameters and demonstrates strong generalization in the cross-trial setting. Visualization analysis indicates that visual stimuli strengthen evoked responses and activation in the temporal and occipital lobes, enhancing auditory perception and decoding performance.
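The dual-branch idea (complementary temporal- and frequency-domain features) can be illustrated with a toy descriptor; the feature choices, sampling rate, and band limits below are assumptions for illustration, not the representations learned by M-DBPNet:

```python
import numpy as np

def dual_branch_features(eeg, fs=128):
    """Toy dual-branch descriptor for one EEG window (channels x samples).

    Temporal branch: per-channel mean, std, and first-difference energy.
    Frequency branch: log band power from an FFT over delta..gamma bands.
    The plain concatenation stands in for the learned fusion in M-DBPNet.
    """
    temporal = np.concatenate([eeg.mean(1), eeg.std(1),
                               np.abs(np.diff(eeg, axis=1)).mean(1)])
    spec = np.abs(np.fft.rfft(eeg, axis=1)) ** 2
    freqs = np.fft.rfftfreq(eeg.shape[1], d=1.0 / fs)
    bands = [(1, 4), (4, 8), (8, 13), (13, 30), (30, 45)]
    freq_feats = [np.log(spec[:, (freqs >= lo) & (freqs < hi)].mean(1) + 1e-8)
                  for lo, hi in bands]
    return np.concatenate([temporal, np.concatenate(freq_feats)])

window = np.random.randn(32, 256)  # 32 channels, 2 s at 128 Hz (toy values)
print(dual_branch_features(window).shape)
```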
{"title":"Seeing helps hearing: A multi-modal dataset and a mamba-based dual branch parallel network for auditory attention decoding","authors":"Cunhang Fan ,&nbsp;Hongyu Zhang ,&nbsp;Qinke Ni ,&nbsp;Jingjing Zhang ,&nbsp;Jianhua Tao ,&nbsp;Jian Zhou ,&nbsp;Jiangyan Yi ,&nbsp;Zhao Lv ,&nbsp;Xiaopei Wu","doi":"10.1016/j.inffus.2025.102946","DOIUrl":"10.1016/j.inffus.2025.102946","url":null,"abstract":"<div><div>EEG-based auditory attention decoding (AAD) aims to identify the attended speaker from the listener’s EEG signals. Existing datasets mainly focus on auditory stimuli, ignoring real-world multi-modal inputs. To address this, a new multi-modal AAD dataset (MM-AAD) is constructed, representing the first dataset to include audio–visual stimuli. Additionally, prior studies mostly extract single-domain features, neglecting complementary temporal and frequency domain information, which can perform well in the within-trial setting but poorly in the cross-trial setting. Therefore, a framework called Mamba-based dual branch parallel network (M-DBPNet) is proposed, effectively fusing temporal and frequency domain features. By adding Mamba, temporal features in time sequence signals are better extracted. Experimental results show that Mamba enhances decoding performance in the within-trial setting with fewer parameters and demonstrates strong generalization in the cross-trial setting. Visualization analysis indicates that visual stimuli strengthen evoked responses and activation in temporal and occipital lobes, enhancing auditory perception and decoding performance.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"118 ","pages":"Article 102946"},"PeriodicalIF":14.7,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143169813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A static event-triggered background-impulse Kalman filter for wireless sensor networks with non-Gaussian measurement noise
IF 14.7 CAS Tier 1, Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-18 DOI: 10.1016/j.inffus.2025.102955
Xinkai You, Kangqi Xiao, Gang Wang
Event-triggered mechanisms (ETMs) have received increasing attention since they reduce the communication burden by preventing sensors from transmitting unnecessary measurements. This article focuses on the problem that a static ETM-based Kalman filter (static ET-KF) fails to work under non-Gaussian measurement noise. To tackle this problem, we combine the static ETM with a background-impulse Kalman filter (BIKF), in which the non-Gaussian noise is modeled as a Gaussian mixture of background noise and impulse noise. First, we modify the BIKF to facilitate its integration with the static ETM. Based on this, we propose a static event-triggered background-impulse Kalman filter (static ETBIKF) algorithm for a single sensor, and then extend it to a fusion form for wireless sensor networks. The existing static ET-KF is a special case of our static ETBIKF. Simulations show that the proposed algorithms outperform the static ET-KF under non-Gaussian environments, with communication savings of up to 45.64%.
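A hedged one-dimensional sketch of how a static event trigger combines with a Kalman-type filter under Gaussian-mixture measurement noise; the threshold, noise parameters, and the simplified "effective covariance" update are assumptions, whereas the actual BIKF weights the background and impulse components explicitly:

```python
import numpy as np

rng = np.random.default_rng(1)

# 1-D random-walk model: x_k = x_{k-1} + w,  y_k = x_k + v
Q, R_bg, R_imp, p_imp = 0.01, 0.1, 25.0, 0.1   # process / background / impulse noise
delta = 0.5                                     # assumed static event-trigger threshold

x_true, x_est, P = 0.0, 0.0, 1.0
sent = 0
for k in range(200):
    x_true += rng.normal(scale=np.sqrt(Q))
    # non-Gaussian measurement noise: Gaussian mixture of background and impulse
    R = R_imp if rng.random() < p_imp else R_bg
    y = x_true + rng.normal(scale=np.sqrt(R))

    P += Q                                       # prediction step
    # static ETM: the sensor transmits only when the innovation is large enough
    if abs(y - x_est) > delta:
        sent += 1
        # a BIKF-style filter would weight the two mixture components here;
        # this sketch simply uses an inflated "effective" covariance
        R_eff = (1 - p_imp) * R_bg + p_imp * R_imp
        K = P / (P + R_eff)
        x_est += K * (y - x_est)
        P *= (1 - K)

print(f"transmissions: {sent}/200, final error: {abs(x_true - x_est):.3f}")
```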
{"title":"A static event-triggered background-impulse Kalman filter for wireless sensor networks with non-Gaussian measurement noise","authors":"Xinkai You,&nbsp;Kangqi Xiao,&nbsp;Gang Wang","doi":"10.1016/j.inffus.2025.102955","DOIUrl":"10.1016/j.inffus.2025.102955","url":null,"abstract":"<div><div>Event-triggered mechanisms (ETMs) have received increasing attention since they provide a way to reduce the communication burden by preventing sensors from transmitting unnecessary measurement values. This article focuses on the problem of a static ETM-based Kalman filter (static ET-KF) failing to work in the case of non-Gaussian measurement noise. To tackle this problem, we combine the static ETM with a background-impulse Kalman filter (BIKF) where the non-Gaussian noise is modeled as a Gaussian mixture model, composed of background noise and impulse noise. First, we make modifications to BIKF to facilitate its integration with the static ETM. Based on this, we propose a static event-triggered background-impulse Kalman filter (static ETBIKF) algorithm for a single sensor. Then we extend the static ETBIKF to the fusion form used for wireless sensor networks. The existing static ET-KF is a special case of our static ETBIKF. Simulations show that the proposed algorithms perform better than static ET-KF under non-Gaussian environments and the communication-saving can reach 45.64% at most.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"118 ","pages":"Article 102955"},"PeriodicalIF":14.7,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Degradation-Decoupled and semantic-aggregated cross-space fusion for underwater image enhancement
IF 14.7 CAS Tier 1, Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-16 DOI: 10.1016/j.inffus.2024.102927
Xinwei Xue, Jincheng Yuan, Tianjiao Ma, Long Ma, Qi Jia, Jinjia Zhou, Yi Wang
The enhancement of underwater imaging has recently garnered significant attention due to the development of marine resources. Complex underwater environments cause images to suffer from various degradations, such as color casts and haze effects. These degradation factors are entangled in the original color space, making them challenging to eliminate with existing methods. Moreover, current underwater image enhancement techniques focus solely on visual quality improvement without considering downstream semantic understanding tasks, potentially impacting subsequent applications. To address these issues, we propose a Scene-Adapted Semantic-Aggregated Degradation-Decoupling (S2D2) framework. Our approach consists of two main components: Degradation-Decoupled Color Space Translation and Semantic-Aggregated Cross-Space Fusion. In the Degradation-Decoupled Color Space Translation module, we introduce a learnable color space, termed the Underwater Scenes Orient (USO) color space, enabling the separation of degradation factors specific to different underwater scenes, and we eliminate these factors using a parallel architecture composed of a model-inspired haze-removal module and a data-driven color-adaptation module. We then design the Semantic-Aggregated Cross-Space Fusion module, which aggregates semantic information extracted from VGG-based models with the fused features, enhancing performance in both visual quality and semantic-related vision tasks. Extensive experiments demonstrate that the proposed method significantly outperforms existing techniques both quantitatively and qualitatively across multiple benchmarks. Furthermore, the integration of fused features yields superior performance in salient object detection, highlighting the effectiveness of our fusion approach for semantic-related vision tasks.
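A toy PyTorch sketch of the parallel decoupling idea, assuming a 1x1 convolution as a stand-in for the learnable USO color transform and two small convolutional branches for haze removal and color adaptation; module names, depths, and the residual summation are illustrative, not the S2D2 architecture:

```python
import torch
import torch.nn as nn

class ColorSpaceTranslation(nn.Module):
    """Learnable per-pixel color transform standing in for the USO color space."""
    def __init__(self):
        super().__init__()
        self.mix = nn.Conv2d(3, 3, kernel_size=1, bias=True)  # acts as a learnable color matrix
    def forward(self, x):
        return self.mix(x)

class ParallelDecoupling(nn.Module):
    """Parallel haze-removal and color-adaptation branches, fused by residual summation."""
    def __init__(self, ch=16):
        super().__init__()
        self.haze = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(ch, 3, 3, padding=1))
        self.color = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
                                   nn.Conv2d(ch, 3, 3, padding=1))
    def forward(self, x):
        return x + self.haze(x) + self.color(x)

img = torch.rand(1, 3, 64, 64)
out = ParallelDecoupling()(ColorSpaceTranslation()(img))
print(out.shape)
```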
{"title":"Degradation-Decoupled and semantic-aggregated cross-space fusion for underwater image enhancement","authors":"Xinwei Xue ,&nbsp;Jincheng Yuan ,&nbsp;Tianjiao Ma ,&nbsp;Long Ma ,&nbsp;Qi Jia ,&nbsp;Jinjia Zhou ,&nbsp;Yi Wang","doi":"10.1016/j.inffus.2024.102927","DOIUrl":"10.1016/j.inffus.2024.102927","url":null,"abstract":"<div><div>The enhancement of underwater imaging has recently garnered significant attention due to the development of marine resources. Complex underwater environments cause images to suffer from various degradations, such as color casts and haze effects. These degradation factors are tangled in the original color space, making them challenging to eliminate using existing methods. Moreover, current underwater image enhancement techniques focus solely on visual quality improvement without considering downstream semantic understanding tasks, potentially impacting subsequent applications. To address these issues, we propose a Scene-Adapted Semantic-Aggregated Degradation-Decoupling (S2D2) framework. Our approach mainly consists of two components: Degradation-Decoupled Color Space Translation and Semantic-Aggregated Cross-Space Fusion. In the Degradation-Decoupled Color Space Translation module, We introduces a learnable color space, termed the Underwater Scenes Orient (USO) color space, enabling the separation of degradation factors specific to different underwater scenes. And we eliminate degradation factors using a parallel architecture consisting of a model-inspired haze-removal module and a data-driven color-adaptation module. And we then design the Semantic-Aggregated Cross-Space Fusion module, which aggregates semantic information extracted from VGG-based models with features, enhancing performance in both visual quality and semantic-related vision tasks. Extensive experiments demonstrate that the proposed method significantly outperforms existing techniques both quantitatively and qualitatively across multiple benchmarks. Furthermore, the integration of fused features results in superior performance in salient object detection, highlighting the effectiveness of our fusion approach for semantic-related vision tasks.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"118 ","pages":"Article 102927"},"PeriodicalIF":14.7,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143170351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DF-BSFNet: A bilateral synergistic fusion network with novel dynamic flow convolution for robust road extraction
IF 14.7 CAS Tier 1, Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-15 DOI: 10.1016/j.inffus.2025.102958
Chong Zhang, Huazu Zhang, Xiaogang Guo, Heng Qi, Zilong Zhao, Luliang Tang
Accurate and robust road extraction with good continuity and completeness is crucial for the development of smart cities and intelligent transportation. Remote sensing images and vehicle trajectories are attractive data sources with rich and complementary multimodal road information, and their fusion promises to significantly improve road extraction performance. However, existing studies on fusion-based road extraction suffer from the problems that the feature extraction modules pay little attention to the inherent morphology of roads, and the multimodal feature fusion techniques are too simple and superficial to fully and efficiently exploit the complementary information from different data sources, resulting in road predictions with poor continuity and limited performance. To this end, we propose a Bilateral Synergistic Fusion network with novel Dynamic Flow convolution, termed DF-BSFNet, which fully leverages the complementary road information from images and trajectories in a dual-mutual adaptive guidance and incremental refinement manner. First, we propose a novel Dynamic Flow Convolution (DFConv) that more adeptly and consciously captures the elongated and winding “flow” morphology of roads in complex scenarios, providing flexible and powerful capabilities for learning detail-heavy and robust road feature representations. Second, we develop two parallel modality-specific feature extractors with DFConv to extract hierarchical road features specific to images and trajectories, effectively exploiting the distinctive advantages of each modality. Third, we propose a Bilateral Synergistic Adaptive Feature Fusion (BSAFF) module which synthesizes the global-context and local-context of complementary multimodal road information and achieves a sophisticated feature fusion with dynamic guided-propagation and dual-mutual refinement. Extensive experiments on three road datasets demonstrate that our DF-BSFNet outperforms current state-of-the-art methods by a large margin in terms of continuity and accuracy.
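A minimal PyTorch sketch of bilateral gated fusion between image and trajectory feature maps, in the spirit of the BSAFF module; the gating layout and channel sizes are assumptions, and the dynamic flow convolution itself is not reproduced here:

```python
import torch
import torch.nn as nn

class BilateralGatedFusion(nn.Module):
    """Toy bilateral fusion: each modality gates the other, then features are merged.

    `img_feat` is assumed to come from remote-sensing imagery and `traj_feat`
    from rasterized vehicle trajectories, both C-channel feature maps.
    """
    def __init__(self, ch):
        super().__init__()
        self.gate_img = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.Sigmoid())
        self.gate_traj = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.Sigmoid())
        self.merge = nn.Conv2d(2 * ch, ch, 3, padding=1)

    def forward(self, img_feat, traj_feat):
        img_refined = img_feat * self.gate_traj(traj_feat)    # trajectories guide imagery
        traj_refined = traj_feat * self.gate_img(img_feat)    # imagery guides trajectories
        return self.merge(torch.cat([img_refined, traj_refined], dim=1))

fused = BilateralGatedFusion(32)(torch.rand(1, 32, 64, 64), torch.rand(1, 32, 64, 64))
print(fused.shape)
```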
{"title":"DF-BSFNet: A bilateral synergistic fusion network with novel dynamic flow convolution for robust road extraction","authors":"Chong Zhang ,&nbsp;Huazu Zhang ,&nbsp;Xiaogang Guo ,&nbsp;Heng Qi ,&nbsp;Zilong Zhao ,&nbsp;Luliang Tang","doi":"10.1016/j.inffus.2025.102958","DOIUrl":"10.1016/j.inffus.2025.102958","url":null,"abstract":"<div><div>Accurate and robust road extraction with good continuity and completeness is crucial for the development of smart city and intelligent transportation. Remote sensing images and vehicle trajectories are attractive data sources with rich and complementary multimodal road information, and the fusion of them promises to significantly promote the performance of road extraction. However, existing studies on fusion-based road extraction suffer from the problems that the feature extraction modules pay little attention to the inherent morphology of roads, and the multimodal feature fusion techniques are too simple and superficial to fully and efficiently exploit the complementary information from different data sources, resulting in road predictions with poor continuity and limited performance. To this end, we propose a <strong>B</strong>ilateral <strong>S</strong>ynergistic <strong>F</strong>usion network with novel <strong>D</strong>ynamic <strong>F</strong>low convolution, termed DF-BSFNet, which fully leverages the complementary road information from images and trajectories in a dual-mutual adaptive guidance and incremental refinement manner. First, we propose a novel Dynamic Flow Convolution (DFConv) that more adeptly and consciously captures the elongated and winding “flow” morphology of roads in complex scenarios, providing flexible and powerful capabilities for learning detail-heavy and robust road feature representations. Second, we develop two parallel modality-specific feature extractors with DFConv to extract hierarchical road features specific to images and trajectories, effectively exploiting the distinctive advantages of each modality. Third, we propose a Bilateral Synergistic Adaptive Feature Fusion (BSAFF) module which synthesizes the global-context and local-context of complementary multimodal road information and achieves a sophisticated feature fusion with dynamic guided-propagation and dual-mutual refinement. Extensive experiments on three road datasets demonstrate that our DF-BSFNet outperforms current state-of-the-art methods by a large margin in terms of continuity and accuracy.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"118 ","pages":"Article 102958"},"PeriodicalIF":14.7,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142990593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
KDFuse: A high-level vision task-driven infrared and visible image fusion method based on cross-domain knowledge distillation
IF 14.7 CAS Tier 1, Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-14 DOI: 10.1016/j.inffus.2025.102944
Chenjia Yang, Xiaoqing Luo, Zhancheng Zhang, Zhiguo Chen, Xiao-jun Wu
To enhance the comprehensiveness of fusion features and meet the requirements of high-level vision tasks, some fusion methods attempt to coordinate the fusion process by directly interacting with high-level semantic features. However, due to the significant disparity between the high-level semantic domain and the fusion representation domain, the effectiveness of such direct interaction can still be improved. To overcome this obstacle, a high-level vision task-driven infrared and visible image fusion method based on cross-domain knowledge distillation is proposed, referred to as KDFuse. KDFuse brings multi-task perceptual representations into the same domain through cross-domain knowledge distillation. By facilitating interaction between semantic information and fusion information at an equivalent level, it effectively reduces the gap between the semantic and fusion domains, enabling multi-task collaborative fusion. Specifically, to acquire the superior high-level semantic representations essential for instructing the fusion network, a teaching relationship is established to realize multi-task collaboration via the multi-domain interaction distillation module (MIDM). The multi-scale semantic perception module (MSPM) is designed to learn the ability to capture semantic information through cross-domain knowledge distillation, and the semantic detail integration module (SDIM) is constructed to integrate the fusion-level semantic representations with the fusion-level visual representations. Moreover, to balance the semantic and visual representations during the fusion process, the Fourier transform is introduced into the loss function. Extensive experiments demonstrate the effectiveness of the proposed method in both image fusion and downstream tasks. The source code is available at https://github.com/lxq-jnu/KDFuse.
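A small sketch of how a Fourier term can enter a fusion loss, as mentioned above; the L1 spatial term, the amplitude-only frequency term, and the weight `lam` are assumptions rather than the actual KDFuse loss:

```python
import torch

def bidomain_fusion_loss(fused, reference, lam=0.1):
    """Toy loss mixing a spatial L1 term with a Fourier-amplitude term.

    Illustrates, under an assumed weighting `lam`, how a Fourier transform can
    be introduced into the loss to balance semantic and visual representations;
    the loss used by KDFuse is more elaborate.
    """
    spatial = torch.mean(torch.abs(fused - reference))
    freq = torch.mean(torch.abs(torch.fft.fft2(fused).abs()
                                - torch.fft.fft2(reference).abs()))
    return spatial + lam * freq

loss = bidomain_fusion_loss(torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64))
print(float(loss))
```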
{"title":"KDFuse: A high-level vision task-driven infrared and visible image fusion method based on cross-domain knowledge distillation","authors":"Chenjia Yang ,&nbsp;Xiaoqing Luo ,&nbsp;Zhancheng Zhang ,&nbsp;Zhiguo Chen ,&nbsp;Xiao-jun Wu","doi":"10.1016/j.inffus.2025.102944","DOIUrl":"10.1016/j.inffus.2025.102944","url":null,"abstract":"<div><div>To enhance the comprehensiveness of fusion features and meet the requirements of high-level vision tasks, some fusion methods attempt to coordinate the fusion process by directly interacting with the high-level semantic feature. However, due to the significant disparity between high-level semantic domain and fusion representation domain, there is potential for enhancing the effectiveness of the collaborative approach to direct interaction. To overcome this obstacle, a high-level vision task-driven infrared and visible image fusion method based on cross-domain knowledge distillation is proposed, referred to as KDFuse. The KDFuse brings multi-task perceptual representation into the same domain through cross-domain knowledge distillation. By facilitating interaction between semantic information and fusion information at an equivalent level, it effectively reduces the gap between the semantic and fusion domains, enabling multi-task collaborative fusion. Specifically, to acquire superior high-level semantic representations essential for instructing the fusion network, the teaching relationship is established to realize multi-task collaboration by the multi-domain interaction distillation module (MIDM). The multi-scale semantic perception module (MSPM) is designed to learn the ability to capture semantic information through the cross-domain knowledge distillation and the semantic detail integration module (SDIM) is constructed to integrate the fusion-level semantic representations with the fusion-level visual representations. Moreover, to balance the semantic and visual representations during the fusion process, the Fourier transform is introduced into the loss function. Extensive comprehensive experiments demonstrate the effectiveness of the proposed method in both image fusion and downstream tasks. The source code is available at <span><span>https://github.com/lxq-jnu/KDFuse</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"118 ","pages":"Article 102944"},"PeriodicalIF":14.7,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142990592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An adaptive consensus model with hybrid feedback mechanism: Exploring interference effects under evidence theory
IF 14.7 CAS Tier 1, Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-13 DOI: 10.1016/j.inffus.2025.102949
Jingmei Xiao, Mei Cai, Guo Wei, Suqiong Hu
The consensus reaching process (CRP) is crucial for achieving broad agreement in group decision-making (GDM). In the CRP, factors such as epistemic uncertainty and opinion interference among experts may cause cognitive biases and irrational behaviors. Therefore, this paper proposes a new adaptive consensus model based on quantum probability theory (QPT) in the context of evidence theory and develops a hybrid feedback mechanism to select the optimal alternative accepted by a majority of experts. To improve precision when dealing with epistemic uncertainty, the design of parameters in evidence theory is optimized, jointly considering the experts' harmony degree and reliability, to reduce decision biases. Moreover, expert relationships are classified into three cases, namely mutual support, mutual conflict, and mutual independence, while considering interference effects within the group. To mitigate conflicts and promote consensus, a quantum Bayesian network (QBN) is employed to model expert opinion interference, and a hybrid feedback mechanism that uses individual or group opinions as a reference is designed to adjust opinions according to the specific relationships among experts. Finally, an illustrative example on the risk assessment of medical waste disposal is presented to verify the feasibility and effectiveness of the proposed method.
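A simplified consensus-round sketch using only the group opinion as the feedback reference; the threshold, adjustment coefficient `theta`, and laggard selection are illustrative, and the quantum Bayesian network and relationship-aware feedback of the proposed model are omitted:

```python
import numpy as np

def consensus_round(opinions, threshold=0.9, theta=0.3):
    """One simplified CRP feedback step (group-opinion reference only).

    `opinions` is an experts x alternatives matrix in [0, 1]. When the
    consensus level falls below `threshold`, the experts farthest from the
    collective opinion move a fraction `theta` toward the group; this is a
    plain stand-in for the paper's hybrid, relationship-aware mechanism.
    """
    group = opinions.mean(axis=0)
    distances = np.abs(opinions - group).mean(axis=1)
    consensus = 1.0 - distances.mean()
    if consensus < threshold:
        laggards = distances > distances.mean()
        opinions[laggards] = (1 - theta) * opinions[laggards] + theta * group
    return opinions, consensus

ops = np.random.default_rng(2).random((5, 4))
ops, level = consensus_round(ops)
print(f"consensus level: {level:.3f}")
```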
{"title":"An adaptive consensus model with hybrid feedback mechanism: Exploring interference effects under evidence theory","authors":"Jingmei Xiao ,&nbsp;Mei Cai ,&nbsp;Guo Wei ,&nbsp;Suqiong Hu","doi":"10.1016/j.inffus.2025.102949","DOIUrl":"10.1016/j.inffus.2025.102949","url":null,"abstract":"<div><div>The consensus reaching process (CRP) is crucial for achieving broad agreement in group decision-making (GDM). In the CRP, factors such as epistemic uncertainty and opinion interference of experts may cause cognitive biases and irrational behaviors. Therefore, this paper proposes a new adaptive consensus model based on quantum probability theory (QPT) in the context of evidence theory and develops a hybrid feedback mechanism to select the optimal alternative accepted by a majority of experts. To improve the level of precision when dealing with epistemic uncertainty, the design of parameters in evidence theory is optimized, jointly considering the experts’ harmony degree and reliability, to reduce decision biases. Moreover, expert relationships are classified into three cases—mutual support, mutual conflict, and mutual independence—while considering the interference effects within the group. To mitigate conflicts and promote consensus, the quantum Bayesian network (QBN) is employed to model expert opinion interference, and a hybrid feedback mechanism, that uses individual or group opinions as a reference, is designed for adjusting opinions tailored to the specific relationships among experts. Finally, an illustrative example regarding the risk assessment of medical waste disposal is presented to verify the feasibility and effectiveness of the proposed method.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"118 ","pages":"Article 102949"},"PeriodicalIF":14.7,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Fractal dimension and clinical neurophysiology fusion to gain a deeper brain signal understanding: A systematic review
IF 14.7 CAS Tier 1, Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-13 DOI: 10.1016/j.inffus.2025.102936
Sadaf Moaveninejad, Simone Cauzzo, Camillo Porcaro
Fractal dimension (FD) analysis is a powerful tool that has significantly advanced our understanding of brain complexity, evolving from basic geometrical characterization to the nuanced analysis of neurophysiological signals. This review integrates the theoretical foundations of FD calculation with its practical applications in clinical neurophysiology, focusing on the Higuchi method. This method, widely recognized for its effectiveness in analyzing clinical time series, is a crucial aspect of our research. Emphasizing the importance of fractal properties in interpreting brain function, we explore how FD analysis reveals the brain's physiological and pathological states.
The review systematically examines FD analysis’s role across various neurological conditions, drawing on a meta-analysis of existing literature, including studies on Alzheimer’s disease, Parkinson’s disease, multiple sclerosis, stroke, and schizophrenia. Additionally, we discuss its implications in aging and developmental research, particularly in elderly and young populations. By establishing FD analysis, particularly the Higuchi method, as an indispensable tool for evaluating brain dynamics, we highlight its potential for providing new insights and identifying biomarkers for these conditions. This exploration also underscores the ongoing challenges in synthesizing a unified model of brain function and the need for continued development of computational models that emulate the biological brain.
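Since the review centers on the Higuchi method, a compact reference implementation of the Higuchi fractal dimension (standard formulation; `k_max` is a user choice) may help make the discussed quantity concrete:

```python
import numpy as np

def higuchi_fd(x, k_max=10):
    """Higuchi fractal dimension of a 1-D signal.

    For each scale k, the mean curve length L(k) is computed over k offsets;
    the FD is the slope of log L(k) versus log(1/k).
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    log_k, log_L = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):
            idx = np.arange(m, N, k)
            if len(idx) < 2:
                continue
            diff = np.abs(np.diff(x[idx])).sum()
            norm = (N - 1) / ((len(idx) - 1) * k)   # length normalization
            lengths.append(diff * norm / k)
        log_k.append(np.log(1.0 / k))
        log_L.append(np.log(np.mean(lengths)))
    slope, _ = np.polyfit(log_k, log_L, 1)
    return slope

# white noise has FD close to 2
print(higuchi_fd(np.random.default_rng(3).standard_normal(1000)))
```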
{"title":"Fractal dimension and clinical neurophysiology fusion to gain a deeper brain signal understanding: A systematic review","authors":"Sadaf Moaveninejad ,&nbsp;Simone Cauzzo ,&nbsp;Camillo Porcaro","doi":"10.1016/j.inffus.2025.102936","DOIUrl":"10.1016/j.inffus.2025.102936","url":null,"abstract":"<div><div>Fractal dimension (FD) analysis, a powerful tool that has significantly advanced our understanding of brain complexity, evolving from basic geometrical characterization to the nuanced analysis of neurophysiological signals. This review integrates the theoretical foundations of FD calculation with its practical applications in clinical neurophysiology, focusing on the Higuchi method. This method, widely recognized for its effectiveness in analyzing clinical time series datasets, is a crucial aspect of our research. Emphasizing the importance of fractal properties in interpreting brain function, we explore how FD analysis reveals the brain’s physiological and pathological states.</div><div>The review systematically examines FD analysis’s role across various neurological conditions, drawing on a meta-analysis of existing literature, including studies on Alzheimer’s disease, Parkinson’s disease, multiple sclerosis, stroke, and schizophrenia. Additionally, we discuss its implications in aging and developmental research, particularly in elderly and young populations. By establishing FD analysis, particularly the Higuchi method, as an indispensable tool for evaluating brain dynamics, we highlight its potential for providing new insights and identifying biomarkers for these conditions. This exploration also underscores the ongoing challenges in synthesizing a unified model of brain function and the need for continued development of computational models that emulate the biological brain.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"118 ","pages":"Article 102936"},"PeriodicalIF":14.7,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143170566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A data-fusion spatiotemporal matrix factorization approach for citywide traffic flow estimation and prediction under insufficient detection
IF 14.7 CAS Tier 1, Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-13 DOI: 10.1016/j.inffus.2025.102952
Zhengchao Zhang, Meng Li
Citywide traffic flow is essential for urban traffic planning, traffic signal control, and automotive emission management. However, it is impractical to directly detect the traffic flow of every road segment because of the unaffordable costs of detector installation and maintenance. Under such insufficient detection, the traffic flows of road segments without detectors are entirely unknown, so it is imperative to estimate current values and predict future values of traffic flow for unobserved road segments. To solve this conundrum, we propose a data-fusion spatiotemporal matrix factorization (DSTMF) approach. First, Gaussian priors are introduced on the latent matrices of a matrix factorization model. Second, we develop an adaptive spatial regularization term that models the dependencies among road segments using speed information from floating cars. Third, a learnable autoregressive temporal regularization term is proposed to capture the temporal dependencies of traffic flow and predict future values. Finally, DSTMF is formulated as a quadratic program, and we design an optimization algorithm based on alternating least squares to solve it. Validated on two real-world large-scale traffic datasets with almost 600 road segments, our method is consistently superior to well-known benchmark models on both the estimation and prediction tasks.
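A stripped-down alternating-least-squares sketch of masked matrix completion, illustrating the factorization backbone; the plain ridge penalties stand in for the adaptive spatial and autoregressive temporal regularizers of DSTMF, and all sizes are toy values:

```python
import numpy as np

def als_complete(Y, mask, rank=5, lam=0.1, iters=30):
    """Minimal ALS matrix completion: Y is approximated by W @ H on observed entries.

    Y    : segments x time-steps flow matrix (unobserved entries can hold any value).
    mask : boolean matrix marking observed entries.
    """
    n_rows, n_cols = Y.shape
    rng = np.random.default_rng(0)
    W, H = rng.normal(size=(n_rows, rank)), rng.normal(size=(rank, n_cols))
    I = lam * np.eye(rank)
    for _ in range(iters):
        for i in range(n_rows):                     # update road-segment factors
            obs = mask[i]
            Hi = H[:, obs]
            W[i] = np.linalg.solve(Hi @ Hi.T + I, Hi @ Y[i, obs])
        for j in range(n_cols):                     # update time factors
            obs = mask[:, j]
            Wj = W[obs]
            H[:, j] = np.linalg.solve(Wj.T @ Wj + I, Wj.T @ Y[obs, j])
    return W @ H

Y = np.random.rand(50, 40)                          # 50 segments x 40 time steps
mask = np.random.rand(50, 40) < 0.3                 # only 30% of flows observed
print(als_complete(Y, mask).shape)
```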
{"title":"A data-fusion spatiotemporal matrix factorization approach for citywide traffic flow estimation and prediction under insufficient detection","authors":"Zhengchao Zhang ,&nbsp;Meng Li","doi":"10.1016/j.inffus.2025.102952","DOIUrl":"10.1016/j.inffus.2025.102952","url":null,"abstract":"<div><div>Citywide traffic flow is essential for the urban traffic planning, traffic signal control, and automotive emission management. However, it is impractical to directly detect the traffic flow of each road segment due to the unaffordable costs of detector installment and maintenance. Under the insufficient detection, the traffic flows of road segments without detectors are totally unknown. Thus, it is imperative to estimate existing values and predict future values for traffic flows of unobserved road segments. To solve this conundrum, we propose a data-fusion spatiotemporal matrix factorization (DSTMF) approach. Firstly, the Gaussian priors are introduced to latent matrices of a matrix factorization model. Secondly, we develop an adaptive spatial regularization term, which models the dependencies of road segments through the knowledge from floating cars’ speed. Thirdly, a learnable autoregressive temporal regularization term is proposed to capture the temporal dependencies of traffic flow and predict future values. Finally, DSTMF is formulated as a quadratic programming and we design an optimization algorithm based on the alternating least squares to solve it. Validated on two real-world large-scale traffic datasets with almost 600 road segments, our method is consistently superior to well-known benchmark models for both the estimation and prediction tasks.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"118 ","pages":"Article 102952"},"PeriodicalIF":14.7,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143170568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Bidomain uncertainty gated recursive network for pan-sharpening
IF 14.7 CAS Tier 1, Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-13 DOI: 10.1016/j.inffus.2025.102938
Junming Hou, Xinyang Liu, Chenxu Wu, Xiaofeng Cong, Chentong Huang, Liang-Jian Deng, Jian Wei You
Pan-sharpening aims to integrate the complementary information of different modalities of satellite images, e.g., texture-rich PAN images and multi-spectral (MS) images, to produce more informative fusion images for various practical tasks. Currently, most deep learning based pan-sharpening techniques primarily concentrate on developing various elaborate architectures to enhance their representation capabilities. Despite significant advancements, these heuristic and intricate network designs result in models that lack interpretability and exhibit poor generalization ability in real-world scenarios. To alleviate these issues, we propose a simple yet effective spatial-frequency (bidomain) uncertainty gated progressive fusion framework for pan-sharpening, termed BUGPan. Specifically, the main body of BUGPan consists of multiple uncertainty gated recursive modules (UGRM) which are responsible for the cross-modal representation learning at different spatial resolutions. In contrast to the prior recursive designs that perform a fixed and manually set number of recursions, the UGRM introduces an innovative spatial-frequency uncertainty gated recursive mechanism featuring two key designs, i.e., Bidomain Uncertainty Estimation (BUE) and Uncertainty-Aware Gating (UAG). This mechanism strategically orchestrates the recursive embedding of features, and tailors the process to specific outcome contexts, enabling more transparent feature representation learning. To the best of our knowledge, this is not only the first attempt to introduce both spatial and frequency uncertainty in pan-sharpening, but we also extend the role of uncertainty in a novel functional mechanism design. Extensive experimental results highlight the superiority of the proposed BUGPan, surpassing other state-of-the-art methods both qualitatively and quantitatively across multiple satellite datasets. Particularly noteworthy is its ability to generalize effectively to real-world scenarios and new satellite data. The code is available at https://github.com/coder-JMHou/BUGPan.
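A toy PyTorch sketch of uncertainty-aware gating in the spatial domain; the log-variance head, the exp(-u) gate, and the convex combination with the previous state are assumptions illustrating the UAG idea, not the BUGPan implementation, and the frequency-domain uncertainty branch is omitted:

```python
import torch
import torch.nn as nn

class UncertaintyGate(nn.Module):
    """Toy uncertainty-aware gating step (spatial domain only).

    A small head predicts a per-pixel uncertainty map; low-uncertainty regions
    keep more of the new recursion's features, high-uncertainty regions fall
    back to the previous state.
    """
    def __init__(self, ch):
        super().__init__()
        self.update = nn.Conv2d(ch, ch, 3, padding=1)
        self.logvar = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, prev_state, new_feat):
        candidate = self.update(new_feat)
        gate = torch.exp(-torch.relu(self.logvar(new_feat)))   # values in (0, 1]
        return gate * candidate + (1 - gate) * prev_state

state = torch.rand(1, 32, 64, 64)
out = UncertaintyGate(32)(state, torch.rand(1, 32, 64, 64))
print(out.shape)
```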
{"title":"Bidomain uncertainty gated recursive network for pan-sharpening","authors":"Junming Hou ,&nbsp;Xinyang Liu ,&nbsp;Chenxu Wu ,&nbsp;Xiaofeng Cong ,&nbsp;Chentong Huang ,&nbsp;Liang-Jian Deng ,&nbsp;Jian Wei You","doi":"10.1016/j.inffus.2025.102938","DOIUrl":"10.1016/j.inffus.2025.102938","url":null,"abstract":"<div><div>Pan-sharpening aims to integrate the complementary information of different modalities of satellite images, <em>e.g.,</em> texture-rich PAN images and multi-spectral (MS) images, to produce more informative fusion images for various practical tasks. Currently, most deep learning based pan-sharpening techniques primarily concentrate on developing various elaborate architectures to enhance their representation capabilities. Despite significant advancements, these heuristic and intricate network designs result in models that lack interpretability and exhibit poor generalization ability in real-world scenarios. To alleviate these issues, we propose a simple yet effective spatial-frequency (bidomain) uncertainty gated progressive fusion framework for pan-sharpening, termed BUGPan. Specifically, the main body of BUGPan consists of multiple uncertainty gated recursive modules (UGRM) which are responsible for the cross-modal representation learning at different spatial resolutions. In contrast to the prior recursive designs that perform a fixed and manually set number of recursions, the UGRM introduces an innovative spatial-frequency uncertainty gated recursive mechanism featuring two key designs, <em>i.e.</em>, Bidomain Uncertainty Estimation (BUE) and Uncertainty-Aware Gating (UAG). This mechanism strategically orchestrates the recursive embedding of features, and tailors the process to specific outcome contexts, enabling more transparent feature representation learning. To the best of our knowledge, this is not only the first attempt to introduce both spatial and frequency uncertainty in pan-sharpening, but we also extend the role of uncertainty in a novel functional mechanism design. Extensive experimental results highlight the superiority of the proposed BUGPan, surpassing other state-of-the-art methods both qualitatively and quantitatively across multiple satellite datasets. Particularly noteworthy is its ability to generalize effectively to real-world scenarios and new satellite data. The code is available at <span><span>https://github.com/coder-JMHou/BUGPan</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"118 ","pages":"Article 102938"},"PeriodicalIF":14.7,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143170567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0