
Latest publications in Information Fusion

Multi-source information fusion through Tucker tensor decomposition-based transfer learning for handwriting-based Alzheimer's disease detection
IF 15.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-30 | DOI: 10.1016/j.inffus.2025.104112
Yao Yao, Zhuoxi Yu, Dehui Wang, Chengzhe Wang, Congting Sun
With Alzheimer’s disease affecting approximately 50 million people globally, early detection has emerged as a critical public health priority in aging societies. This paper proposes a novel multi-level information fusion framework for handwriting-based Alzheimer’s disease detection, addressing the fundamental challenges of data scarcity and high-dimensional feature representation. Our approach integrates: (1) structural fusion through tensor representation preserving the multi-dimensional nature of handwriting data, (2) feature-level fusion via Tucker decomposition achieving 80% parameter reduction while maintaining discriminative information, (3) knowledge fusion through our proposed transferable source domain detection algorithm that selectively integrates relevant knowledge from related domains, and (4) decision-level fusion with a two-stage transfer-debias mechanism that mitigates negative transfer risks. Experiments on the DARWIN dataset demonstrate that our transfer learning approach achieves 93.33% accuracy and 99.10% sensitivity, substantially outperforming existing handwriting-based AD detection methods (best reported: 88.29% accuracy, 90.28% sensitivity). The framework exhibits exceptional robustness in small-sample scenarios, maintaining 87.50% accuracy with just 10% of the training data. Our comprehensive analysis reveals that kinematic features account for an importance score of 35.3%, while temporal features collectively contribute 25.7%, among which total time (9.4%) emerges as a key marker within the temporal category. The proposed framework presents a promising non-invasive approach for early Alzheimer’s detection in aging populations, with the potential to facilitate earlier intervention and substantial healthcare cost reductions.
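As a rough illustration of the feature-level fusion step described in this abstract, the sketch below applies a Tucker decomposition to a hypothetical handwriting feature tensor using the tensorly library. The tensor shape and ranks are arbitrary placeholders rather than the paper's configuration, and the snippet only shows how the parameter reduction from such a decomposition can be estimated.

```python
# Minimal sketch of feature-level fusion via Tucker decomposition (illustrative,
# not the authors' code). Assumes a handwriting feature tensor of shape
# (tasks, features, time_windows); ranks are arbitrary choices for demonstration.
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

X = tl.tensor(np.random.rand(25, 18, 10))   # hypothetical multi-dimensional handwriting data

core, factors = tucker(X, rank=[8, 6, 4])    # compress each mode to a lower rank

original_params = np.prod(X.shape)
compressed_params = np.prod(core.shape) + sum(f.shape[0] * f.shape[1] for f in factors)
print(f"parameter reduction: {1 - compressed_params / original_params:.1%}")
```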
Citations: 0
HFPN: Hierarchical fusion and prediction network with multi-level cross-modality relation learning for audio-visual event localization
IF 15.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-30 | DOI: 10.1016/j.inffus.2025.104111
Pufen Zhang, Lei Jia, Jiaxiang Wang, Meng Wan, Sijie Chang, Tianle Zhang, Peng Shi
The audio-visual event localization (AVEL) task needs to fuse audio-visual modalities by mining their cross-modality relation (CMR). However, existing AVEL works encounter several challenges in CMR learning: (a) event-unrelated visual regions are not filtered when learning the region-level CMR; (b) the segment-level CMR is modeled in a one-to-one way, ignoring the cross-modality locality context correlation; (c) the holistic semantics of the audio and visual tracks of an event are consistent, but such a track-level CMR is not explored; (d) low- and middle-level visual semantics are ignored in existing fusion and CMR learning strategies. To address these issues, a Hierarchical Fusion and Prediction Network (HFPN) with a Multi-level Cross-modality Relation Learning Framework (MCRLF) is proposed. Specifically, for challenge (a), MCRLF proposes an audio-adaptive region filter to dynamically filter out event-irrelevant image regions according to the event audio. To deal with challenge (b), MCRLF designs a bilateral locality context attention, which captures the cross-modality locality context correlation via convolution windows to guide segment-level CMR learning. For challenge (c), MCRLF introduces a novel dual-track alignment loss to achieve whole-semantic alignment of the audio and visual tracks of an event. Finally, to tackle challenge (d), HFPN uses MCRLF as a unified fusion framework to hierarchically fuse audio signals with low-, middle- and high-level visual features, obtaining comprehensive semantics for event prediction. With modest model complexity, HFPN achieves state-of-the-art results on the AVE (84.8% and 80.2%) and VGGSound-AVEL100k (67.2% and 62.7%) benchmarks under both fully- and weakly-supervised settings, offering a significant reference for practical applications.
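The audio-adaptive region filtering idea in challenge (a) can be pictured with a generic attention sketch like the one below; it is an assumption of how audio features might re-weight region-level visual features, not the published HFPN module, and all dimensions are placeholders.

```python
# Generic sketch of an audio-guided region weighting step (an assumption of how an
# "audio-adaptive region filter" could look; it is not the HFPN implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioGuidedRegionFilter(nn.Module):
    def __init__(self, audio_dim=128, visual_dim=512):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, visual_dim)   # map audio into the visual space

    def forward(self, audio, regions):
        # audio:   (batch, audio_dim)              segment-level audio feature
        # regions: (batch, num_regions, visual_dim) region-level visual features
        query = self.audio_proj(audio).unsqueeze(1)                     # (batch, 1, visual_dim)
        scores = torch.bmm(query, regions.transpose(1, 2)).squeeze(1)   # (batch, num_regions)
        weights = F.softmax(scores / regions.size(-1) ** 0.5, dim=-1)   # soft filter over regions
        return torch.bmm(weights.unsqueeze(1), regions).squeeze(1)      # (batch, visual_dim)

filtered = AudioGuidedRegionFilter()(torch.randn(2, 128), torch.randn(2, 49, 512))
```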
Citations: 0
Internet meme on social media: A comprehensive review and new perspectives
IF 15.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-29 | DOI: 10.1016/j.inffus.2025.104102
Bingbing Wang, Jingjie Lin, Zhixin Bai, Zihan Wang, Shengzhe Sun, Zhengda Jin, Zhiyuan Wen, Geng Tu, Jing Li, Erik Cambria, Ruifeng Xu
Internet memes have become a dominant yet complex form of online communication, spurring a rapid growth of computational research. However, existing surveys remain largely confined to narrow classification tasks and fail to reflect the paradigm shift introduced by Multimodal Large Language Models (MLLMs). To address this gap, we introduce the TriR Framework, comprising Redefinition, Reconsolidation, and Revolution. Within this framework, we redefine the research scope through a taxonomy of higher-order cognitive tasks for meme comprehension, reconsolidate fragmented methodological progress around the unique capabilities of MLLMs, and articulate a trajectory that highlights key challenges and opportunities for advancing compositional and inferential modeling. By offering this structured perspective, the survey anchors the current state of the field while providing a systematic guide for its future development, fostering research that is computationally rigorous, empirically grounded, and ethically responsible.
Citations: 0
Multi-modal and multi-condition fault diagnosis of rotating machinery via a heterogeneous graph learning framework
IF 15.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-29 | DOI: 10.1016/j.inffus.2025.104106
Xiaoyu Han, Yunpeng Cao, Pan Hu, Weixing Feng
Intelligent fault diagnosis of rotating machinery under multi-modal and multi-condition scenarios presents critical challenges, including the low utilization of structural information and weak model generalization capability. To address these issues, this paper proposes a Structural Awareness Heterogeneous Graph Transformer (SAHGT) framework to achieve unified modeling and robust representation learning of multi-source monitoring signals. The method constructs a unified heterogeneous graph structure to integrate modal features, frequency-domain relationships, and spatial priors. By introducing a heterogeneous graph Transformer with a dual-guided attention mechanism that combines modality-guided and frequency-guided attention, it enhances the selective expression of key features and the discriminative capability for fault patterns. To enhance the model’s adaptability in non-stationary environments, an augmented view-driven contrastive learning mechanism is designed to further strengthen robustness against structural variations and distribution shifts. Notably, this paper establishes a unified training framework that enables switching between Domain Generalization (DG) and Domain Adaptation (DA) tasks solely by configuring loss function combinations without modifying the model architecture. The validation experiments conducted on a high-fidelity gas turbine test platform demonstrate the superior performance of the proposed SAHGT framework, achieving average fault diagnosis accuracies of 83.46% and 99.57% on nine DG tasks and twelve DA tasks, respectively. These results significantly outperform state-of-the-art graph neural network methods, highlighting the model’s strong cross-domain generalization and domain adaptability.
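The unified heterogeneous graph mentioned above can be sketched with PyTorch Geometric's HeteroData container, as below. The node types, edge relations, and feature sizes are illustrative assumptions rather than the SAHGT construction.

```python
# Sketch of a unified heterogeneous graph over multi-source monitoring signals, using
# PyTorch Geometric's HeteroData. Node/edge type names and dimensions are assumptions
# for illustration; they are not taken from the SAHGT paper.
import torch
from torch_geometric.data import HeteroData

data = HeteroData()
data['vibration'].x = torch.randn(16, 64)     # 16 vibration-signal segments, 64-d features
data['acoustic'].x = torch.randn(16, 64)      # 16 acoustic-signal segments

# frequency-domain relationship between modalities (row 0: source indices, row 1: target indices)
data['vibration', 'shares_band', 'acoustic'].edge_index = torch.tensor([[0, 1, 2],
                                                                        [0, 1, 2]])
# spatial prior within a modality (e.g., sensors mounted on the same bearing)
data['vibration', 'co_located', 'vibration'].edge_index = torch.tensor([[0, 3],
                                                                        [3, 0]])
print(data)
```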
Citations: 0
Uncertainty-aware multi-view evidence fusion for feature selection in brain network analysis
IF 15.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-27 | DOI: 10.1016/j.inffus.2025.104083
Yuepeng Chen, Weiping Ding, Shangce Gao, Jiaru Yang, Tianyi Zhou, Qichong Hua, Fan Fu
Accurate diagnosis of schizophrenia remains challenging due to the lack of reliable biomarkers. Dynamic functional connectivity (dFC) derived from resting-state fMRI provides a powerful representation of temporal brain dynamics; however, its inherently multi-view structure introduces severe challenges, including high dimensionality, heterogeneity across views, and uncertainty caused by preprocessing. Such uncertainty directly affects feature selection because unreliable features may degrade both diagnostic accuracy and interpretability. However, most existing feature selection methods fail to explicitly model and utilize uncertainty. To address these challenges, we propose a multi-view feature selection method based on evidence theory that explicitly models uncertainty while capturing inter-view consistency and complementarity. This approach enables the selection of both shared and view-specific discriminative patterns that are often overlooked in dynamic brain network analysis. We further introduce an information-theoretic consistency constraint to extract reliable shared information and an uncertainty-weighted loss based on the Dirichlet distribution to prioritize complementary features with lower uncertainty. By integrating confidence measures across views through evidence fusion, our method effectively quantifies and leverages uncertainty to optimize feature selection. Extensive experiments on three independent rs-fMRI schizophrenia datasets demonstrate improved classification accuracy and robustness, providing an interpretable and reliable tool for identifying biomarkers in neuropsychiatric research.
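The Dirichlet-based uncertainty weighting described here builds on the standard evidential-learning recipe, which a short sketch can make concrete. The snippet below shows that generic formulation (evidence, concentration parameters, belief, and uncertainty mass) and a simple confidence-based view weighting; it does not reproduce the paper's fusion rule or loss terms.

```python
# Minimal sketch of per-view Dirichlet evidence and the uncertainty mass used to weight
# views (standard evidential-learning formulation; illustrative only).
import torch
import torch.nn.functional as F

def view_uncertainty(logits):
    # logits: (batch, num_classes) raw outputs of one view's classifier
    evidence = F.softplus(logits)               # non-negative evidence per class
    alpha = evidence + 1.0                      # Dirichlet concentration parameters
    strength = alpha.sum(dim=-1, keepdim=True)  # Dirichlet strength S
    belief = evidence / strength                # per-class belief mass
    uncertainty = logits.size(-1) / strength    # u = K / S, in (0, 1]
    return belief, uncertainty

beliefs, u = zip(*(view_uncertainty(torch.randn(4, 2)) for _ in range(3)))  # three views
weights = torch.softmax(torch.cat([1 - ui for ui in u], dim=-1), dim=-1)    # favor low-uncertainty views
print(weights)
```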
Citations: 0
GCEPANet: A lightweight and efficient remote sensing image cloud removal network model for optical-SAR image fusion
IF 15.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-27 | DOI: 10.1016/j.inffus.2025.104090
Qinglong Zhou, Xing Wang, Jiahao Fang, Wenbo Wu, Bingxian Zhang
To mitigate severe cloud interference in optical remote sensing imagery and address the challenges of deploying complex cloud removal models on satellite platforms, this study proposes a lightweight gated parallel attention network, GCEPANet. By integrating optical and SAR data, the network fully exploits the penetration capability of SAR imagery and combines a Gated Convolution Module (GCONV) with an Enhanced Parallel Attention Module (EPA) to establish a “cloud perception–cloud refinement” cooperative mechanism. This mechanism enables the model to identify and filter features according to cloud intensity, effectively separating the feature flows of clear and cloudy regions, and adaptively compensating for cloud-induced degradation to reconstruct the true structural and radiative characteristics of surface objects. Furthermore, a joint spectral–structural loss is introduced to simultaneously constrain spectral consistency and structural fidelity. Extensive experiments on the SEN12MS-CR dataset demonstrate that the proposed GCEPANet consistently outperforms existing methods across multiple metrics, including PSNR, SSIM, MAE, RMSE, SAM, and ERGAS. Compared with the SCTCR model, GCEPANet achieves a 0.9306 dB improvement in PSNR, reduces the number of parameters by 85.5% (to 12.77M), and decreases FLOPs by 76.0% (to 9.71G). These results demonstrate that the proposed method achieves superior cloud removal performance while significantly reducing model complexity, providing an efficient and practical solution for real-time on-orbit cloud removal in optical–SAR fused remote sensing imagery.
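A gated convolution of the kind the GCONV module builds on can be written in a few lines; the sketch below shows the generic form (a content branch modulated by a sigmoid gate) with assumed channel counts, and is not the published GCEPANet block.

```python
# Sketch of a gated convolution block in the spirit of the GCONV module described above
# (layer sizes and the exact gating form are assumptions, not the published architecture).
import torch
import torch.nn as nn

class GatedConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.feature = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)  # content branch
        self.gate = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)     # soft mask, e.g. cloud intensity

    def forward(self, x):
        # the sigmoid gate scales each spatial location, letting the network suppress
        # cloud-contaminated responses while passing clear-region features through
        return self.feature(x) * torch.sigmoid(self.gate(x))

fused = torch.cat([torch.randn(1, 3, 64, 64), torch.randn(1, 2, 64, 64)], dim=1)  # optical + SAR stack
out = GatedConv(in_ch=5, out_ch=32)(fused)
print(out.shape)   # torch.Size([1, 32, 64, 64])
```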
Citations: 0
Scoping review of multimodal sentiment analysis and summarization: State of the art, challenges and future directions
IF 15.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-27 | DOI: 10.1016/j.inffus.2025.104082
Magaly Lika Fujimoto, Ricardo Marcondes Marcacini, Solange Oliveira Rezende
In recent decades, advancements in computing power and the widespread availability of multimodal data have significantly redirected research, shifting the primary focus away from purely text-based approaches. This paper presents a scoping review focusing on approaches that jointly perform Multimodal Sentiment Analysis and Multimodal Summarization within the same framework. Beyond this, the review comprehensively surveys each domain individually, highlighting state-of-the-art techniques, key methodologies, and commonly used datasets. It also provides key insights into current challenges and proposes future research directions.
Citations: 0
DepressInstruct: Instruction tuning of large speech-language models for depression detection
IF 15.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-27 | DOI: 10.1016/j.inffus.2025.104077
Dongdong Li, Li Ding, Zhe Wang, Ke Zhao
Depression is a major global issue in the field of mental health, driving extensive research into AI-based diagnostic and detection methods. Among the various AI technologies, large language models (LLMs) stand out due to their strong generalization ability and versatility. However, one of the main limitations of these models is their exclusive reliance on text input, which somewhat limits their overall performance. Furthermore, the potential of LLMs to analyze depressive states by incorporating raw speech signals has yet to be fully explored. In this paper, we propose an innovative method that integrates different types of unimodal information into a multimodal description, thereby incorporating raw acoustic information into a large speech-language model for multimodal depression detection. By combining raw speech signals, text transcriptions, and speaker emotional information, we construct a multimodal instruction set and fine-tune the large speech-language model with instructions to recognize an individual’s potential psychological state. Evaluation on the DAIC, EATD, and CMDC datasets yields F1 scores of 0.8235, 0.8182, and 0.9818, respectively. These results demonstrate that the proposed method achieves state-of-the-art performance in depression detection. Moreover, this approach not only has significant value for depression detection but also provides a new perspective on the ability of large language models to understand and process speech signals. The source code used in the paper is available at https://github.com/jiabing1988/instruct_fine_Qwen2_audio.
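One way to picture the multimodal instruction set described above is a sample-construction helper like the sketch below; every field name, prompt string, and file path in it is hypothetical and does not come from the released repository.

```python
# Sketch of how one multimodal instruction-tuning sample might be assembled from the three
# information sources named above (field names, prompt wording, and the audio path are
# hypothetical placeholders).
import json

def build_sample(audio_path, transcript, emotion_label, phq_positive):
    return {
        "audio": audio_path,  # raw speech signal fed to the speech encoder
        "instruction": (
            "Listen to the recording, read its transcript and the speaker's emotional state, "
            "and judge whether the speaker shows signs of depression."
        ),
        "input": f"Transcript: {transcript}\nEmotion: {emotion_label}",
        "output": "depressed" if phq_positive else "not depressed",
    }

sample = build_sample("interviews/p301.wav", "I haven't slept well lately.", "sad", True)
print(json.dumps(sample, indent=2))
```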
Citations: 0
EBMADDPG: Shapley-based explainable moving target defense for edge intelligence-enabled SIoT systems via joint Bayesian Markov games and DRL
IF 15.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-27 | DOI: 10.1016/j.inffus.2025.104101
Tao Hong, Yizhou Shen, Xiaoping Wu, Jingnan Dong, Shigen Shen, Zhiquan Liu
The edge intelligence (EI)-enabled social Internet of Things (SIoT) is increasingly vulnerable to sophisticated malware that exploits social relationships between devices to propagate rapidly and bypass traditional security. To address this dynamic threat under incomplete information, we propose a novel moving target defense framework based on a Bayesian Markov game. In our framework, defenders dynamically shift system configurations and resource allocations upon detecting potential threats. Based on their belief states about attacker types, each defender can decide whether to coordinate defense strategies with other agents. Unlike most existing work, we explicitly account for both the incomplete information about attacker capabilities and the dynamic nature of EI-enabled SIoT systems. We formulate a joint optimization problem to simultaneously determine belief updates about attacker types via Bayesian inference, dynamic reconfiguration of defense parameters, and optimal coordination strategies among agents. To efficiently solve this problem, we develop a novel explainable Bayesian multi-agent deep deterministic policy gradient algorithm, which integrates centralized training with decentralized execution. Furthermore, we incorporate Shapley Additive Explanations to analyze agent contributions. Theoretical analyses and extensive simulations demonstrate that our proposed solution significantly outperforms traditional reinforcement learning algorithms.
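The belief update over attacker types that the defenders rely on is ordinary Bayesian posterior updating, which the short sketch below illustrates with made-up priors and likelihoods; the paper's observation model and game payoffs are not reproduced.

```python
# Sketch of a Bayesian belief update over attacker types (a textbook posterior update;
# the attacker-type names and all numbers below are illustrative assumptions).
import numpy as np

prior = np.array([0.5, 0.3, 0.2])        # belief over attacker types: {scanner, propagator, stealthy}
likelihood = np.array([0.1, 0.6, 0.3])   # assumed probability of the observed alerts under each type

posterior = likelihood * prior
posterior /= posterior.sum()             # normalize: P(type | observation)
print(posterior.round(3))                # belief shifts toward the "propagator" type
```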
Citations: 0
Deep learning-based astronomical multimodal data fusion: A comprehensive review
IF 15.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-27 | DOI: 10.1016/j.inffus.2025.104103
Wujun Shao, Dongwei Fan, Chenzhou Cui, Yunfei Xu, Shirui Wei, Xin Lyu
With the rapid advancements in observational technologies and the widespread implementation of large-scale sky surveys, diverse electromagnetic wave data (e.g., optical and infrared) and non-electromagnetic wave data (e.g., gravitational waves) have become increasingly accessible. Astronomy has thus entered an unprecedented era of data abundance and complexity. Astronomers have long relied on unimodal data analysis to perceive the universe, but these efforts often provide only limited insights when confronted with the current massive and heterogeneous astronomical data. In this context, multimodal data fusion (MDF), as an emerging method, provides new opportunities to enhance the value of astronomical data and deepen the understanding of the universe by integrating information from different modalities. Recent progress in artificial intelligence (AI), particularly in deep learning (DL), has greatly accelerated the development of multimodal research in astronomy. Therefore, a timely review of this field is essential. This paper begins by discussing the motivation and necessity of astronomical MDF, followed by an overview of astronomical data sources and major data modalities. It then introduces representative DL models commonly used in astronomical multimodal studies, the general fusion process, as well as various fusion strategies, emphasizing their characteristics, applicability, advantages, and limitations. Subsequently, the paper surveys existing astronomical multimodal studies and datasets. Finally, the discussion section synthesizes key findings, identifies potential challenges, and suggests promising directions for future research. By offering a structured overview and critical analysis, this review aims to inspire and guide researchers engaged in DL-based MDF in astronomy.
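For readers new to the fusion strategies such a review compares, the toy sketch below contrasts early (feature-level) and late (decision-level) fusion; the modality names, dimensions, and classifier heads are placeholders unrelated to any specific astronomical dataset.

```python
# Tiny sketch contrasting two common fusion strategies: early fusion by feature
# concatenation versus late fusion by averaging class probabilities (illustrative only).
import torch
import torch.nn as nn

optical, spectrum = torch.randn(8, 128), torch.randn(8, 64)   # two modality embeddings

# early fusion: concatenate features, then classify once
early_head = nn.Linear(128 + 64, 3)
early_logits = early_head(torch.cat([optical, spectrum], dim=-1))

# late fusion: classify each modality separately, then average the probabilities
opt_head, spec_head = nn.Linear(128, 3), nn.Linear(64, 3)
late_probs = (opt_head(optical).softmax(-1) + spec_head(spectrum).softmax(-1)) / 2
print(early_logits.shape, late_probs.shape)
```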
Citations: 0