
Latest articles from Information Fusion

GIAFormer: A Gradient-Infused Attention and Transformer for Pain Assessment with EDA-fNIRS Fusion
IF 15.5 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-23 | DOI: 10.1016/j.inffus.2026.104173
Muhammad Umar Khan, Girija Chetty, Stefanos Gkikas, Manolis Tsiknakis, Roland Goecke, Raul Fernandez-Rojas
Reliable pain assessment is crucial in clinical practice, yet it remains a challenge because self-report-based assessment is inherently subjective. In this work, we introduce GIAFormer, a deep learning framework designed to provide an objective measure of multilevel pain by jointly analysing Electrodermal Activity (EDA) and functional Near-Infrared Spectroscopy (fNIRS) signals. By combining the complementary information from autonomic and cortical responses, the proposed model aims to capture both physiological and neural aspects of pain. GIAFormer integrates a Gradient-Infused Attention (GIA) module with a Transformer. The GIA module enhances signal representation by fusing the physiological signals with their temporal gradients and applying spatial attention to highlight inter-channel dependencies. The Transformer component follows, enabling the model to learn long-range temporal relationships. The framework was evaluated on the AI4Pain dataset comprising 65 subjects using a leave-one-subject-out validation protocol. GIAFormer achieved an accuracy of 90.51% and outperformed recent state-of-the-art approaches. These findings highlight the potential of gradient-aware attention and multimodal fusion for interpretable, non-invasive, and generalisable pain assessment suitable for clinical and real-world applications.
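As a rough illustration of the gradient-infusion idea in this abstract, the sketch below fuses a physiological signal with its temporal gradient, gates channels with a simple attention weight, and feeds the result to a standard Transformer encoder. All layer sizes, the gating form, and the classification head are illustrative assumptions, not the authors' released GIAFormer.

```python
# Minimal PyTorch sketch of the gradient-infused attention idea described above.
# Shapes, layer sizes, and the exact attention form are illustrative assumptions,
# not the authors' released implementation.
import torch
import torch.nn as nn

class GradientInfusedAttention(nn.Module):
    def __init__(self, channels: int, d_model: int = 64):
        super().__init__()
        self.proj = nn.Linear(2 * channels, d_model)           # fuse signal + gradient
        self.channel_gate = nn.Sequential(                      # channel ("spatial") attention
            nn.Linear(2 * channels, 2 * channels), nn.Sigmoid()
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels) multimodal physiological signal (e.g. EDA + fNIRS)
        grad = torch.diff(x, dim=1, prepend=x[:, :1, :])        # temporal gradient
        fused = torch.cat([x, grad], dim=-1)                    # gradient infusion
        gate = self.channel_gate(fused.mean(dim=1, keepdim=True))
        return self.proj(fused * gate)                          # (batch, time, d_model)

class GIAFormerSketch(nn.Module):
    def __init__(self, channels: int, n_classes: int, d_model: int = 64):
        super().__init__()
        self.gia = GradientInfusedAttention(channels, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):
        h = self.encoder(self.gia(x))                           # long-range temporal modelling
        return self.head(h.mean(dim=1))                         # pooled multilevel pain logits

if __name__ == "__main__":
    model = GIAFormerSketch(channels=8, n_classes=3)
    print(model(torch.randn(2, 256, 8)).shape)                  # torch.Size([2, 3])
```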
Citations: 0
Grading-inspired complementary enhancing for multimodal sentiment analysis
IF 15.5 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-23 | DOI: 10.1016/j.inffus.2026.104174
Zhijing Huang, Wen-Jue He, Baotian Hu, Zheng Zhang
Due to its strong capacity for integrating heterogeneous multi-source information, multimodal sentiment analysis (MSA) has achieved remarkable progress in affective computing. However, existing methods typically adopt symmetric fusion strategies that treat all modalities equally, overlooking their inherent performance disparities: some modalities excel at discriminative representation, while others carry underutilized supportive cues. This limitation leads to insufficient exploration of cross-modal complementary correlations. To address this issue, we propose a novel Grading-Inspired Complementary Enhancing (GCE) framework for MSA, which is one of the first attempts to conduct dynamic assessment for knowledge transfer in progressive multimodal fusion and cooperation. Specifically, based on cross-modal interaction, a task-aware grading mechanism categorizes modality-pair associations into dominant (high-performing) and supplementary (low-performing) branches according to their task performance. Accordingly, a relation filtering module selectively identifies trustworthy information from the dominant branch to enhance consistency exploration in supplementary modality pairs with minimal redundancy. Afterwards, a weight adaptation module dynamically adjusts the guiding weight of individual samples for adaptability and generalization. Extensive experiments on three benchmark datasets demonstrate that the proposed GCE approach outperforms state-of-the-art MSA methods. Our code is available at https://github.com/hka-7/GCEforMSA.
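The snippet below is a minimal sketch of the grading-and-guidance idea: modality-pair branches are graded by batch accuracy, high-confidence predictions from dominant branches are filtered as trustworthy, and they guide supplementary branches with per-sample adaptive weights. Branch definitions, the confidence threshold, and the loss weighting are assumptions for illustration, not the GCE implementation.

```python
# Minimal PyTorch sketch of the grading-and-guidance idea described above: branches are
# graded by task performance, and the dominant branch's trustworthy (high-confidence)
# predictions guide the supplementary branch with per-sample adaptive weights.
# Branch definitions, thresholds, and loss weights are illustrative assumptions.
import torch
import torch.nn.functional as F

def grade_branches(branch_logits, labels):
    """Split modality-pair branches into dominant / supplementary by batch accuracy."""
    acc = {k: (v.argmax(-1) == labels).float().mean().item() for k, v in branch_logits.items()}
    mean_acc = sum(acc.values()) / len(acc)
    dominant = [k for k, a in acc.items() if a >= mean_acc]
    supplementary = [k for k, a in acc.items() if a < mean_acc]
    return dominant, supplementary

def guided_loss(branch_logits, labels, conf_thresh: float = 0.7):
    dominant, supplementary = grade_branches(branch_logits, labels)
    loss = sum(F.cross_entropy(branch_logits[k], labels) for k in branch_logits)
    for d in dominant:
        probs = branch_logits[d].softmax(-1).detach()
        conf, _ = probs.max(-1)
        trust = (conf > conf_thresh).float()              # relation filtering: trustworthy samples
        for s in supplementary:
            per_sample = F.kl_div(branch_logits[s].log_softmax(-1), probs,
                                  reduction="none").sum(-1)
            loss = loss + (trust * conf * per_sample).mean()   # per-sample weight adaptation
    return loss

if __name__ == "__main__":
    labels = torch.randint(0, 3, (8,))
    logits = {"text-audio": torch.randn(8, 3), "text-video": torch.randn(8, 3)}
    print(guided_loss(logits, labels).item())
```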
Citations: 0
Adversarial perturbation for RGB-T tracking via intra-modal excavation and cross-modal collusion
IF 15.5 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-23 | DOI: 10.1016/j.inffus.2026.104183
Xinyu Xiang, Xuying Wu, Shengxiang Li, Qinglong Yan, Tong Zou, Hao Zhang, Jiayi Ma
Existing adversarial perturbation attacks for visual object trackers mainly focus on the RGB modality, yet adversarial perturbation of RGB-T trackers remains unexplored. To address this gap, we propose an Intra-modal excavation and Cross-modal collusion adversarial perturbation attack algorithm (ICAttack) for RGB-T tracking. Firstly, we establish a novel intra-modal adversarial clues excavation (ImAE) paradigm. By leveraging the unique distribution properties of each modality as a prior, we independently extract the attack cues of different modalities from a public noise space. Building upon this, we develop a cross-modal adversarial collusion (CmAC) strategy, which enables implicit and dynamic interaction between the adversarial tokens of the two modalities. This interaction facilitates negotiation and collaboration, achieving a synergistic attack gain for RGB-T trackers that surpasses the effect of a single-modality attack. The above process, from intra-modal excavation to cross-modal collusion, creates a progressive and systematic attack framework for RGB-T trackers. Besides, by introducing a spatial adversarial intensity control module and a precise response disruption loss, we further enhance both the stealthiness and precision of our adversarial perturbations. The control module reduces attack strength in less critical areas to improve stealth. The disruption loss uses a small mask on the tracker's brightest semantic response region, concentrating the perturbation to interfere precisely with the tracker's target awareness. Extensive evaluations of attack performance against different SOTA victim RGB-T trackers demonstrate the advantages of ICAttack in terms of the specificity and effectiveness of cross-modal attacks. Moreover, we offer a user-friendly interface to promote the practical deployment of adversarial perturbations. Our code is publicly available at https://github.com/Xinyu-Xiang/ICAttack.
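As a hedged illustration only, the PGD-style loop below optimizes perturbations for the RGB and thermal streams, softly shares attack directions between them, and scales the result by a spatial intensity mask; the victim tracker is replaced by a placeholder loss, and none of the hyper-parameters come from the paper.

```python
# Minimal PGD-style sketch of a two-modality perturbation: attack cues for RGB and thermal
# frames are updated jointly, with a soft exchange of directions between modalities and a
# spatial intensity mask that weakens the perturbation outside a critical region. The
# victim tracker, its loss, and all hyper-parameters are placeholders, not ICAttack itself.
import torch

def attack_step(rgb, thermal, mask, tracker_loss, eps=8 / 255, alpha=2 / 255, steps=10):
    delta_rgb = torch.zeros_like(rgb, requires_grad=True)
    delta_t = torch.zeros_like(thermal, requires_grad=True)
    for _ in range(steps):
        loss = tracker_loss(rgb + mask * delta_rgb, thermal + mask * delta_t)
        loss.backward()
        with torch.no_grad():
            # gradient ascent on the tracker loss, one step per modality
            delta_rgb += alpha * delta_rgb.grad.sign()
            delta_t += alpha * delta_t.grad.sign()
            # cross-modal "collusion": softly share attack directions between modalities
            shared = 0.5 * (delta_rgb.mean(dim=1, keepdim=True) + delta_t)
            delta_rgb.copy_(0.8 * delta_rgb + 0.2 * shared)
            delta_t.copy_(0.8 * delta_t + 0.2 * shared)
            delta_rgb.clamp_(-eps, eps)
            delta_t.clamp_(-eps, eps)
        delta_rgb.grad.zero_()
        delta_t.grad.zero_()
    return delta_rgb.detach(), delta_t.detach()

if __name__ == "__main__":
    rgb, thermal = torch.rand(1, 3, 128, 128), torch.rand(1, 1, 128, 128)
    mask = torch.ones(1, 1, 128, 128)                 # spatial adversarial intensity control
    # placeholder victim loss: pushes the two streams' mean responses apart
    loss_fn = lambda r, t: (r.mean() - t.mean()).abs()
    d_rgb, d_t = attack_step(rgb, thermal, mask, loss_fn)
    print(d_rgb.abs().max().item(), d_t.abs().max().item())
```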
Citations: 0
PromptMix: LLM-aided prompt learning for generalizing vision-language models
IF 15.5 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-23 | DOI: 10.1016/j.inffus.2026.104186
Yongcai Chen, Qinghua Zhang, Xinfa Shi, Lei Zhang
With the development of deep learning techniques, intelligent engineering tasks are moving into real-world application. However, performance in real conditions often declines due to scarce data or subtle, easily confused patterns. Although vision-language models with prompt learning offer a way to adapt without retraining the backbone, these approaches still suffer from overfitting under low-data regimes or limited prompt expressiveness. To address these challenges, we propose a novel framework, PromptMix, that jointly considers semantic prompt learning, multimodal information fusion, and the alignment between pre-trained and domain-specific data. Specifically, PromptMix integrates three key components: (1) a Modality-Agnostic Shared Representation module that constructs a shared latent space to mitigate the distribution discrepancies between pre-trained and target data, (2) an LLM-Aided Prompt Evolution mechanism that semantically enriches and iteratively refines learnable context prompts, and (3) a Cross-Attentive Adapter that enhances multimodal information fusion and robustness under low-sample conditions. Experiments on seven datasets, including six public benchmarks and one custom industrial dataset, demonstrate that PromptMix effectively enhances vision-language model adaptability, improves semantic representations, and achieves robust generalization under both base-to-novel and few-shot learning scenarios, delivering superior performance in engineering applications with limited labeled data.
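The sketch below illustrates the general pattern of learnable context prompts combined with a cross-attentive adapter; frozen CLIP encoders are replaced by random features so the snippet runs standalone, and every dimension and module choice is an assumption rather than PromptMix's actual design.

```python
# Minimal PyTorch sketch of learnable context prompts plus a cross-attentive adapter.
# Real CLIP encoders are replaced by stand-in tensors so the snippet runs standalone;
# dimensions, the number of context tokens, and the adapter design are assumptions.
import torch
import torch.nn as nn

class PromptedClassifier(nn.Module):
    def __init__(self, n_classes: int, d: int = 64, n_ctx: int = 4):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(n_ctx, d) * 0.02)            # learnable context prompt
        self.class_emb = nn.Parameter(torch.randn(n_classes, d) * 0.02)  # stand-in class tokens
        self.adapter = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.logit_scale = nn.Parameter(torch.tensor(10.0))

    def text_features(self) -> torch.Tensor:
        # "prompt": context tokens followed by the class token, mean-pooled per class
        prompts = torch.cat([self.ctx.unsqueeze(0).expand(self.class_emb.size(0), -1, -1),
                             self.class_emb.unsqueeze(1)], dim=1)
        return prompts.mean(dim=1)                                        # (n_classes, d)

    def forward(self, img_feat: torch.Tensor) -> torch.Tensor:
        # img_feat: (batch, tokens, d) frozen visual features
        text = self.text_features().unsqueeze(0).expand(img_feat.size(0), -1, -1)
        fused, _ = self.adapter(query=text, key=img_feat, value=img_feat)  # cross-attention
        img = nn.functional.normalize(img_feat.mean(dim=1), dim=-1)
        txt = nn.functional.normalize(text + fused, dim=-1)               # residual adapter output
        return self.logit_scale * torch.einsum("bd,bcd->bc", img, txt)

if __name__ == "__main__":
    model = PromptedClassifier(n_classes=5)
    print(model(torch.randn(2, 49, 64)).shape)                            # torch.Size([2, 5])
```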
Citations: 0
MSTFDN: An EEG-fNIRS multimodal spatial-temporal fusion decoding network for personalized multi-task scenarios
IF 15.5 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-23 | DOI: 10.1016/j.inffus.2026.104187
Peng Ding, Liyong Yin, Zhengxuan Zhou, Yuwei Su, Minqian Zhang, Yingwei Li, Xiaoli Li
Multimodal information enables Brain-Computer Interface (BCI) systems to adapt to differences in individual neural characteristics, overcoming the limitations of each modality. As a result, multimodal fusion technology that integrates non-invasive brain imaging techniques such as electroencephalogram (EEG) and functional near-infrared spectroscopy (fNIRS) has gained widespread attention. However, in the field of hybrid BCI, challenges remain in effectively integrating the heterogeneous information from these two modalities and improving decoding accuracy and generalization across task conditions. The core issue lies in the underutilization of each modality's signal characteristics and the incomplete capture of the potential homogeneity of higher-order hybrid features. Therefore, we propose a novel EEG-fNIRS multimodal spatial-temporal fusion decoding network (MSTFDN). This network combines multi-scale temporal convolution of time-series differences with a spatial multi-head self-attention mechanism. MSTFDN consists of three core components: the EEG branch, the fNIRS branch, and the EEG-fNIRS fusion branch. A multi-dimensional loss function is constructed based on independent and hybrid-space multi-head expression diversity, aiming to achieve high-precision decoding on small-sample datasets under multiple tasks and multiple personalized experimental protocols. In experiments on four motor imagery (MI) and mental workload (MWL) tasks from two public datasets under three personalized experimental protocols, MSTFDN demonstrated state-of-the-art performance. The more comprehensive experimental protocols may establish a benchmark for model performance evaluation in future research in this field. Meanwhile, MSTFDN is also expected to become a new benchmark method for EEG-fNIRS hybrid BCI research.
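A minimal sketch of the branch structure is given below: each branch applies multi-scale temporal convolutions to time-series differences followed by multi-head self-attention, and a hybrid branch processes the concatenated EEG-fNIRS channels. Channel counts, kernel sizes, and the plain classifier head are illustrative assumptions, and the attention here runs over temporal tokens as a simplification of the paper's spatial attention.

```python
# Minimal PyTorch sketch of the three-branch structure described above. Channel counts,
# kernel sizes, and the classifier head are illustrative assumptions, not the paper's
# MSTFDN configuration; the attention is a simplified stand-in for its spatial attention.
import torch
import torch.nn as nn

class ModalityBranch(nn.Module):
    def __init__(self, channels: int, d: int = 32, kernels=(3, 7, 15)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(channels, d, k, padding=k // 2) for k in kernels   # multi-scale temporal conv
        )
        self.attn = nn.MultiheadAttention(len(kernels) * d, num_heads=4, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time); first-order difference emphasises temporal dynamics
        dx = torch.diff(x, dim=-1, prepend=x[..., :1])
        feats = torch.cat([conv(dx) for conv in self.convs], dim=1)      # (batch, 3d, time)
        tokens = feats.transpose(1, 2)                                   # temporal tokens
        out, _ = self.attn(tokens, tokens, tokens)                       # multi-head self-attention
        return out.mean(dim=1)                                           # (batch, 3d)

class MSTFDNSketch(nn.Module):
    def __init__(self, eeg_ch: int, fnirs_ch: int, n_classes: int, d: int = 32):
        super().__init__()
        self.eeg, self.fnirs = ModalityBranch(eeg_ch, d), ModalityBranch(fnirs_ch, d)
        self.fusion = ModalityBranch(eeg_ch + fnirs_ch, d)               # hybrid EEG-fNIRS branch
        self.head = nn.Linear(9 * d, n_classes)

    def forward(self, eeg, fnirs):
        fused = self.fusion(torch.cat([eeg, fnirs], dim=1))
        return self.head(torch.cat([self.eeg(eeg), self.fnirs(fnirs), fused], dim=-1))

if __name__ == "__main__":
    model = MSTFDNSketch(eeg_ch=22, fnirs_ch=36, n_classes=2)
    print(model(torch.randn(4, 22, 250), torch.randn(4, 36, 250)).shape)  # torch.Size([4, 2])
```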
Citations: 0
StegaFusion: Steganography for information hiding and fusion in multimodality
IF 15.5 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-22 | DOI: 10.1016/j.inffus.2026.104150
Zihao Xu, Dawei Xu, Zihan Li, Juan Hu, Baokun Zheng, Chuan Zhang, Liehuang Zhu
Current generative steganography techniques have attracted considerable attention due to their security. However, different platforms and social environments exhibit varying preferred modalities, and existing generative steganography techniques are often restricted to a single modality. Inspired by advancements in inpainting techniques, we observe that the inpainting process is inherently generative. Moreover, cross-modal inpainting minimally perturbs unchanged regions and shares a consistent masking-and-fill procedure. Based on these insights, we introduce StegaFusion, a novel framework for unifying multimodal generative steganography. StegaFusion leverages shared generation seeds and conditional information, which enables the receiver to deterministically reconstruct the reference content. The receiver then performs differential analysis on the inpainting-generated stego content to extract the secret message. Compared to traditional unimodal methods, StegaFusion enhances controllability, security, compatibility, and interpretability without requiring additional model training. To the best of our knowledge, StegaFusion is the first framework to formalize and unify cross-modal generative steganography, offering wide applicability. Extensive qualitative and quantitative experiments demonstrate the superior performance of StegaFusion in terms of controllability, security, and cross-modal compatibility.
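The toy example below captures only the shared-seed and differential-analysis idea: a seeded pseudo-random fill stands in for the inpainting model, one bit is hidden per masked block by choosing between two candidate fills, and the receiver regenerates both candidates from the same seed to recover the message. It is an illustration of the principle, not the StegaFusion pipeline.

```python
# Minimal NumPy toy of the shared-seed / differential-extraction idea described above.
# A seeded pseudo-random fill stands in for the real inpainting model; every name and
# parameter here is an illustrative assumption, not the paper's StegaFusion pipeline.
import numpy as np

def candidate_fills(seed: int, n_blocks: int, block: int = 8):
    rng = np.random.default_rng(seed)                      # shared generation seed
    return rng.random((n_blocks, 2, block, block))         # two deterministic fills per block

def embed(cover: np.ndarray, bits, seed: int, block: int = 8) -> np.ndarray:
    stego, fills = cover.copy(), candidate_fills(seed, len(bits), block)
    for i, b in enumerate(bits):                           # fill each masked block per secret bit
        r = (i * block) % cover.shape[0]
        stego[r:r + block, :block] = fills[i, b]
    return stego

def extract(stego: np.ndarray, n_bits: int, seed: int, block: int = 8):
    fills = candidate_fills(seed, n_bits, block)           # receiver regenerates the candidates
    bits = []
    for i in range(n_bits):
        r = (i * block) % stego.shape[0]
        region = stego[r:r + block, :block]
        # differential analysis: which candidate the observed fill is closer to
        bits.append(int(np.abs(region - fills[i, 1]).sum() < np.abs(region - fills[i, 0]).sum()))
    return bits

if __name__ == "__main__":
    cover = np.random.default_rng(0).random((64, 64))
    secret = [1, 0, 1, 1, 0, 1, 0, 0]
    stego = embed(cover, secret, seed=42)
    print(extract(stego, len(secret), seed=42) == secret)  # True
```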
Citations: 0
EDGCN: An embedding-driven fusion framework for heterogeneity-aware motor imagery decoding
IF 15.5 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-22 | DOI: 10.1016/j.inffus.2026.104170
Chaowen Shen, Yanwen Zhang, Zejing Zhao, Akio Namiki
Motor imagery electroencephalography (MI-EEG) captures neural activity associated with imagined motor tasks and has been widely applied in both basic neuroscience and clinical research. However, the intrinsic spatio-temporal heterogeneity of MI-EEG signals and pronounced inter-subject variability present major challenges for accurate decoding. Most existing deep learning methods rely on fixed architectures and shared parameters, which limits their ability to capture the complex, dynamic patterns driven by individual differences. To address these limitations, we propose an Embedding-Driven Graph Convolutional Network (EDGCN), which leverages a heterogeneity-aware spatio-temporal embedding fusion mechanism to adaptively generate graph convolutional kernel parameters from a shared embedding-driven parameter bank. Specifically, we design a Multi-Resolution Temporal Embedding (MRTE) strategy based on multi-resolution power spectral features and a Structure-Aware Spatial Embedding (SASE) mechanism that integrates both local and global connectivity structures. On this basis, we construct a heterogeneity-aware parameter generation mechanism based on Chebyshev graph convolution to effectively capture the spatiotemporal heterogeneity of EEG signals, with an orthogonality-constrained parameter space that enhances diversity and representational fusion. Experimental results demonstrate that the proposed model achieves superior classification accuracies of 86.50% and 90.14% on the BCIC-IV-2a and BCIC-IV-2b datasets, respectively, outperforming current state-of-the-art methods.
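Below is a minimal sketch of an embedding-driven Chebyshev graph convolution: a context embedding mixes a shared bank of candidate kernels into the filter actually applied to the EEG channel graph. The graph construction, embedding source, and all sizes are assumptions for illustration, not the EDGCN architecture.

```python
# Minimal PyTorch sketch of an embedding-driven Chebyshev graph convolution, in the spirit
# of the approach described above. Graph construction, the embedding source, and all sizes
# are illustrative assumptions, not the paper's EDGCN.
import torch
import torch.nn as nn

class EmbeddingDrivenChebConv(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, K: int = 3, emb_dim: int = 16, bank: int = 4):
        super().__init__()
        # shared embedding-driven parameter bank: `bank` candidate Chebyshev kernels
        self.bank = nn.Parameter(torch.randn(bank, K, in_dim, out_dim) * 0.05)
        self.mixer = nn.Linear(emb_dim, bank)              # embedding -> mixture over the bank

    def forward(self, x: torch.Tensor, lap: torch.Tensor, emb: torch.Tensor) -> torch.Tensor:
        # x: (batch, nodes, in_dim), lap: (nodes, nodes) scaled Laplacian, emb: (batch, emb_dim)
        mix = self.mixer(emb).softmax(-1)                  # (batch, bank)
        theta = torch.einsum("bm,mkio->bkio", mix, self.bank)   # per-sample kernels
        Tk_prev, Tk = x, lap @ x                           # Chebyshev recursion T0, T1
        out = torch.einsum("bni,bio->bno", Tk_prev, theta[:, 0])
        out = out + torch.einsum("bni,bio->bno", Tk, theta[:, 1])
        for k in range(2, theta.size(1)):
            Tk_prev, Tk = Tk, 2 * (lap @ Tk) - Tk_prev
            out = out + torch.einsum("bni,bio->bno", Tk, theta[:, k])
        return torch.relu(out)

if __name__ == "__main__":
    n_nodes, conv = 22, EmbeddingDrivenChebConv(in_dim=8, out_dim=16)
    adj = (torch.rand(n_nodes, n_nodes) > 0.7).float()
    adj = ((adj + adj.T) > 0).float().fill_diagonal_(0)
    deg = adj.sum(-1).clamp(min=1).rsqrt().diag()
    lap_scaled = -deg @ adj @ deg                          # approx. scaled Laplacian (lambda_max ~ 2)
    out = conv(torch.randn(4, n_nodes, 8), lap_scaled, torch.randn(4, 16))
    print(out.shape)                                       # torch.Size([4, 22, 16])
```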
Citations: 0
Unleashing Mamba’s expressive power: A non-tradeoff approach to spatio-temporal forecasting
IF 15.5 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-22 | DOI: 10.1016/j.inffus.2026.104172
Zhiqi Shao, Ze Wang, Haoning Xi, Michael G.H. Bell, Xusheng Yao, D. Glenn Geers, Junbin Gao
Real-time spatiotemporal forecasting, particularly in traffic systems, requires balancing computational cost and predictive accuracy, a challenge that conventional methods struggle to address effectively. In this work, we propose a non-trade-off framework called Spatial-Temporal Selective State Space (ST-Mamba), which leverages two key components to achieve efficiency and accuracy concurrently. The Spatial-Temporal Mixer (ST-Mixer) dynamically fuses spatial and temporal features to capture complex dependencies, and the STF-Mamba layer incorporates Mamba’s selective state-space formulation to capture long-range dynamics efficiently. Beyond empirical improvements, we address a critical gap in the literature by presenting a theoretical analysis of ST-Mamba’s expressive power. Specifically, we establish its ability to approximate a broad class of Transformers and formally demonstrate its equivalence to at least two consecutive attention layers within the same framework. This result highlights ST-Mamba’s capacity to capture long-range dependencies while efficiently reducing computational overhead, reinforcing its theoretical and practical advantages over conventional transformer-based models. Through extensive evaluations on real-world traffic datasets, ST-Mamba demonstrates a 61.11% reduction in runtime alongside a 0.67% improvement in predictive performance compared to leading approaches, underscoring its potential to set a new benchmark for real-time spatiotemporal forecasting.
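The sketch below pairs a simple spatial-temporal mixer with a naive selective state-space recurrence whose step size depends on the input, standing in for the optimized Mamba scan; dimensions, the pooling over sensors, and the parameterisation are assumptions, not the ST-Mamba implementation.

```python
# Minimal PyTorch sketch of the two ingredients described above: a spatial-temporal mixer
# over sensors and features, followed by a simplified selective state-space recurrence with
# input-dependent gates (a naive stand-in for the optimized Mamba scan). All dimensions and
# parameterisations are illustrative assumptions.
import torch
import torch.nn as nn

class STMixer(nn.Module):
    def __init__(self, n_nodes: int, d: int):
        super().__init__()
        self.spatial = nn.Linear(n_nodes, n_nodes)        # mix information across sensors
        self.temporal = nn.Linear(d, d)                   # mix features per time step

    def forward(self, x):                                 # x: (batch, time, nodes, d)
        x = self.spatial(x.transpose(-1, -2)).transpose(-1, -2)
        return torch.relu(self.temporal(x))

class SelectiveSSM(nn.Module):
    def __init__(self, d: int, d_state: int = 16):
        super().__init__()
        self.A = nn.Parameter(-torch.rand(d, d_state))    # stable decay per channel/state
        self.B, self.C = nn.Linear(d, d_state), nn.Linear(d, d_state)
        self.dt = nn.Linear(d, d)                         # input-dependent step size (selectivity)

    def forward(self, x):                                 # x: (batch, time, d)
        h = x.new_zeros(x.size(0), x.size(-1), self.A.size(-1))   # (batch, d, d_state)
        ys = []
        for t in range(x.size(1)):
            u = x[:, t]                                   # (batch, d)
            dt = torch.nn.functional.softplus(self.dt(u)).unsqueeze(-1)
            h = torch.exp(dt * self.A) * h + dt * self.B(u).unsqueeze(1)   # selective update
            ys.append((h * self.C(u).unsqueeze(1)).sum(-1))                # read-out per channel
        return torch.stack(ys, dim=1)

if __name__ == "__main__":
    x = torch.randn(2, 12, 20, 32)                        # (batch, time, sensors, features)
    mixed = STMixer(n_nodes=20, d=32)(x)
    out = SelectiveSSM(d=32)(mixed.mean(dim=2))           # pool sensors, then scan over time
    print(out.shape)                                      # torch.Size([2, 12, 32])
```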
Citations: 0
MRFNet: Multi-reference fusion for image deblurring
IF 15.5 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-22 | DOI: 10.1016/j.inffus.2026.104169
Tingrui Guo, Chi Xu, Kaifeng Tang, Hao Qian
Motion blur is a persistent challenge in visual data processing. While single-image deblurring methods have made significant progress, using multiple reference images from the same scene for deblurring remains an overlooked problem. Existing methods struggle to integrate information from multiple reference images that differ in lighting, color, and perspective. Herein, we propose a novel framework, MRFNet, which leverages any number of discontinuous reference images for deblurring. The framework consists of two key components: (1) the Offset Fusion Module (OFM), guided by dense matching, which aggregates features from discontinuous reference images through high-frequency detail enhancement and permutation-invariant units; and (2) the Deformable Enrichment Module (DEM), which refines misaligned features using deformable convolutions for precise detail recovery. Quantitative and qualitative evaluations on synthetic and real-world datasets show that the proposed method outperforms state-of-the-art deblurring approaches. Additionally, a new real-world dataset is provided to fill the gap in evaluating discontinuous-reference problems.
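As a sketch of the permutation-invariant fusion idea, the module below encodes an arbitrary number of reference images and pools their features with order-independent statistics before fusing them with the blurry-image features; the deformable refinement stage is omitted, and all encoder and channel choices are assumptions rather than MRFNet's modules.

```python
# Minimal PyTorch sketch of permutation-invariant fusion over an arbitrary number of
# reference images, in the spirit of the offset-fusion idea described above (the
# deformable refinement stage is omitted). Encoders, channel sizes, and the pooling
# scheme are illustrative assumptions, not the paper's MRFNet modules.
import torch
import torch.nn as nn

class ReferenceFusion(nn.Module):
    def __init__(self, ch: int = 32):
        super().__init__()
        self.encode = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.fuse = nn.Conv2d(3 * ch, ch, 1)              # blur features + pooled reference stats

    def forward(self, blurry: torch.Tensor, refs: torch.Tensor) -> torch.Tensor:
        # blurry: (batch, 3, H, W); refs: (batch, n_refs, 3, H, W) with arbitrary n_refs
        b, n = refs.shape[:2]
        ref_feat = self.encode(refs.flatten(0, 1)).view(b, n, -1, *refs.shape[-2:])
        pooled_mean = ref_feat.mean(dim=1)                # permutation-invariant statistics
        pooled_max = ref_feat.max(dim=1).values           # (order of references is irrelevant)
        blur_feat = self.encode(blurry)
        return self.fuse(torch.cat([blur_feat, pooled_mean, pooled_max], dim=1))

if __name__ == "__main__":
    fusion = ReferenceFusion()
    out = fusion(torch.randn(2, 3, 64, 64), torch.randn(2, 4, 3, 64, 64))
    print(out.shape)                                      # torch.Size([2, 32, 64, 64])
```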
Citations: 0
A data fusion approach to synthesize microwave imagery of tropical cyclones from infrared data using vision transformers
IF 15.5 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-22 | DOI: 10.1016/j.inffus.2026.104167
Fan Meng, Tao Song, Xianxuan Lin, Kunlin Yang
Microwave images with high spatiotemporal resolution are essential for observing and predicting tropical cyclones (TCs), including TC positioning, intensity estimation, and detection of concentric eyewalls. Nevertheless, the temporal resolution of tropical cyclone microwave (TCMW) images is limited by satellite quantity and orbit constraints, presenting a challenging problem for TC disaster forecasting. This research proposes a multi-sensor data fusion approach that uses high-temporal-resolution tropical cyclone infrared (TCIR) images to generate synthetic TCMW images, offering a solution to this data scarcity problem. In particular, we introduce a deep learning network based on the Vision Transformer (TCA-ViT) to translate TCIR images into TCMW images. This can be viewed as a form of synthetic data generation, enhancing the information available for decision-making. We integrate a phase-based physical guidance mechanism into the training process. Furthermore, we have developed a dataset of TC infrared-to-microwave image conversions (TCIR2MW) for training and testing the model. Experimental results demonstrate the method’s capability to rapidly and accurately extract key features of TCs. Leveraging techniques such as masking and transfer learning, it addresses the absence of TCMW images by generating MW images from IR images, thereby aiding downstream tasks such as TC intensity and precipitation forecasting. This study introduces a novel approach to the field of TC image research, with the potential to advance deep learning in this direction and provide vital insights for real-time observation and prediction of global TCs. Our source code and data are publicly available online at https://github.com/kleenY/TCIR2MW.
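A minimal ViT-style translator in the spirit of this description is sketched below: the infrared image is patchified, processed by a Transformer encoder, and projected back to a single-channel microwave image. Patch size, depth, and the omission of the phase-based physical guidance are all assumptions, not the TCA-ViT configuration.

```python
# Minimal PyTorch sketch of a ViT-style infrared-to-microwave image translator: patchify
# the IR input, encode with a Transformer, and project tokens back to MW patches. Patch
# size, depth, and the absence of the paper's physical guidance are illustrative assumptions.
import torch
import torch.nn as nn

class IR2MWTransformer(nn.Module):
    def __init__(self, img: int = 64, patch: int = 8, d: int = 128, depth: int = 4):
        super().__init__()
        self.patch, n = patch, (img // patch) ** 2
        self.embed = nn.Conv2d(1, d, kernel_size=patch, stride=patch)    # patchify IR input
        self.pos = nn.Parameter(torch.zeros(1, n, d))
        layer = nn.TransformerEncoderLayer(d, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.to_pixels = nn.Linear(d, patch * patch)                     # back to MW patches

    def forward(self, ir: torch.Tensor) -> torch.Tensor:
        # ir: (batch, 1, H, W) infrared brightness temperature
        tokens = self.embed(ir).flatten(2).transpose(1, 2) + self.pos    # (batch, n_patches, d)
        tokens = self.encoder(tokens)
        patches = self.to_pixels(tokens)                                 # (batch, n_patches, p*p)
        b, n, _ = patches.shape
        side = int(n ** 0.5)
        mw = patches.view(b, side, side, self.patch, self.patch)
        mw = mw.permute(0, 1, 3, 2, 4).reshape(b, 1, side * self.patch, side * self.patch)
        return mw                                                        # synthetic MW image

if __name__ == "__main__":
    model = IR2MWTransformer()
    print(model(torch.randn(2, 1, 64, 64)).shape)                        # torch.Size([2, 1, 64, 64])
```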
Citations: 0