
Latest Publications in Information Fusion

ELOGOnet: Knowledge-enhanced local-global learning for cardiac diagnosis
IF 15.5 | CAS Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-21 | DOI: 10.1016/j.inffus.2026.104168
Yizhuo Feng, Beibei Wang, Zirui Wang, Ke Jiang, Peng Wang, Lidong Du, Xianxiang Chen, Pang Wu, Zhenfeng Li, Junxian Song, Libin Jiang, Zhen Fang
The diagnostic process of a human cardiologist is a holistic act of reasoning that seamlessly integrates two key components: (1) a synergistic analysis of the ECG signal itself, combining insights from both global rhythmic patterns and local morphologies; and (2) a prior-informed interpretation process that leverages internalized medical priors and external patient-specific information. However, existing deep learning models struggle to emulate this complex expert reasoning, often facing a dual dilemma: a failure to synergize local and global features within a unified framework, and a widespread neglect of valuable, low-cost prior knowledge sources such as disease associations and patient metadata. To bridge this gap, we propose ELOGOnet, a novel deep learning framework designed to model the expert diagnostic workflow. To model the expert’s synergistic signal analysis, ELOGOnet employs a parallel hybrid architecture that integrates a State Space Model (SSM) for global rhythms and a CNN for local morphologies. To enable prior-informed interpretation, the framework incorporates two key innovations: an association loss that enhances clinical coherence by modeling disease comorbidity and mutual exclusivity, and an adaptive cross-gating module for the robust fusion of patient metadata. Extensive experiments on several mainstream public benchmarks demonstrate that ELOGOnet achieves an average Macro-F1 of 63.8% across 8 multi-label tasks and consistently outperforms 16 competitive baselines, establishing a new state of the art for automated cardiac diagnosis from ECG.
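The abstract does not give the form of the association loss. Below is a minimal sketch of one way such a prior could look, assuming a fixed label-relation matrix R (+1 for comorbid pairs, -1 for mutually exclusive pairs, 0 otherwise); both the matrix and the co-activation penalty are assumptions for illustration, not ELOGOnet's published formulation:

import torch

def association_loss(probs: torch.Tensor, relation: torch.Tensor) -> torch.Tensor:
    # probs:    (batch, n_classes) sigmoid outputs of a multi-label head.
    # relation: (n_classes, n_classes); +1 = comorbid, -1 = mutually exclusive, 0 = unrelated.
    # Batch-averaged pairwise co-activation for every label pair.
    coact = torch.einsum("bi,bj->ij", probs, probs) / probs.shape[0]
    pos = relation.clamp(min=0)      # comorbid pairs: co-activation is rewarded
    neg = (-relation).clamp(min=0)   # exclusive pairs: co-activation is penalized
    return (neg * coact).sum() - (pos * coact).sum()

# Illustrative use, weighted against the main classification loss:
# total_loss = bce_loss + 0.1 * association_loss(torch.sigmoid(logits), R)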
Citations: 0
GeoCraft: A Diffusion Model-based 3D Reconstruction Method driven by image and point cloud fusion
IF 15.5 | CAS Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-13 | DOI: 10.1016/j.inffus.2026.104149
Weixuan Ma, Yamin Li, Chujin Liu, Hao Zhang, Jie Li, Kansong Chen, Weixuan Gao
With the rapid development of technologies like virtual reality (VR), autonomous driving, and digital twins, the demand for high-precision and realistic multimodal 3D reconstruction has surged. This technology has become a core research focus in computer vision and graphics due to its ability to integrate multi-source data, such as 2D images and point clouds. However, existing methods face challenges such as geometric inconsistency in single-view reconstruction, poor point cloud-to-mesh conversion, and insufficient multimodal feature fusion, limiting their practical application. To address these issues, this paper proposes GeoCraft, a multimodal 3D reconstruction method that generates high-precision 3D models from 2D images through three collaborative stages: Diff2DPoint, Point2DMesh, and Vision3DGen. Specifically, Diff2DPoint generates an initial point cloud with geometric alignment using a diffusion model and projection feature fusion; Point2DMesh converts the point cloud into a high-quality mesh using an autoregressive decoder-only Transformer and Direct Preference Optimization (DPO); Vision3DGen creates high-fidelity 3D objects through multimodal feature alignment. Experiments on the Google Scanned Objects (GSO) and Pix3D datasets show that GeoCraft excels in key metrics. On the GSO dataset, its CMMD is 2.810 and FID-CLIP is 26.420; on Pix3D, CMMD is 3.020 and FID-CLIP is 27.030. GeoCraft significantly outperforms existing 3D reconstruction methods and also demonstrates advantages in computational efficiency, effectively solving key challenges in 3D reconstruction. The code is available at https://github.com/weixuanma/GeoCraft.
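Point2DMesh's use of Direct Preference Optimization is only named, not specified. For reference, here is a generic sketch of the standard DPO objective on sequence log-probabilities; the beta value and input shapes are assumptions, and this is the textbook loss, not GeoCraft's exact training recipe:

import torch.nn.functional as F

def dpo_loss(pol_chosen_logp, pol_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1):
    # Each argument: (batch,) tensor of summed log p(sequence | prompt) under the
    # trainable policy or the frozen reference model, for the preferred ("chosen")
    # and dispreferred ("rejected") mesh-token sequences.
    chosen = pol_chosen_logp - ref_chosen_logp
    rejected = pol_rejected_logp - ref_rejected_logp
    # Maximize the policy's preference margin under a sigmoid link.
    return -F.logsigmoid(beta * (chosen - rejected)).mean()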
Citations: 0
PGSC: A gradient sparsification communication optimization criterion for nonequilibrium thermodynamics
IF 15.5 | CAS Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-27 | DOI: 10.1016/j.inffus.2026.104188
Wenlong Zhang, Ying Li, Hanhan Du, Yan Wei, Aiqing Fang
Gradient compression can reduce communication overhead. However, current static sparsity techniques may disturb gradient dynamics, resulting in unstable model convergence and reduced feature discriminative ability, whereas transmitting the complete gradient leads to high costs. To address this issue, inspired by nonequilibrium thermodynamics, this paper proposes a Physics-guided Gradient Sparsification Criterion (PGSC). Specifically, we formulate a continuous field equation based on the gradient magnitude distribution, deriving an adaptive decay rule for the sparsification threshold during the training phase. We then dynamically adjust the sparsification threshold according to this rule, effectively addressing the complexity of multimodal features and ensuring consistent information transmission. Our method achieves adaptive co-optimization of gradient compression and model accuracy by establishing a dynamic equilibrium mechanism between gradient dissipation and information entropy. This approach ensures stable convergence rates while preserving the gradient structure of multi-scale features. Extensive experiments on public datasets, including CIFAR-10, MNIST, and FLIR_ADAS_v2, demonstrate significant advantages over competitors such as TopK and quantization compression, while also reducing communication costs.
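The physics-derived decay rule itself is not given in the abstract. The sketch below illustrates the general mechanism with a quantile threshold that decays exponentially over training steps; the schedule and the q0/decay values are stand-in assumptions, not PGSC's derived rule:

import math
import torch

def sparsify_gradient(grad: torch.Tensor, step: int,
                      q0: float = 0.99, decay: float = 1e-4) -> torch.Tensor:
    # Keep only gradient entries whose magnitude exceeds a quantile threshold.
    # The quantile q decays with the training step, so less is dropped late in
    # training as the model approaches convergence.
    q = q0 * math.exp(-decay * step)
    threshold = torch.quantile(grad.abs().flatten(), q)
    # Entries below the threshold are zeroed and need not be transmitted.
    return torch.where(grad.abs() >= threshold, grad, torch.zeros_like(grad))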
Citations: 0
A survey of multimodal fusion for Alzheimer’s disease prediction: A new taxonomy and trends
IF 15.5 | CAS Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2025-12-26 | DOI: 10.1016/j.inffus.2025.104098
Yifan Guan, Wei Wang, Jianjun Chen, Po Yang, Jingzhou Xu, Jun Qi
Alzheimer’s disease (AD) is a neurodegenerative disease, well known for its incurability, and is common among the elderly population worldwide. Previous studies have demonstrated that early intervention positively influences disease progression, spurring research into pathological analysis and disease-trajectory prediction through machine learning (ML) methods. Given the similarities across different neurodegenerative disorders, a diagnosis relying solely upon a single modality of data is inadequate. Consequently, current research predominantly focuses on multimodal analysis that integrates medical imaging and clinical patient information, with new data types continually being identified that may aid AD diagnosis. Multimodal approaches have been explored extensively over the past two decades, with significant advances following the introduction of Deep Learning (DL) techniques. Deep neural networks can adaptively extract and fuse features directly from input data, significantly broadening the scope of multimodal analysis. However, earlier classification studies have primarily concentrated on traditional ML, often neglecting the rapid advancements in DL networks. This article comprehensively describes data-acquisition pathways by modality and discusses the modalities currently used in research, including neuroimaging, human body fluids, and other relevant sources. Additionally, it classifies fusion methodologies utilised in both DL and traditional ML contexts, highlights existing challenges, and outlines potential directions for future research.
Citations: 0
GIAFormer: A Gradient-Infused Attention and Transformer for Pain Assessment with EDA-fNIRS Fusion
IF 15.5 | CAS Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-23 | DOI: 10.1016/j.inffus.2026.104173
Muhammad Umar Khan, Girija Chetty, Stefanos Gkikas, Manolis Tsiknakis, Roland Goecke, Raul Fernandez-Rojas
Reliable pain assessment is crucial in clinical practice, yet it remains a challenge because self-report-based assessment is inherently subjective. In this work, we introduce GIAFormer, a deep learning framework designed to provide an objective measure of multilevel pain by jointly analysing Electrodermal Activity (EDA) and functional Near-Infrared Spectroscopy (fNIRS) signals. By combining the complementary information from autonomic and cortical responses, the proposed model aims to capture both physiological and neural aspects of pain. GIAFormer integrates a Gradient-Infused Attention (GIA) module with a Transformer. The GIA module enhances signal representation by fusing the physiological signals with their temporal gradients and applying spatial attention to highlight inter-channel dependencies. The Transformer component follows, enabling the model to learn long-range temporal relationships. The framework was evaluated on the AI4Pain dataset comprising 65 subjects using a leave-one-subject-out validation protocol. GIAFormer achieved an accuracy of 90.51% and outperformed recent state-of-the-art approaches. These findings highlight the potential of gradient-aware attention and multimodal fusion for interpretable, non-invasive, and generalisable pain assessment suitable for clinical and real-world applications.
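As a rough illustration of the gradient-infusion idea: only "fuse the signal with its temporal gradient, then apply spatial (channel) attention" comes from the abstract; the layer sizes, gating form, and pooling below are assumptions:

import torch
import torch.nn as nn

class GradientInfusedAttention(nn.Module):
    def __init__(self, n_channels: int):
        super().__init__()
        # Channel-attention gate over the stacked signal + gradient features.
        self.gate = nn.Sequential(nn.Linear(2 * n_channels, 2 * n_channels), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels), e.g. concatenated EDA and fNIRS channels.
        dx = torch.diff(x, dim=1, prepend=x[:, :1])   # temporal gradient, same length as x
        fused = torch.cat([x, dx], dim=-1)            # (batch, time, 2 * channels)
        weights = self.gate(fused.mean(dim=1))        # per-channel weights from pooled stats
        return fused * weights.unsqueeze(1)           # re-weight channels across all time steps

# A standard nn.TransformerEncoder can then be applied along the time axis to
# model the long-range temporal relationships the abstract describes.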
Citations: 0
MulMoSenT: Multimodal sentiment analysis for a low-resource language using textual-visual cross-attention and fusion
IF 15.5 | CAS Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-15 | DOI: 10.1016/j.inffus.2026.104129
Sadia Afroze, Md. Rajib Hossain, Mohammed Moshiul Hoque, Nazmul Siddique
The widespread availability of the Internet and the growing use of smart devices have fueled the rapid expansion of multimodal (image-text) sentiment analysis (MSA), a burgeoning research field. This growth is driven by the massive volume of image-text data generated by these technologies. However, MSA faces significant challenges, notably the misalignment between images and text, where an image may carry multiple interpretations or contradict its paired text. In addition, short textual content often lacks sufficient context, complicating sentiment prediction. These issues are particularly acute in low-resource languages, where annotated image-text corpora are scarce, and Vision-Language Models (VLMs) and Large Language Models (LLMs) exhibit limited performance. This research introduces MulMoSenT, a multimodal image-text sentiment analysis system tailored to tackle these challenges for low-resource languages. The development of MulMoSenT unfolds across four key phases: corpus development, baseline model evaluation and selection, hyperparameter adaptation, and model fine-tuning and inference. The proposed MulMoSenT model achieves a peak accuracy of 84.90%, surpassing all baseline models. It delivers a 37.83% improvement over VLMs, a 35.28% gain over image-only models, and a 0.71% enhancement over text-only models. Both the dataset and the solution are publicly accessible at: https://github.com/sadia-afroze/MulMoSenT.
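A minimal sketch of textual-visual cross-attention fusion of the kind the title names; the dimensions and single-block design are assumptions, not MulMoSenT's exact architecture:

import torch
import torch.nn as nn

class TextVisualCrossAttention(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        # text: (batch, n_tokens, dim); image: (batch, n_patches, dim).
        # Text tokens query the visual patches; the attended visual context is
        # folded back into the text stream through a residual connection.
        ctx, _ = self.attn(query=text, key=image, value=image)
        return self.norm(text + ctx)

# The fused sequence can then be pooled and passed to a sentiment classifier head.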
Citations: 0
ExInCOACH: Strategic exploration meets interactive tutoring for context-aware game onboarding
IF 15.5 | CAS Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-14 | DOI: 10.1016/j.inffus.2026.104151
Rui Hua, Zhaoyu Huang, Jinhao Lu, Yakun Li, Na Zhao
Traditional game tutorials often fail to deliver real-time contextual guidance, providing static instructions disconnected from dynamic gameplay states. This limitation stems from their inability to interpret evolving game environments and generate high-quality decisions during live player interactions. We present ExInCOACH, a hybrid framework that synergizes exploratory reinforcement learning (RL) with interactive large language models (LLMs) to enable state-aware adaptive tutoring. Our framework first employs deep RL to discover strategic patterns via self-play, constructing a Q-function. During player onboarding, LLMs map the Q-values of currently legal actions and their usage conditions into natural language rule explanations and strategic advice by analyzing live game states and player decisions.
Evaluations in Dou Di Zhu (a turn-based card game) reveal that learners using ExInCOACH experienced intuitive strategy internalization: all participants reported grasping advanced tactics faster than through rule-based tutorials, while most players highly valued the real-time contextual feedback. A comparative study demonstrated that players trained with ExInCOACH achieved a 70% win rate (14 wins/20 games) against those onboarded via traditional methods, as they benefited from adaptive guidance that evolved with their skill progression. To further validate the framework’s generalizability, evaluations were also conducted in StarCraft II, a high-complexity real-time strategy (RTS) game. In 2v2 cooperative battles, teams trained with ExInCOACH achieved a 66.7% win rate against teams assisted by Vision LLMs (VLLMs) and an impressive 100% win rate against teams relying on traditional static game wikis for learning. Cognitive load assessments indicated that ExInCOACH significantly reduced players’ mental burden and frustration in complex scenarios involving real-time decision-making and multi-unit collaboration, while also outperforming traditional methods in information absorption efficiency and tactical adaptability. This work proposes a game-tutorial design paradigm based on RL model exploration and LLM rule interpretation, making AI-generated strategies accessible through natural language interaction tailored to individual learning contexts.
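The Q-to-language step can be pictured as prompt construction. A toy sketch follows; the template and the Dou Di Zhu action names are hypothetical, and the actual LLM call is omitted:

def build_tutor_prompt(state_desc: str, q_values: dict) -> str:
    # Rank the currently legal actions by their learned Q-values and ask the
    # LLM to turn the ranking into rule explanations and strategic advice.
    ranked = sorted(q_values.items(), key=lambda kv: kv[1], reverse=True)
    action_lines = "\n".join(f"- {a}: estimated value {v:.2f}" for a, v in ranked)
    return (
        "You are a game tutor. Current state:\n"
        f"{state_desc}\n"
        "Legal actions ranked by a trained value function:\n"
        f"{action_lines}\n"
        "Explain in plain language which action the player should take and why."
    )

prompt = build_tutor_prompt(
    "You hold a pair of kings; the opponent has three cards left.",
    {"play the pair of kings": 0.81, "pass": 0.12},
)
# `prompt` would then be sent to any chat LLM for the natural-language advice.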
Citations: 0
Tokenized EEG signals with large language models for epilepsy detection via multimodal information fusion
IF 15.5 | CAS Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-07 | DOI: 10.1016/j.inffus.2026.104128
Xingchi Chen, Fushen Xie, Fa Zhu, Shuanglong Zhang, Xiaoyang Lu, Qing Li, Rong Chen, Dazhou Li, David Camacho
The detection of epileptic seizures from multi-sensor EEG signals is challenging due to the inherent complexity of the signals, the variability in sensor configurations, and the difficulty of distinguishing weak inter-class differences. To address these challenges, we propose a novel multimodal information fusion framework that integrates a large language model (LLM) with a multimodal EEG feature tokenization method for enhanced epilepsy detection. The framework adopts a multimodal feature extraction (MFE) method to generate multimodal feature representations of EEG signals, extracting distinct representations from different signal domains. In addition, we design a multimodal EEG feature tokenization method that tokenizes EEG signal features and fuses them with semantic information, addressing the problem of fusing epileptic EEG features with the semantic information carried by prompt words. We exploit the powerful reasoning and pattern-recognition capabilities of pre-trained LLMs to detect epileptic events accurately and robustly. The proposed method is evaluated on a public dataset, and extensive experimental results show that it outperforms current comparative methods on multiple performance indicators.
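One plausible reading of the tokenization step, sketched below: each signal-domain feature vector is projected into the LLM's embedding space as a soft token and concatenated with the prompt-word embeddings. The sizes and the linear projection are assumptions, not the paper's stated design:

import torch
import torch.nn as nn

class EEGTokenizer(nn.Module):
    def __init__(self, feat_dims, llm_dim: int = 768):
        super().__init__()
        # One projection per feature domain (e.g. time, frequency, time-frequency).
        self.proj = nn.ModuleList(nn.Linear(d, llm_dim) for d in feat_dims)

    def forward(self, feats):
        # feats[i]: (batch, feat_dims[i]) -> soft tokens (batch, n_domains, llm_dim).
        return torch.stack([p(f) for p, f in zip(self.proj, feats)], dim=1)

# eeg_tokens = EEGTokenizer([64, 128, 32])([time_feat, freq_feat, tf_feat])
# inputs_embeds = torch.cat([prompt_embeds, eeg_tokens], dim=1)  # fed to the LLM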
Citations: 0
StegaFusion: Steganography for information hiding and fusion in multimodality
IF 15.5 | CAS Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-22 | DOI: 10.1016/j.inffus.2026.104150
Zihao Xu, Dawei Xu, Zihan Li, Juan Hu, Baokun Zheng, Chuan Zhang, Liehuang Zhu
Current generative steganography techniques have attracted considerable attention due to their security. However, different platforms and social environments favor different modalities, and existing generative steganography techniques are often restricted to a single modality. Inspired by advances in inpainting, we observe that the inpainting process is inherently generative. Moreover, cross-modal inpainting minimally perturbs unchanged regions and shares a consistent masking-and-fill procedure. Based on these insights, we introduce StegaFusion, a novel framework for unifying multimodal generative steganography. StegaFusion leverages shared generation seeds and conditional information, which enables the receiver to deterministically reconstruct the reference content. The receiver then performs differential analysis on the inpainting-generated stego content to extract the secret message. Compared with traditional unimodal methods, StegaFusion enhances controllability, security, compatibility, and interpretability without requiring additional model training. To the best of our knowledge, StegaFusion is the first framework to formalize and unify cross-modal generative steganography, offering wide applicability. Extensive qualitative and quantitative experiments demonstrate its superior performance in terms of controllability, security, and cross-modal compatibility.
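The receiver-side differential analysis can be pictured with a toy block-difference rule; the one-bit-per-block encoding below is an illustration, not StegaFusion's actual scheme:

import numpy as np

def extract_bits(stego: np.ndarray, reference: np.ndarray, blocks, tol: float = 1e-6):
    # `reference` is the content the receiver regenerates deterministically from
    # the shared seed and conditional information. In this toy rule each masked
    # block encodes one bit: 1 if the stego block deviates from the reference, 0 otherwise.
    bits = []
    for rows, cols in blocks:
        diff = np.abs(stego[rows, cols] - reference[rows, cols]).mean()
        bits.append(int(diff > tol))
    return bits

# blocks = [(slice(0, 8), slice(0, 8)), (slice(0, 8), slice(8, 16))]  # masked regions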
Citations: 0
EDGCN: An embedding-driven fusion framework for heterogeneity-aware motor imagery decoding
IF 15.5 | CAS Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-22 | DOI: 10.1016/j.inffus.2026.104170
Chaowen Shen, Yanwen Zhang, Zejing Zhao, Akio Namiki
Motor imagery electroencephalography (MI-EEG) captures neural activity associated with imagined motor tasks and has been widely applied in both basic neuroscience and clinical research. However, the intrinsic spatio-temporal heterogeneity of MI-EEG signals and pronounced inter-subject variability present major challenges for accurate decoding. Most existing deep learning methods rely on fixed architectures and shared parameters, which limits their ability to capture the complex, dynamic patterns driven by individual differences. To address these limitations, we propose an Embedding-Driven Graph Convolutional Network (EDGCN), which leverages a heterogeneity-aware spatio-temporal embedding fusion mechanism to adaptively generate graph convolutional kernel parameters from a shared embedding-driven parameter bank. Specifically, we design a Multi-Resolution Temporal Embedding (MRTE) strategy based on multi-resolution power spectral features and a Structure-Aware Spatial Embedding (SASE) mechanism that integrates both local and global connectivity structures. On this basis, we construct a heterogeneity-aware parameter generation mechanism based on Chebyshev graph convolution to effectively capture the spatiotemporal heterogeneity of EEG signals, with an orthogonality-constrained parameter space that enhances diversity and representational fusion. Experimental results demonstrate that the proposed model achieves superior classification accuracies of 86.50% and 90.14% on the BCIC-IV-2a and BCIC-IV-2b datasets, respectively, outperforming current state-of-the-art methods.
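The embedding-driven parameter generation can be sketched as mixing a shared bank of weight templates. In the sketch below a plain linear layer stands in for the Chebyshev graph convolution, and the bank size and mixing rule are assumptions:

import torch
import torch.nn as nn

class EmbeddingDrivenLinear(nn.Module):
    def __init__(self, emb_dim: int, in_dim: int, out_dim: int, bank_size: int = 4):
        super().__init__()
        # Shared bank of K weight templates; an embedding selects a mixture of them.
        self.bank = nn.Parameter(torch.randn(bank_size, out_dim, in_dim) * 0.02)
        self.mixer = nn.Linear(emb_dim, bank_size)

    def forward(self, x: torch.Tensor, emb: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim) features; emb: (batch, emb_dim) subject/condition embedding.
        coeff = self.mixer(emb).softmax(dim=-1)            # (batch, K) mixture weights
        w = torch.einsum("bk,koi->boi", coeff, self.bank)  # per-sample generated weights
        return torch.einsum("boi,bi->bo", w, x)

# An orthogonality penalty on the flattened bank templates would encourage their
# diversity, in the spirit of the paper's orthogonality-constrained parameter space.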
Citations: 0