
Latest Publications in Information Fusion

Emotion Recognition Using Multimodal Physiological Signals Through Regional to Global Fusion with a Spatial-Temporal Semantic Alignment Mechanism
IF 18.6 · CAS Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-12 · DOI: 10.1016/j.inffus.2026.104224
Jian Shen, Huajian Liang, Ruirui Ma, Yihong Xu, Kexin Zhu, Haoran Gao, Kechen Hou, Yanan Zhang, Xiaowei Zhang, Bin Hu
With the continuous development of multimodal learning, emotion recognition using multimodal physiological signals has become a research hotspot. Studies have shown that combining electroencephalogram (EEG) signals and eye movements can significantly improve the results of emotion recognition. However, current research still faces the following challenges: (1) Individuals’ response times and durations to different emotions vary, leading to data diversity and variability. (2) Different modalities exhibit spatiotemporal discrepancies, which may result in varying semantic relevance and significance under the same spatiotemporal conditions. To address these challenges, we propose a Regional to Global Fusion Network with a Spatial-Temporal Semantic Alignment Mechanism (R2GFANet). Initially, R2GFANet addresses the first challenge by employing padding masks in conjunction with a 1D-CNN network to encode temporal semantic information from variable-length EEG signals and eye movements. Subsequently, R2GFANet leverages a Multi-Region Cross-Modal Attention mechanism for parallel temporal semantic alignment within each brain region and applies region-level spatial attention to highlight the semantic information of critical brain regions, effectively addressing spatiotemporal discrepancies across modalities. By comparing our method with numerous state-of-the-art approaches on two public datasets, SEED-IV and SEED-V, we demonstrate the outstanding performance and statistical significance of the proposed R2GFANet. Additionally, we conduct ablation studies and visualization analyses. The results indicate that aligning EEG signals with eye movements not only improves classification performance but also provides neuroscientific interpretability.
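The abstract's first design choice, encoding variable-length signals with padding masks and a 1D-CNN, can be illustrated with a minimal sketch. This is not the authors' R2GFANet code; the channel sizes, kernel width, and mean-pooling readout below are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of masked temporal encoding for
# variable-length physiological signals; layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class MaskedTemporalEncoder(nn.Module):
    def __init__(self, in_channels: int, hidden: int = 64):
        super().__init__()
        # 1D convolution over the time axis extracts local temporal patterns.
        self.conv = nn.Conv1d(in_channels, hidden, kernel_size=5, padding=2)

    def forward(self, x: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, max_len) zero-padded signals; lengths: (batch,) true lengths.
        max_len = x.size(-1)
        # Padding mask: 1 where a time step holds real data, 0 where it is padding.
        mask = (torch.arange(max_len, device=x.device)[None, :] < lengths[:, None]).float()
        h = torch.relu(self.conv(x))            # (batch, hidden, max_len)
        h = h * mask[:, None, :]                # suppress responses at padded positions
        # Pool only over valid time steps to obtain one temporal descriptor per sample.
        return h.sum(dim=-1) / lengths[:, None].clamp(min=1)

# Toy usage: two EEG-like recordings of different lengths padded to a common size.
encoder = MaskedTemporalEncoder(in_channels=4)
x = torch.zeros(2, 4, 10)
x[0] = torch.randn(4, 10)                      # full-length sample
x[1, :, :6] = torch.randn(4, 6)                # shorter sample; the tail stays zero padding
features = encoder(x, lengths=torch.tensor([10, 6]))
print(features.shape)                          # torch.Size([2, 64])
```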
Citations: 0
PIA: Fusing Edge Prior Information into Attention for Semantic Segmentation in Vision Transformer
IF 18.6 · CAS Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-11 · DOI: 10.1016/j.inffus.2026.104222
Ruijie Xiao, Bo Yang, Qianyang Zhu
{"title":"PIA: Fusing Edge Prior Information into Attention for Semantic Segmentation in Vision Transformer","authors":"Ruijie Xiao, Bo Yang, Qianyang Zhu","doi":"10.1016/j.inffus.2026.104222","DOIUrl":"https://doi.org/10.1016/j.inffus.2026.104222","url":null,"abstract":"","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"30 1","pages":""},"PeriodicalIF":18.6,"publicationDate":"2026-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146160884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multimodal Dynamic Fusion Framework for Survival Prediction in Clear Cell Renal Cell Carcinoma
IF 18.6 · CAS Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-10 · DOI: 10.1016/j.inffus.2026.104217
Bangkang Fu, Wuchao Li, Junjie He, Zi Xu, Yunsong Peng, Zhen Liu, Yan Zhang, Pinhao Li, Ping Huang, Rongpin Wang
{"title":"Multimodal Dynamic Fusion Framework for Survival Prediction in Clear Cell Renal Cell Carcinoma","authors":"Bangkang Fu, Wuchao Li, Junjie He, Zi Xu, Yunsong Peng, Zhen Liu, Yan Zhang, Pinhao Li, Ping Huang, Rongpin Wang","doi":"10.1016/j.inffus.2026.104217","DOIUrl":"https://doi.org/10.1016/j.inffus.2026.104217","url":null,"abstract":"","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"43 1","pages":""},"PeriodicalIF":18.6,"publicationDate":"2026-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146152950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Knowledge Graph-Augmented Stacking for Accurate Bike-Sharing Demand Forecasting: The RideGraph Framework
IF 18.6 · CAS Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-10 · DOI: 10.1016/j.inffus.2026.104216
Debojyoti Ghosh, Rony Mitra, Adrijit Goswami
{"title":"Knowledge Graph-Augmented Stacking for Accurate Bike-Sharing Demand Forecasting: The RideGraph Framework","authors":"Debojyoti Ghosh, Rony Mitra, Adrijit Goswami","doi":"10.1016/j.inffus.2026.104216","DOIUrl":"https://doi.org/10.1016/j.inffus.2026.104216","url":null,"abstract":"","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"92 1","pages":""},"PeriodicalIF":18.6,"publicationDate":"2026-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146152952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
URL2Graph++: Unified Semantic-Structural-Character Learning for Malicious URL Detection
IF 18.6 · CAS Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-10 · DOI: 10.1016/j.inffus.2026.104209
Ye Tian, Yifan Jia, Jianguo Sun, Yanbin Wang, Zhiquan Liu, Xiaowen Ling
{"title":"URL2Graph++: Unified Semantic-Structural-Character Learning for Malicious URL Detection","authors":"Ye Tian, Yifan Jia, Jianguo Sun, Yanbin Wang, Zhiquan Liu, Xiaowen Ling","doi":"10.1016/j.inffus.2026.104209","DOIUrl":"https://doi.org/10.1016/j.inffus.2026.104209","url":null,"abstract":"","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"9 1","pages":""},"PeriodicalIF":18.6,"publicationDate":"2026-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146152951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Bridging cognition and emotion: Empathy-driven multimodal misinformation detection
IF 15.5 · CAS Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-09 · DOI: 10.1016/j.inffus.2026.104210
Lu Yuan, Zihan Wang, Zhengxuan Zhang, Lei Shi
In the digital era, social media accelerates the spread of misinformation. Existing detection methods often rely on shallow linguistic or propagation features and lack principled multimodal fusion, failing to capture creators’ emotional manipulation and readers’ psychological responses, which limits prediction accuracy. We propose the Dual-Aspect Empathy Framework (DAE), which derives creator and reader perspectives by fusing separately modeled cognitive and emotional empathy. Creators’ cognitive strategies and affective appeals are analyzed, while Large Language Models (LLMs) simulate readers’ judgments and emotional reactions, providing richer and more human-like signals than conventional classifiers, and partially alleviating the analytical challenge posed by insufficient human feedback. An empathy-aware filtering mechanism is further designed to refine outputs, enhancing authenticity and diversity. The pipeline integrates multimodal feature extraction, empathy-oriented representation learning, LLM-based reader simulation, and empathy-aware filtering. Experiments on benchmark datasets such as PolitiFact, GossipCop and Pheme show that the fusion-based DAE consistently outperforms state-of-the-art baselines, offering a novel and human-centric paradigm for misinformation detection.
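As a rough illustration of the fusion step described above, the sketch below concatenates a creator-side representation with a reader-side representation (which in DAE would come from LLM-simulated reader reactions; here it is just a placeholder tensor) and classifies the pair. Dimensions, layer choices, and names are assumptions, not the paper's implementation.

```python
# Minimal, hypothetical sketch of concatenation-based fusion of the two empathy
# perspectives; feature dimensions and layer sizes are assumptions, and the
# reader-side vector stands in for LLM-simulated reader reactions.
import torch
import torch.nn as nn

class DualPerspectiveFusion(nn.Module):
    def __init__(self, creator_dim: int, reader_dim: int, hidden: int = 128):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(creator_dim + reader_dim, hidden),  # fuse by concatenation
            nn.ReLU(),
            nn.Linear(hidden, 2),                         # genuine vs. misinformation
        )

    def forward(self, creator_feat: torch.Tensor, reader_feat: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([creator_feat, reader_feat], dim=-1)
        return self.classifier(fused)

# Toy usage with random stand-ins for the two perspectives.
model = DualPerspectiveFusion(creator_dim=768, reader_dim=256)
logits = model(torch.randn(4, 768), torch.randn(4, 256))
print(logits.shape)  # torch.Size([4, 2])
```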
Citations: 0
Explainable visual question answering: A survey on methods, datasets and evaluation
IF 15.5 · CAS Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-08 · DOI: 10.1016/j.inffus.2026.104215
Yaxian Wang, Qikan Lin, Jiangbo Shi, Yisheng An, Jun Liu, Bifan Wei, Xudong Jiang
In recent years, visual question answering has become a significant task at the intersection of computer vision and natural language processing, requiring models to jointly understand images and textual queries. It has emerged as a popular benchmark for evaluating multimodal understanding and reasoning. With advancements in VQA accuracy, there is a growing demand for explainability and transparency for VQA models, which is crucial for improving their trust and applicability in critical domains. This survey explores the emerging field of eXplainable Visual Question Answering (XVQA), which aims not only to provide the correct answer but also to generate meaningful explanations that justify the predicted answers. Firstly, we systematically review existing methods on XVQA, and propose a three-level taxonomy to organize them. The proposed taxonomy primarily categorizes XVQA methods based on the timing of the rationale generation and the forms of the rationales. Secondly, we review the existing VQA datasets annotated with explanations in different forms, including textual, visual and multimodal rationales. Furthermore, we summarize the evaluation metrics of XVQA for different forms of rationales. Finally, we outline the challenges for XVQA and discuss potential future directions. We aim to organize existing research in this domain and inspire future investigations into the explainability of VQA models.
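The two organizing dimensions the abstract names, when the rationale is produced relative to the answer and what form it takes, can be written down as a small data structure. This is only an illustrative reading of those dimensions, not the survey's actual three-level taxonomy, and the method name in the example is hypothetical.

```python
# Illustrative-only data structure for the two dimensions named in the abstract
# (timing of rationale generation, form of the rationale); this is not the
# survey's actual three-level taxonomy, and the example method name is made up.
from dataclasses import dataclass
from enum import Enum

class RationaleTiming(Enum):
    BEFORE_ANSWER = "rationale produced before the answer"
    WITH_ANSWER = "rationale produced jointly with the answer"
    POST_HOC = "rationale produced after the answer"

class RationaleForm(Enum):
    TEXTUAL = "textual explanation"
    VISUAL = "visual evidence such as attention or grounding maps"
    MULTIMODAL = "combined textual and visual rationale"

@dataclass
class XVQAEntry:
    method_name: str
    timing: RationaleTiming
    form: RationaleForm

entry = XVQAEntry("HypotheticalXVQAMethod", RationaleTiming.POST_HOC, RationaleForm.MULTIMODAL)
print(entry)
```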
Citations: 0
MuDeNet: A multi-patch descriptor network for anomaly modeling
IF 15.5 · CAS Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-07 · DOI: 10.1016/j.inffus.2026.104214
Miguel Campos-Romero, Manuel Carranza-García, Robert-Jan Sips, José C. Riquelme
Visual anomaly detection is a crucial task in industrial manufacturing, enabling early defect identification and minimizing production bottlenecks. Existing methods often struggle to effectively detect both structural anomalies, which appear as unexpected local patterns, and logical anomalies, which arise from violations of global contextual constraints. To address this challenge, we propose MuDeNet, an unsupervised Multi-patch Descriptor Network that performs multi-scale fusion of local structural features and global contextual information for comprehensive anomaly modeling. MuDeNet employs a lightweight teacher-student framework that jointly extracts and fuses local and global patch descriptors across multiple receptive fields within a single forward pass. Knowledge is first distilled from a pre-trained CNN to efficiently obtain semantic representations, which are then processed by two complementary modules: the structural module, targeting fine-grained defects at small receptive fields, and the logical module, modeling long-range contextual dependencies. Their outputs are fused at the decision level, yielding a unified anomaly score that integrates local and global evidence. Extensive experiments on three state-of-the-art datasets position MuDeNet as an efficient and scalable solution for real-time industrial anomaly detection and segmentation, consistently outperforming existing approaches.
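The decision-level fusion mentioned in the abstract, combining a structural (local) anomaly map with a logical (global) one into a single score, can be sketched as follows. The weighting, normalization, and max-based image score are common conventions assumed here, not MuDeNet's published procedure.

```python
# Sketch (assumed, not MuDeNet's published code) of decision-level fusion of a
# structural (local) and a logical (global) anomaly map into one score.
import numpy as np

def fuse_anomaly_maps(structural_map: np.ndarray,
                      logical_map: np.ndarray,
                      weight: float = 0.5) -> tuple[np.ndarray, float]:
    """Return a fused per-pixel anomaly map and an image-level anomaly score."""
    def normalize(m: np.ndarray) -> np.ndarray:
        # Rescale each branch to [0, 1] so the two maps are on a comparable scale.
        lo, hi = m.min(), m.max()
        return (m - lo) / (hi - lo + 1e-8)

    fused = weight * normalize(structural_map) + (1.0 - weight) * normalize(logical_map)
    # Image-level score: maximum fused response, a common convention in anomaly detection.
    return fused, float(fused.max())

# Toy usage with random stand-ins for the two branch outputs.
rng = np.random.default_rng(0)
fused_map, image_score = fuse_anomaly_maps(rng.random((64, 64)), rng.random((64, 64)))
print(fused_map.shape, round(image_score, 3))
```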
Citations: 0
CMVF: Cross-Modal Unregistered Video Fusion via Spatio-Temporal Consistency
IF 18.6 · CAS Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-07 · DOI: 10.1016/j.inffus.2026.104212
Jianfeng Ding, Hao Zhang, Zhongyuan Wang, Jinsheng Xiao, Xin Tian, Zhen Han, Jiayi Ma
{"title":"CMVF: Cross-Modal Unregistered Video Fusion via Spatio-Temporal Consistency","authors":"Jianfeng Ding, Hao Zhang, Zhongyuan Wang, Jinsheng Xiao, Xin Tian, Zhen Han, Jiayi Ma","doi":"10.1016/j.inffus.2026.104212","DOIUrl":"https://doi.org/10.1016/j.inffus.2026.104212","url":null,"abstract":"","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"91 1","pages":""},"PeriodicalIF":18.6,"publicationDate":"2026-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146138678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
FedFusionNet: Advancing oral cancer recurrence prediction through federated fusion modeling
IF 15.5 · CAS Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-07 · DOI: 10.1016/j.inffus.2026.104205
Al Rafi Aurnob, Sharia Arfin Tanim, Tahmid Enam Shrestha, M.F. Mridha, Durjoy Mistry
Oral cancer represents a considerable global medical problem that requires the development of new technologies that offer reliable advanced therapies. This study introduced FedFusionNet, a fusion-centric model that was meticulously developed to advance early oral cancer diagnosis while preserving data privacy. The primary objective was to develop a model using federated learning (FL) to train across diverse healthcare facilities globally without compromising patient data confidentiality. This model uses features from the ResNeXt101 32X8D and InceptionV3 models to implement a single-level fusion via feature concatenation. This helps to enhance the effectiveness and stability of the model. Specifically, the federated averaging (FedAvg) technique fosters collaborative model training across multiple hospitals while safeguarding sensitive patient information. This ensured that each participating hospital could contribute to the development of the model without sharing the raw data. The proposed model was trained on a dataset of 10,002 images that included both healthy and cancerous oral tissues. Rigorous training and evaluation were conducted for both Independent and Identically Distributed (IID) and Independent and Non-Identically Distributed (Non-IID) settings. FedFusionNet demonstrated superior performance compared with pre-trained and some custom models for oral cancer diagnosis. This scalable and secure framework has profound implications for healthcare analytics. It is a proof-of-concept demonstration that utilizes publicly available data to establish the technical feasibility of the FedFusionNet framework. Future deployment in actual collaborative environments would demonstrate its security-by-design capabilities across hospitals, where patient data confidentiality is a priority.
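Federated averaging (FedAvg), the aggregation rule the abstract credits with combining hospital-local training without sharing raw data, reduces to a weighted average of model parameters. The sketch below is a generic illustration under the common sample-count weighting convention, not the FedFusionNet code.

```python
# Generic federated-averaging sketch; sample-count weighting is a common
# convention and an assumption here, not a detail taken from FedFusionNet.
import torch

def fedavg(client_states: list[dict], client_sizes: list[int]) -> dict:
    """Average each parameter across clients, weighted by local sample counts."""
    total = float(sum(client_sizes))
    global_state = {}
    for name in client_states[0]:
        global_state[name] = sum(
            (size / total) * state[name].float()
            for state, size in zip(client_states, client_sizes)
        )
    return global_state

# Toy usage: three "hospitals" each holding an identically shaped local model.
local_models = [torch.nn.Linear(8, 2) for _ in range(3)]
aggregated = fedavg([m.state_dict() for m in local_models], client_sizes=[120, 80, 200])
print({name: tuple(p.shape) for name, p in aggregated.items()})
```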
Citations: 0