
Latest Publications in Information Fusion

Adversarial and generative AI-based anti-forensics in audio-visual deepfake detection: A comprehensive review and analysis
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-08-01 | Epub Date: 2026-01-01 | DOI: 10.1016/j.inffus.2025.104120
Qurat Ul Ain, Fatima Khalid, Hafsa Ilyas, Ali Javed, Khalid Mahmood Malik, Khan Muhammad, Aun Irtaza
As deepfake technology advances, detecting audio-visual deepfakes becomes increasingly crucial, and the rise of traditional and generative AI-based adversarial/anti-forensics attacks on deepfake detection technologies is a growing concern. Securing applications against adversarial and generative AI-based attacks is critical for accurate and robust deepfake detection tools. Therefore, this paper provides a comprehensive overview of various adversarial and generative AI-based anti-forensic attacks (robustness against which is a core element of trustworthiness, alongside transparency, explainability, and fairness), as well as defensive countermeasures for audio-visual deepfake generation and detection. It covers topics such as adversarial attacks on deepfake detection algorithms and defensive methods, including model fusion and decoy-based approaches, to mitigate these threats. Although extensive research has been conducted in recent years on adversarial attacks on, and defenses for, deepfake detection, there have been few attempts to compare existing work qualitatively and quantitatively. This paper aims to help identify and address the key issues that must be considered when developing transferable adversarial attacks and their countermeasures, particularly through techniques such as generative defense, knowledge distillation, and beyond.
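As a concrete illustration of the simplest attack family such surveys cover, here is a minimal sketch of the fast gradient sign method (FGSM) applied to a hypothetical binary real/fake classifier; PyTorch, the `detector` model, and the epsilon budget are assumptions for illustration, not artifacts of the paper:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(detector, frames: torch.Tensor, labels: torch.Tensor,
                epsilon: float = 8 / 255) -> torch.Tensor:
    """One-step FGSM: nudge input frames along the sign of the loss gradient
    so a deepfake detector misclassifies them (an anti-forensic perturbation)."""
    frames = frames.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(detector(frames), labels)
    loss.backward()
    adv = frames + epsilon * frames.grad.sign()
    # Clip back to the valid pixel range so the perturbation stays plausible.
    return adv.clamp(0.0, 1.0).detach()
```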
Citations: 0
Bridging cognition and emotion: Empathy-driven multimodal misinformation detection
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-08-01 | Epub Date: 2026-02-09 | DOI: 10.1016/j.inffus.2026.104210
Lu Yuan, Zihan Wang, Zhengxuan Zhang, Lei Shi
In the digital era, social media accelerates the spread of misinformation. Existing detection methods often rely on shallow linguistic or propagation features and lack principled multimodal fusion, failing to capture creators’ emotional manipulation and readers’ psychological responses, which limits prediction accuracy. We propose the Dual-Aspect Empathy Framework (DAE), which derives creator and reader perspectives by fusing separately modeled cognitive and emotional empathy. Creators’ cognitive strategies and affective appeals are analyzed, while Large Language Models (LLMs) simulate readers’ judgments and emotional reactions, providing richer and more human-like signals than conventional classifiers, and partially alleviating the analytical challenge posed by insufficient human feedback. An empathy-aware filtering mechanism is further designed to refine outputs, enhancing authenticity and diversity. The pipeline integrates multimodal feature extraction, empathy-oriented representation learning, LLM-based reader simulation, and empathy-aware filtering. Experiments on benchmark datasets such as PolitiFact, GossipCop and Pheme show that the fusion-based DAE consistently outperforms state-of-the-art baselines, offering a novel and human-centric paradigm for misinformation detection.
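A minimal sketch of the kind of decision-stage fusion the DAE abstract implies, concatenating a creator-side embedding with an LLM-simulated reader-side embedding before classification; the class name, dimensions, and two-layer head are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class EmpathyFusionHead(nn.Module):
    """Concatenate creator-side and (LLM-simulated) reader-side embeddings,
    then classify a post as misinformation or not (hypothetical layout)."""
    def __init__(self, dim_creator: int, dim_reader: int, hidden: int = 256):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(dim_creator + dim_reader, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # real vs. misinformation logits
        )

    def forward(self, creator_feat: torch.Tensor, reader_feat: torch.Tensor):
        return self.classifier(torch.cat([creator_feat, reader_feat], dim=-1))
```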
Citations: 0
MuDeNet: A multi-patch descriptor network for anomaly modeling
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-08-01 | Epub Date: 2026-02-07 | DOI: 10.1016/j.inffus.2026.104214
Miguel Campos-Romero, Manuel Carranza-García, Robert-Jan Sips, José C. Riquelme
Visual anomaly detection is a crucial task in industrial manufacturing, enabling early defect identification and minimizing production bottlenecks. Existing methods often struggle to effectively detect both structural anomalies, which appear as unexpected local patterns, and logical anomalies, which arise from violations of global contextual constraints. To address this challenge, we propose MuDeNet, an unsupervised Multi-patch Descriptor Network that performs multi-scale fusion of local structural features and global contextual information for comprehensive anomaly modeling. MuDeNet employs a lightweight teacher-student framework that jointly extracts and fuses local and global patch descriptors across multiple receptive fields within a single forward pass. Knowledge is first distilled from a pre-trained CNN to efficiently obtain semantic representations, which are then processed by two complementary modules: the structural module, targeting fine-grained defects at small receptive fields, and the logical module, modeling long-range contextual dependencies. Their outputs are fused at the decision level, yielding a unified anomaly score that integrates local and global evidence. Extensive experiments on three state-of-the-art datasets position MuDeNet as an efficient and scalable solution for real-time industrial anomaly detection and segmentation, consistently outperforming existing approaches.
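A minimal sketch of the teacher-student discrepancy scoring and decision-level score fusion that the abstract describes; the function names, feature shapes, and the max-pooling/alpha-weighting choices are assumptions for illustration:

```python
import torch

def patch_anomaly_map(teacher, student, image: torch.Tensor) -> torch.Tensor:
    """Score each spatial location by how badly the student imitates the
    (frozen, pre-trained) teacher; large gaps indicate anomalous regions."""
    with torch.no_grad():
        t_feat = teacher(image)            # (B, C, H, W) descriptor map
    s_feat = student(image)                # same shape
    return ((t_feat - s_feat) ** 2).mean(dim=1)   # (B, H, W) anomaly map

def fused_score(structural_map: torch.Tensor, logical_map: torch.Tensor,
                alpha: float = 0.5) -> torch.Tensor:
    """Decision-level fusion of the structural and logical branch scores."""
    s = structural_map.amax(dim=(-2, -1))  # image-level structural score
    l = logical_map.amax(dim=(-2, -1))     # image-level logical score
    return alpha * s + (1 - alpha) * l
```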
Citations: 0
Explainable visual question answering: A survey on methods, datasets and evaluation
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-08-01 | Epub Date: 2026-02-08 | DOI: 10.1016/j.inffus.2026.104215
Yaxian Wang, Qikan Lin, Jiangbo Shi, Yisheng An, Jun Liu, Bifan Wei, Xudong Jiang
In recent years, visual question answering (VQA) has become a significant task at the intersection of computer vision and natural language processing, requiring models to jointly understand images and textual queries. It has emerged as a popular benchmark for evaluating multimodal understanding and reasoning. With advancements in VQA accuracy, there is a growing demand for explainability and transparency in VQA models, which is crucial for improving their trustworthiness and applicability in critical domains. This survey explores the emerging field of eXplainable Visual Question Answering (XVQA), which aims not only to provide the correct answer but also to generate meaningful explanations that justify the predicted answers. Firstly, we systematically review existing methods on XVQA and propose a three-level taxonomy to organize them. The proposed taxonomy primarily categorizes XVQA methods based on the timing of rationale generation and the forms of the rationales. Secondly, we review the existing VQA datasets annotated with explanations in different forms, including textual, visual and multimodal rationales. Furthermore, we summarize the evaluation metrics of XVQA for different forms of rationales. Finally, we outline the challenges for XVQA and discuss potential future directions. We aim to organize existing research in this domain and inspire future investigations into the explainability of VQA models.
近年来,视觉问答已经成为计算机视觉和自然语言处理交叉领域的一项重要任务,需要模型共同理解图像和文本查询。它已成为评估多模态理解和推理的流行基准。随着VQA准确性的提高,对VQA模型的可解释性和透明度的需求日益增长,这对于提高它们在关键领域的信任和适用性至关重要。这项调查探索了可解释的视觉问题回答(XVQA)的新兴领域,其目的不仅是提供正确的答案,而且产生有意义的解释来证明预测的答案。首先,我们系统地回顾了现有的XVQA方法,并提出了一个三级分类法来组织它们。提出的分类法主要根据基本原理生成的时间和基本原理的形式对XVQA方法进行分类。其次,我们回顾了现有的VQA数据集,其中注释了不同形式的解释,包括文本解释、视觉解释和多模态解释。此外,我们总结了不同形式的理由的XVQA的评价指标。最后,我们概述了XVQA面临的挑战,并讨论了潜在的未来方向。我们的目标是组织这一领域的现有研究,并启发未来对VQA模型的可解释性的研究。
{"title":"Explainable visual question answering: A survey on methods, datasets and evaluation","authors":"Yaxian Wang ,&nbsp;Qikan Lin ,&nbsp;Jiangbo Shi ,&nbsp;Yisheng An ,&nbsp;Jun Liu ,&nbsp;Bifan Wei ,&nbsp;Xudong Jiang","doi":"10.1016/j.inffus.2026.104215","DOIUrl":"10.1016/j.inffus.2026.104215","url":null,"abstract":"<div><div>In recent years, visual question answering has become a significant task at the intersection of computer vision and natural language processing, requiring models to jointly understand images and textual queries. It has emerged as a popular benchmark for evaluating multimodal understanding and reasoning. With advancements in VQA accuracy, there is a growing demand for explainability and transparency for VQA models, which is crucial for improving their trust and applicability in critical domains. This survey explores the emerging field of e<strong>X</strong>plainable <strong>V</strong>isual <strong>Q</strong>uestion <strong>A</strong>nswering (XVQA), which aims not only to provide the correct answer but also to generate meaningful explanations that justify the predicted answers. Firstly, we systematically review existing methods on XVQA, and propose a three-level taxonomy to organize them. The proposed taxonomy primarily categorizes XVQA methods based on the timing of the rationale generation and the forms of the rationales. Secondly, we review the existing VQA datasets annotated with explanations in different forms, including textual, visual and multimodal rationales. Furthermore, we summarize the evaluation metrics of XVQA for different forms of rationales. Finally, we outline the challenges for XVQA and discuss potential future directions. We aim to organize existing research in this domain and inspire future investigations into the explainability of VQA models.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"132 ","pages":"Article 104215"},"PeriodicalIF":15.5,"publicationDate":"2026-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146138677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CATCH: Causal attention enhanced meta-path semantic fusion for robust hyperbolic heterogeneous graph embedding
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-08-01 | Epub Date: 2026-02-05 | DOI: 10.1016/j.inffus.2026.104206
Bojia Liu, Conghui Zheng, Li Pan
Heterogeneous graph representation learning seeks to capture the complex structural and semantic properties in heterogeneous graphs. The integration of hyperbolic space, which is well-suited to modeling the intrinsic degree power-law distribution of graphs, has facilitated significant advancements in this area. Recent methods leverage hyperbolic attention mechanisms to fuse semantic information within metapath-induced subgraphs. Despite this progress, a major limitation remains: these methods leverage attention for information aggregation but fail to model the causal relationship between semantic fusion and downstream task performance, leading to spurious semantic associations that reduce robustness to noise and impair cross-task generalization. To address this challenge, we propose a Causal ATtention enhanCed Hyperbolic Heterogeneous Graph Neural Network (CATCH), aiming to achieve sufficient semantic information fusion. To the best of our knowledge, CATCH is the first to integrate hyperbolic space with causal inference for heterogeneous graph representations, directly targeting spurious semantic correlations at the source. Specifically, CATCH explicitly encodes the Euclidean node attributes of different types into a shared semantic hyperbolic space. To capture the underlying semantics, context subgraphs based on first-order and high-order metapaths are constructed to facilitate hyperbolic attention-based intra-level and inter-level information aggregation, thus forming comprehensive representations. Finally, a causal attention enhancement mechanism is implemented with direct supervision on attention learning, leveraging counterfactual causal inference to generate counterfactual representations for computing direct causal effects. By jointly optimizing a task-specific objective alongside a causal loss, CATCH promotes more faithful semantic encoding, leading to improved robustness and generalization. Extensive experiments on four real-world datasets validate the superior performance of CATCH across multiple tasks. The implementation is available at https://github.com/Crystal-LiuBojia/CATCH.
[Graphical abstract: Recommendation performance on Amazon-CD and Amazon-Book.]
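The encoding of Euclidean attributes into hyperbolic space that the abstract mentions is typically done with the exponential map at the origin of the Poincaré ball; a minimal sketch, assuming a curvature parameter c (the paper's exact parameterization may differ):

```python
import torch

def exp_map_zero(v: torch.Tensor, c: float = 1.0, eps: float = 1e-6) -> torch.Tensor:
    """Map Euclidean feature vectors v onto the Poincare ball of curvature -c:
    exp_0(v) = tanh(sqrt(c) * ||v||) * v / (sqrt(c) * ||v||)."""
    sqrt_c = c ** 0.5
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)  # avoid division by zero
    return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)
```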
Citations: 0
FedFusionNet: Advancing oral cancer recurrence prediction through federated fusion modeling
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-08-01 | Epub Date: 2026-02-07 | DOI: 10.1016/j.inffus.2026.104205
Al Rafi Aurnob, Sharia Arfin Tanim, Tahmid Enam Shrestha, M.F. Mridha, Durjoy Mistry
Oral cancer represents a considerable global medical problem that requires the development of new technologies that offer reliable advanced therapies. This study introduced FedFusionNet, a fusion-centric model that was meticulously developed to advance early oral cancer diagnosis while preserving data privacy. The primary objective was to develop a model using federated learning (FL) to train across diverse healthcare facilities globally without compromising patient data confidentiality. This model uses features from the ResNeXt101 32X8D and InceptionV3 models to implement a single-level fusion via feature concatenation. This helps to enhance the effectiveness and stability of the model. Specifically, the federated averaging (FedAvg) technique fosters collaborative model training across multiple hospitals while safeguarding sensitive patient information. This ensured that each participating hospital could contribute to the development of the model without sharing the raw data. The proposed model was trained on a dataset of 10,002 images that included both healthy and cancerous oral tissues. Rigorous training and evaluation were conducted for both Independent and Identically Distributed (IID) and Independent and Non-Identically Distributed (Non-IID) settings. FedFusionNet demonstrated superior performance compared with pre-trained and some custom models for oral cancer diagnosis. This scalable and secure framework has profound implications for healthcare analytics. It is a proof-of-concept demonstration that utilizes publicly available data to establish the technical feasibility of the FedFusionNet framework. Future deployment in actual collaborative environments would demonstrate its security-by-design capabilities across hospitals, where patient data confidentiality is a priority.
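The FedAvg aggregation step the abstract relies on is standard and easy to state: each round, the server averages client weights in proportion to local dataset size. A minimal sketch (variable names are illustrative; the paper's training loop is not shown):

```python
from typing import Dict, List
import torch

def fed_avg(client_states: List[Dict[str, torch.Tensor]],
            client_sizes: List[int]) -> Dict[str, torch.Tensor]:
    """Aggregate client model weights by dataset-size-weighted averaging;
    only parameters leave each hospital, never raw patient images."""
    total = sum(client_sizes)
    global_state = {}
    for name in client_states[0]:
        global_state[name] = sum(
            (n / total) * state[name].float()
            for state, n in zip(client_states, client_sizes)
        )
    return global_state
```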
Citations: 0
Lightweight music recommendation via multi-physiological feature fusion
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-08-01 | Epub Date: 2026-02-07 | DOI: 10.1016/j.inffus.2026.104211
Xiaoying Huang, Haonan Cheng, Sanyi Zhang, Xiaoxuan Guo, Long Ye
Music recommendation, as the core task of smart speakers, has an important impact on user experience in terms of recommendation speed and accuracy. However, existing music recommendation algorithms face challenges in generating adaptive playlists tailored to the user's current state, primarily because achieving high recommendation accuracy typically necessitates substantial computing overheads. In addition, most existing music recommendation algorithms ignore smooth transitions between tracks, which further hurts recommendation quality. To tackle these issues, we propose a novel Lightweight Music Recommendation (LMR) method via Multi-Physiological feature Fusion (MPF), which can be effectively applied in embedded smart speaker systems. Specifically, our proposed LMR method contains two core modules: an MPF-based music mapping module and a global-local similarity computation (GLSC) based playlist recommendation module. The lightweight MPF-based music mapping model is designed to solve the track-user adaptation problem. Furthermore, we propose a GLSC-based playlist recommendation algorithm to address incoherent and unsmooth transitions within track sequences. Experiments demonstrate that the proposed method achieves playlist recommendations more consistent with user contextual information, while also enabling smoother transitions between tracks and ensuring long-term content consistency across the entire sequence. Compared with other methods, our approach achieves a favorable balance between accuracy and efficiency.
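One plausible reading of the global-local similarity idea, sketched below, scores each candidate track by a weighted blend of global fit to the user's physiological-state embedding and local smoothness with the previous track; the names, cosine metric, and beta weighting are all assumptions, not the paper's GLSC formula:

```python
import torch
import torch.nn.functional as F

def global_local_similarity(user_state: torch.Tensor,   # (D,) state embedding
                            track_embs: torch.Tensor,   # (K, D) candidates
                            prev_track: torch.Tensor,   # (D,) last played track
                            beta: float = 0.7) -> torch.Tensor:
    """Blend global user-state fit with local track-to-track smoothness."""
    global_sim = F.cosine_similarity(user_state.unsqueeze(0), track_embs, dim=-1)
    local_sim = F.cosine_similarity(prev_track.unsqueeze(0), track_embs, dim=-1)
    # torch.topk on the result would then yield the next tracks to queue.
    return beta * global_sim + (1 - beta) * local_sim
```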
Citations: 0
Multi-lingual approach for multi-modal emotion and sentiment recognition based on triple fusion
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-08-01 | Epub Date: 2026-02-04 | DOI: 10.1016/j.inffus.2026.104207
Maxim Markitantov, Elena Ryumina, Anastasia Dvoynikova, Alexey Karpov
Affective state recognition is a challenging task that requires a large amount of input data, such as audio, video, and text. Current multi-modal approaches are often single-task and corpus-specific, resulting in overfitting, poor generalization across corpora, and reduced real-world performance. In this work, we address these limitations by: (1) multi-lingual training on corpora that include Russian (RAMAS) and English (MELD, CMU-MOSEI) speech; (2) multi-task learning for joint emotion and sentiment recognition; and (3) a novel Triple Fusion strategy that employs cross-modal integration at both hierarchical uni-modal and fused multi-modal feature levels, enhancing intra- and inter-modal relationships of different affective states and modalities. Additionally, to optimize the performance of the proposed approach, we compare temporal encoders (Transformer-based, Mamba, xLSTM) and fusion strategies (double and triple fusion strategies with and without a label encoder) to comprehensively understand their capabilities and limitations. On the Test subset of the CMU-MOSEI corpus, the proposed approach showed a mean weighted F1-score (mWF) of 88.6% for emotion recognition and a weighted F1-score (WF) of 84.8% for sentiment recognition (respectively +9.5% and +6.0% absolute over prior multi-task baselines). On the Test subset of the MELD corpus, the proposed approach showed WF of 49.6% for emotion and 60.0% for sentiment (+8.4% WF for emotion recognition over the strongest multi-task baseline). On the Test subset of the RAMAS corpus, the proposed approach showed a competitive performance with WF of 71.8% and 90.0% for emotion and sentiment, respectively. We compare the performance of the proposed approach with that of state-of-the-art ones. The source code and demo of the developed approach is publicly available at https://smil-spcras.github.io/MASAI/.
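A structural sketch of one way to realize the triple fusion described above: pairwise cross-modal fusion of audio, video, and text features, followed by a second fusion over the pairwise outputs and two task heads for the multi-task objective. The module layout, dimensions, and activation are my reading of the abstract, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class TripleFusion(nn.Module):
    """Pairwise cross-modal fusion, then a second fusion stage, then
    multi-task emotion and sentiment heads (hypothetical layout)."""
    def __init__(self, dim: int, n_emotions: int, n_sentiments: int):
        super().__init__()
        self.pair_fuse = nn.ModuleDict(
            {k: nn.Linear(2 * dim, dim) for k in ("av", "at", "vt")}
        )
        self.final_fuse = nn.Linear(3 * dim, dim)
        self.emotion_head = nn.Linear(dim, n_emotions)
        self.sentiment_head = nn.Linear(dim, n_sentiments)

    def forward(self, a: torch.Tensor, v: torch.Tensor, t: torch.Tensor):
        pairs = torch.cat([
            self.pair_fuse["av"](torch.cat([a, v], dim=-1)),
            self.pair_fuse["at"](torch.cat([a, t], dim=-1)),
            self.pair_fuse["vt"](torch.cat([v, t], dim=-1)),
        ], dim=-1)
        z = torch.relu(self.final_fuse(pairs))
        return self.emotion_head(z), self.sentiment_head(z)
```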
Citations: 0
Sharable and discriminative multi-view geometry-adaptive fusion network for 3D dental model segmentation
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-08-01 | Epub Date: 2026-01-29 | DOI: 10.1016/j.inffus.2026.104196
Yue Zhao, Xinning Chen, Kehan Li, Yifan Lin, Yang Liu, Fan Wang
Accurate 3D dental model segmentation is critical for digital dental treatment, as it provides valuable clinical references. Existing methods fail to adaptively evaluate the importance or contribution of different geometric attributes during heterogeneous feature fusion, hindering the accuracy of end-to-end segmentation. In this paper, we pioneer the description of the geometric attributes of a 3D dental model as views. A multi-view geometry-adaptive fusion network (MGAFNet) is proposed to dynamically seek the optimal combination of views through distinctive and sharable feature exploration for fine-grained 3D dental model segmentation. Specifically, during distinctive feature extraction, we design a geometry-aware enhancement module (GAE) to improve the learning of topological variations in teeth. After that, a multivariate sharable cross-interaction module (SCIM) is developed to facilitate the flow of information and capture sharable features among views. Subsequently, a multivariate adaptive representation fusion module (MARF) is implemented to adaptively balance the importance or contribution of views by constructing weight matrices for distinctive and sharable features from different feature sources. Compared to eight advanced methods, our MGAFNet achieves state-of-the-art performance on both a public benchmark and a private clinical dataset. It demonstrates robustness in handling various dental conditions (e.g., misaligned, missing and supernumerary teeth), avoiding category confusion and blurry boundary segmentation.
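The "adaptively balance the importance or contribution of views" step can be pictured as input-dependent softmax weighting across per-view features before summation; a minimal sketch with hypothetical shapes (batch, views, mesh cells, channels), not the paper's MARF module:

```python
import torch
import torch.nn as nn

class AdaptiveViewFusion(nn.Module):
    """Weight per-view (geometric-attribute) features by learned,
    input-dependent scores before summing them into one representation."""
    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (B, n_views, N, dim) -- per-view features for N mesh cells.
        logits = self.scorer(views)             # (B, n_views, N, 1)
        weights = torch.softmax(logits, dim=1)  # normalize across views
        return (weights * views).sum(dim=1)     # (B, N, dim) fused features
```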
Citations: 0
All-weather multi-modality image fusion: Unified framework and 100k benchmark
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-23 | DOI: 10.1016/j.inffus.2026.104130
Xilai Li, Wuyang Liu, Xiaosong Li, Fuqiang Zhou, Huafeng Li, Feiping Nie
Multi-modality image fusion (MMIF) combines complementary information from different image modalities to provide a comprehensive and objective interpretation of scenes. However, existing fusion methods cannot resist diverse weather interference in real-world scenes, limiting their practical applicability. To bridge this gap, we propose an end-to-end, unified all-weather MMIF model. Rather than focusing solely on pixel-level recovery, our method emphasizes maximizing the representation of key scene information through joint feature fusion and restoration. Specifically, we first decompose images into low-rank and sparse components, enabling effective feature separation for enhanced multi-modality perception. During feature recovery, we introduce a physically-aware clear feature prediction module, inferring variations in light transmission via illumination and reflectance. Clear features generated by the network are used to enhance the representation of salient information. We also construct a large-scale MMIF dataset with 100,000 image pairs spanning rain, haze, and snow conditions, covering various degradation levels and diverse scenes. Experimental results in both real-world and synthetic scenes demonstrate that the proposed method excels in image fusion and downstream tasks such as object detection, semantic segmentation, and depth estimation. The source code is available at https://github.com/ixilai/AWFusion.
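The decomposition into low-rank and sparse components that the abstract describes is in the spirit of robust PCA; a compact numpy sketch of the standard alternating proximal scheme (the lambda and mu heuristics follow the RPCA literature and are assumptions here, not the paper's exact solver):

```python
import numpy as np

def soft_threshold(x: np.ndarray, tau: float) -> np.ndarray:
    """Elementwise shrinkage operator, the prox of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def lowrank_sparse_split(M: np.ndarray, lam: float = None, n_iter: int = 30):
    """Split a (grayscale) image matrix M into low-rank L plus sparse S by
    alternating singular-value thresholding and elementwise soft-thresholding."""
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))           # classic RPCA weight
    mu = m * n / (4.0 * np.abs(M).sum())         # step-size heuristic
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(n_iter):
        U, sig, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U * soft_threshold(sig, 1.0 / mu)) @ Vt   # singular-value shrinkage
        S = soft_threshold(M - L, lam / mu)            # sparse residual update
    return L, S
```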
Citations: 0