
Latest Publications in Information Fusion

FedEGL: Edge-assisted federated graph learning
IF 15.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-02 DOI: 10.1016/j.inffus.2025.104118
Haitao Wang, Aojie Luo, Wenchao Xu, Haozhao Wang, Yichen Li, Yining Qi, Rui Zhang, Ruixuan Li
Federated graph learning excels at learning from graph-structured data distributed across multiple clients. However, partitioning the graph leaves each client with only a subgraph that lacks cross-client neighbor nodes, which significantly degrades accuracy. Although exchanging original nodes can address this issue, it requires interaction with a remote server, causing significant communication delays and risking data-privacy leakage. To tackle this, this paper proposes an edge-server-assisted federated graph learning approach, FedEGL, which aggregates and exchanges intermediate features of approximated nodes through a third-party edge server, performing cross-client feature alignment and dynamic weighted aggregation. Adaptive differential privacy, with dynamically allocated privacy budgets, protects the approximated node features. Experimental results show that the method achieves accuracy close to that of centralized training, improving classification accuracy by up to 8% over the latest baseline. The approach improves model accuracy while preserving privacy, offering an effective solution to the subgraph partitioning problem in federated graph learning.
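The abstract only names the mechanism; as a rough illustration, the sketch below adds dynamically budgeted Laplace noise to intermediate node features before they are shared with an edge server. It is a minimal Python sketch under two assumptions (Laplace noise, sensitivity-proportional budget splitting); the helpers `allocate_budgets` and `privatize_features` are hypothetical and are not FedEGL's algorithm.

```python
import numpy as np

def allocate_budgets(total_epsilon: float, sensitivities: np.ndarray) -> np.ndarray:
    """Split a total privacy budget across clients in proportion to their
    feature sensitivity, so more sensitive shares receive more budget."""
    weights = sensitivities / sensitivities.sum()
    return total_epsilon * weights

def privatize_features(features: np.ndarray, epsilon: float, clip: float = 1.0) -> np.ndarray:
    """Clip each feature row to bound sensitivity, then add Laplace noise
    calibrated to the allocated budget (one epsilon-DP release)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    clipped = features * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    noise = np.random.laplace(scale=clip / epsilon, size=clipped.shape)
    return clipped + noise

# Three clients share intermediate node features through an edge server.
rng = np.random.default_rng(0)
client_feats = [rng.normal(size=(5, 8)) for _ in range(3)]
budgets = allocate_budgets(3.0, np.array([1.0, 2.0, 1.5]))
shared = [privatize_features(f, eps) for f, eps in zip(client_feats, budgets)]
aggregated = np.mean(shared, axis=0)  # edge-server aggregation (uniform weights here)
```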
Citations: 0
Regional defeats global: An efficient regional feature fusion via convolutional architecture for multispectral object detection
IF 15.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-02 DOI: 10.1016/j.inffus.2025.104110
Zhenhao Wang, Tian Tian
Multispectral object detection still struggles to balance accuracy and efficiency. Most existing approaches rely heavily on global modeling, which, although capable of integrating multi-band information, incurs substantial computational overhead and fails to fully exploit the spatial correlations across spectral bands. To address this, this paper introduces a convolutional region feature computation mechanism that leverages the inherent advantage of convolution in preserving spatial structure, so that spatial cues are fully retained during feature representation learning and explicitly incorporated into multispectral feature interaction. Meanwhile, by recasting global attention computation as localized regional modeling, the proposed method markedly reduces computational cost while maintaining effective feature fusion, enabling a lightweight architecture. Experimental results demonstrate that the proposed module achieves the lowest computational overhead while improving mAP@50 by 1.97% and 1.66% on the DroneVehicle and VEDAI remote-sensing datasets, respectively, compared with state-of-the-art methods. It also transfers well to the pedestrian detection datasets FLIR and LLVIP. The code is available at https://github.com/wzh326/LMFFM_CARFCOM.git.
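As a rough illustration of the regional-fusion idea (local convolution replacing global attention, so fusion cost scales with the window size rather than with the full token count), consider the sketch below. The `RegionalFusion` module is hypothetical and is not the paper's LMFFM/CARFCOM implementation.

```python
import torch
import torch.nn as nn

class RegionalFusion(nn.Module):
    """Fuse two spectral-band feature maps with local convolutions instead
    of global attention; spatial structure is preserved and the cost grows
    with the kernel size, not with the full H*W token count."""
    def __init__(self, channels: int, kernel_size: int = 7):
        super().__init__()
        pad = kernel_size // 2
        # depthwise conv: each channel attends only to its local region
        self.local_mix = nn.Conv2d(2 * channels, 2 * channels, kernel_size,
                                   padding=pad, groups=2 * channels)
        self.project = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.gate = nn.Sigmoid()

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb, ir], dim=1)          # stack both bands
        weights = self.gate(self.local_mix(x))   # region-wise fusion weights
        return self.project(x * weights)         # fused single-stream feature

fused = RegionalFusion(64)(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(fused.shape)  # torch.Size([1, 64, 32, 32])
```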
Citations: 0
Subgraph-focused biomedical knowledge embedding with bi-semantic encoder for multi-type drug-drug interaction prediction
IF 15.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-31 DOI: 10.1016/j.inffus.2025.104109
Xiangpeng Bi, Wenjian Ma, Huasen Jiang, Qing Cai, Jie Nie, Zhiqiang Wei, Shugang Zhang
Identifying multi-type drug-drug interactions (DDIs) enables more precise assessment of drug safety risks and provides targeted guidance for combination therapy, making it a critical task in pharmacology. Because they can directly integrate diverse biomedical information and effectively model the intricate mechanisms underlying drug interactions, knowledge graph (KG)-based approaches have emerged for predicting DDIs. Recent advances show great promise, but existing solutions still overlook three critical issues: 1) information sparsity, 2) polyadic interactions, and 3) the lack of a fusion paradigm, which severely hinder comprehensive identification and understanding of drug interaction patterns. To address these issues, we introduce a Bi-Semantic encoDer-dRiven knowledge sUbGraph representation learning framework (Bi-SemDRUG) for multi-type DDI prediction. Bi-SemDRUG proposes a multi-view knowledge subgraph partitioning strategy that extracts drug-related, refined topological structures from large-scale knowledge graphs, reducing interference from irrelevant information. It further incorporates a bi-semantic subgraph encoder that effectively uncovers multi-order semantic relationships embedded within the knowledge subgraphs. Finally, we propose a general paradigm for information fusion that facilitates the integration of multi-level drug-related information. Extensive experiments on three benchmark datasets demonstrate that the proposed model achieves state-of-the-art performance against baseline methods and generalizes well to large-scale DDI prediction. Case studies further highlight its capacity to offer a more comprehensive insight into the underlying mechanisms of DDIs.
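The subgraph partitioning step can be illustrated with a toy extraction of a drug-centered neighborhood from a knowledge graph. A minimal sketch using networkx; the graph, node names, and the 2-hop radius are illustrative assumptions, not Bi-SemDRUG's actual multi-view strategy.

```python
import networkx as nx

def drug_subgraph(kg: nx.Graph, drug: str, hops: int = 2) -> nx.Graph:
    """Extract the k-hop neighborhood of a drug node, discarding the rest of
    the knowledge graph to suppress irrelevant information."""
    return nx.ego_graph(kg, drug, radius=hops)

# Toy knowledge graph: two drugs, their targets, and an unrelated region.
kg = nx.Graph()
kg.add_edges_from([("aspirin", "COX1"), ("COX1", "inflammation"),
                   ("warfarin", "VKORC1"), ("geneX", "pathwayY")])
sub = drug_subgraph(kg, "aspirin")
print(sorted(sub.nodes()))  # ['COX1', 'aspirin', 'inflammation']
```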
Citations: 0
Generating vision-language navigation instructions incorporated fine-grained alignment annotations
IF 15.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-30 DOI: 10.1016/j.inffus.2025.104107
Yibo Cui, Liang Xie, Yu Zhao, Jiawei Sun, Erwei Yin
Vision-Language Navigation (VLN) enables intelligent agents to navigate environments by integrating visual perception and natural language instructions, yet it faces significant challenges due to the scarcity of fine-grained cross-modal alignment annotations. Existing datasets primarily focus on global instruction-trajectory matching, neglecting the sub-instruction-level and entity-level alignments critical for accurate navigation decision-making. To address this limitation, we propose FCA-NIG, a generative framework that automatically constructs navigation instructions with dual-level fine-grained cross-modal annotations. In this framework, an augmented trajectory is first divided into sub-trajectories, which are then processed through GLIP-based landmark detection, crafted instruction construction, OFA-Speaker-based R2R-like instruction generation, and CLIP-powered entity selection, yielding sub-instruction-trajectory pairs with entity-landmark annotations. Finally, these sub-pairs are aggregated into a complete instruction-trajectory pair. The framework produces the FCA-R2R dataset, the first large-scale augmentation dataset featuring precise sub-instruction-sub-trajectory and entity-landmark alignments. Extensive experiments demonstrate that training with FCA-R2R significantly improves multiple state-of-the-art VLN agents, including SF, EnvDrop, RecBERT, HAMT, DUET, and BEVBERT. Sub-instruction-trajectory alignment enhances agents’ state awareness and decision accuracy, while entity-landmark alignment further boosts navigation performance and generalization. These results highlight the effectiveness of FCA-NIG in generating high-quality, scalable training data without manual annotation, advancing fine-grained cross-modal learning in complex navigation tasks.
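The sub-instruction-trajectory pairing at the heart of this pipeline is, at its simplest, data plumbing, sketched below. Everything here (viewpoint IDs, the fixed-length split, the instruction strings) is hypothetical scaffolding; the real framework generates instructions with OFA-Speaker and selects entities with CLIP.

```python
from typing import List, Tuple

def split_trajectory(viewpoints: List[str], max_len: int = 3) -> List[List[str]]:
    """Cut a trajectory into sub-trajectories of bounded length; these are
    the units at which sub-instructions are generated and aligned."""
    return [viewpoints[i:i + max_len] for i in range(0, len(viewpoints), max_len)]

def pair_with_instructions(subtrajs: List[List[str]],
                           instructions: List[str]) -> List[Tuple[List[str], str]]:
    """Attach one generated sub-instruction to each sub-trajectory."""
    assert len(subtrajs) == len(instructions)
    return list(zip(subtrajs, instructions))

subtrajs = split_trajectory(["v0", "v1", "v2", "v3", "v4"], max_len=2)
pairs = pair_with_instructions(
    subtrajs, ["go forward", "turn left at the sofa", "stop at the door"])
print(pairs[0])  # (['v0', 'v1'], 'go forward')
```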
Citations: 0
Bridging the sim-to-real gap in RF localization with large-scale synthetic pretraining
IF 15.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-30 DOI: 10.1016/j.inffus.2025.104104
Armen Manukyan, Rafayel Mkrtchyan, Ararat Saribekyan, Theofanis P. Raptis, Hrant Khachatrian
Radio frequency (RF) fingerprinting is a promising localization technique for GPS-denied environments, yet it suffers from a fundamental limitation: poor generalization to previously unmapped areas. Traditional methods such as k-nearest neighbors (k-NN) perform well where data is available but may fail on unseen streets, limiting real-world deployment. Deep learning (DL) offers a potential remedy by learning spatial-RF patterns that generalize, but requires far more training data than simple real-world measurement campaigns can provide. In this paper, we investigate whether synthetic data can bridge this generalization gap. Using (i) a real-world dataset from Rome and (ii) NVIDIA’s open-source ray-tracing simulator Sionna, we generate synthetic datasets under varying realism and scale conditions. Specifically, Dataset A contains real-world measurements with real base stations (BS) and real signals; Dataset B uses real BS locations but simulated signals; Dataset C uses both simulated BS locations and signals; and Dataset B’ is an optimized version of Dataset B in which BS parameters are calibrated via a Gaussian Process to maximize signal correlation with Dataset A. Our evaluation reveals a pronounced sim-to-real gap: models achieving 25m error on synthetic data degrade to 184m on real data. Nonetheless, pretraining on synthetic data reduces real-world localization error from 323m to 162m, a 50% improvement over real-only training. Notably, simulation fidelity proves more important than scale: a smaller calibrated dataset (53K samples) outperforms a larger uncalibrated one (274K samples). To further evaluate generalization, we conduct experiments on an unseen geographical region using a real-world dataset from Oslo. In the zero-shot setting, the models achieve a root mean square error (RMSE) of 132.2m on the entire dataset, and 61.5m on unseen streets after fine-tuning on Oslo data. While challenges remain before practical localization accuracy is reached, this work provides a systematic study of synthetic-to-real transfer for RF localization in wireless communication and highlights the value of simulation-aware pretraining for generalizing DL models to real-world scenarios.
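For readers unfamiliar with fingerprinting baselines, the k-NN approach the paper compares against can be sketched in a few lines. The data below is random noise standing in for RSS fingerprints (not the Rome/Oslo measurements); only the multi-output k-NN regression and the positional RMSE metric are the point.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(42)
# RSS fingerprints from 6 base stations, labelled with 2-D positions.
train_rss = rng.normal(size=(500, 6))
train_xy = rng.uniform(0, 1000, size=(500, 2))
test_rss = rng.normal(size=(100, 6))
test_xy = rng.uniform(0, 1000, size=(100, 2))

# Multi-output k-NN regression: predict (x, y) from a signal vector.
knn = KNeighborsRegressor(n_neighbors=5).fit(train_rss, train_xy)
pred = knn.predict(test_rss)

# Positional RMSE in metres: root of the mean squared Euclidean distance.
rmse = np.sqrt(np.mean(np.sum((pred - test_xy) ** 2, axis=1)))
print(f"RMSE: {rmse:.1f} m")
```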
Citations: 0
Dimensional compensation for small-sample and small-size insulator burn mark via RGB-point cloud fusion in power grid inspection
IF 15.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-30 DOI: 10.1016/j.inffus.2025.104105
Junqiu Tang, Zhikang Yuan, Zixiang Wei, Shuojie Gao, Changyong Shen
To address the scarcity of burn mark samples in power infrastructure inspection, we introduce the Insulator Burn Mark RGB-Point Cloud (IBMR) dataset, the first publicly available benchmark featuring RGB-point clouds with pixel-level annotations for both insulators and burn marks. To tackle the severe class imbalance caused by the vast number of background points and the small size of burn marks, we propose a novel two-stage RGB-point cloud segmentation framework. The framework integrates DCCU-Sampling, a downsampling algorithm that effectively suppresses background points while preserving the critical structures of the targets, and BB-Backtracking, a geometric recovery method that reconstructs fine-grained burn mark details lost during the downsampling process. Experimental results validate the framework’s effectiveness, achieving 81.21% mIoU with 32 training samples and 68.37% mIoU with only 14 samples. The dataset is publicly available at https://huggingface.co/datasets/Junqiu-Tang/IBMR.
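The class-imbalance-aware downsampling idea behind DCCU-Sampling can be approximated with a simple rule: keep every rare-class point, subsample the background. The sketch below is an assumption-laden stand-in (random rather than structure-aware selection), not the published algorithm.

```python
import numpy as np

def balanced_downsample(points, labels, n_keep, rare_classes=(2,)):
    """Keep every point of the rare classes (e.g. burn marks) and fill the
    remaining quota with a random subset of the background points."""
    rare_idx = np.flatnonzero(np.isin(labels, rare_classes))
    other_idx = np.flatnonzero(~np.isin(labels, rare_classes))
    n_other = max(n_keep - rare_idx.size, 0)
    chosen = np.random.choice(other_idx, size=min(n_other, other_idx.size),
                              replace=False)
    keep = np.concatenate([rare_idx, chosen])
    return points[keep], labels[keep]

pts = np.random.rand(100_000, 6)  # XYZ + RGB per point
lbl = np.random.choice([0, 1, 2], size=100_000, p=[0.95, 0.045, 0.005])
sub_pts, sub_lbl = balanced_downsample(pts, lbl, n_keep=20_000)
print(sub_pts.shape, np.bincount(sub_lbl))  # all class-2 points survive
```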
Citations: 0
Style-augmented large-scale vision model with domain-generalized knowledge fusion for anomaly detection in powder bed additive manufacturing
IF 15.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-30 DOI: 10.1016/j.inffus.2025.104108
Kang Wang, Xuan Liang, Jinghua Xu, Shuyou Zhang, Jianrong Tan
Metal Additive Manufacturing (AM) has revolutionized the production of complex parts across various industries, yet ensuring consistent quality remains a significant challenge. Reliable and efficient anomaly detection in metal AM processes is essential for maintaining product quality and reducing costly post-production inspections. We propose a full-life-cycle generalization method, the style-augmented large-scale vision model (SLVM), for anomaly detection in metal additive manufacturing. Our approach leverages large-scale vision models and incorporates style-based augmentation to enhance the detection of anomalies in AM processes. A pre-trained large-scale vision model serves as the backbone of SLVM, providing the robust feature extraction needed to capture intricate details in AM images. Building on this foundation, a style augmentation module generates diverse stylized versions of input images, significantly improving the model’s generalization across different AM processes and materials. An anomaly detection head uses these style-augmented features to effectively identify and localize defects, completing the approach to AM quality control. We evaluate SLVM on multiple metal AM datasets, covering laser powder bed fusion and binder jetting processes, and demonstrate superior performance over existing state-of-the-art methods. Our experiments show that SLVM achieves higher detection accuracy, better generalization across AM processes, and improved robustness to variations in part geometry and material properties. SLVM thus offers a promising route to stronger quality control in metal AM, potentially reducing the need for costly post-production inspections and improving overall manufacturing efficiency.
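The abstract does not specify the style-augmentation formulation; a common choice it plausibly resembles is AdaIN-style feature-statistics swapping, sketched below under that assumption. The function `style_swap` is illustrative, not the paper's module.

```python
import torch

def style_swap(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """AdaIN-style augmentation: re-normalize content features with the
    channel-wise mean/std of another sample, altering style but not content."""
    c_mu = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mu = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return (content - c_mu) / c_std * s_std + s_mu

# Augment a batch by borrowing feature statistics from a shuffled copy of itself.
feats = torch.randn(8, 64, 28, 28)
augmented = style_swap(feats, feats[torch.randperm(8)])
```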
Citations: 0
Multi-source information fusion through Tucker tensor decomposition-based transfer learning for handwriting-based Alzheimer's disease detection
IF 15.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-30 DOI: 10.1016/j.inffus.2025.104112
Yao Yao, Zhuoxi Yu, Dehui Wang, Chengzhe Wang, Congting Sun
With Alzheimer’s disease affecting approximately 50 million people globally, early detection has become a critical public health priority in aging societies. This paper proposes a multi-level information fusion framework for handwriting-based Alzheimer’s disease detection, addressing the fundamental challenges of data scarcity and high-dimensional feature representation. Our approach integrates: (1) structural fusion through a tensor representation that preserves the multi-dimensional nature of handwriting data; (2) feature-level fusion via Tucker decomposition, achieving an 80% parameter reduction while retaining discriminative information; (3) knowledge fusion through our proposed transferable source domain detection algorithm, which selectively integrates relevant knowledge from related domains; and (4) decision-level fusion with a two-stage transfer-debias mechanism that mitigates negative transfer risks. Experiments on the DARWIN dataset demonstrate that the transfer learning approach achieves 93.33% accuracy and 99.10% sensitivity, substantially outperforming existing handwriting-based AD detection methods (best reported: 88.29% accuracy, 90.28% sensitivity). The framework is also robust in small-sample scenarios, maintaining 87.50% accuracy with just 10% of the training data. Our analysis shows that kinematic features contribute most, with an importance score of 35.3%, while temporal features collectively contribute 25.7%, among which total time (9.4%) emerges as a key marker. The proposed framework presents a promising non-invasive approach for early Alzheimer’s detection in aging populations, with the potential to enable earlier intervention and substantial healthcare cost reductions.
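The Tucker-based feature-level fusion step maps directly onto standard tensor tooling. A minimal sketch with the tensorly library (assuming it is installed); the tensor shape and ranks are invented for illustration, and the resulting compression ratio differs from the paper's reported 80%.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

# Toy handwriting tensor: samples x time steps x channels (x, y, pressure, ...).
X = tl.tensor(np.random.rand(120, 200, 6))
core, factors = tucker(X, rank=[120, 20, 3])  # compress the last two modes

per_sample_before = 200 * 6   # flattened feature count per sample
per_sample_after = 20 * 3     # compressed core slice per sample
print(f"parameter reduction: {1 - per_sample_after / per_sample_before:.0%}")
```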
Citations: 0
HFPN: Hierarchical fusion and prediction network with multi-level cross-modality relation learning for audio-visual event localization
IF 15.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-30 DOI: 10.1016/j.inffus.2025.104111
Pufen Zhang, Lei Jia, Jiaxiang Wang, Meng Wan, Sijie Chang, Tianle Zhang, Peng Shi
The audio-visual event localization (AVEL) task fuses audio and visual modalities by mining their cross-modality relation (CMR). However, existing AVEL works face several challenges in CMR learning: (a) event-unrelated visual regions are not filtered when learning region-level CMR; (b) segment-level CMR is modeled one-to-one, ignoring cross-modality locality context correlation; (c) the holistic semantics of the audio and visual tracks of an event are consistent, but this track-level CMR is left unexplored; and (d) low- and middle-level visual semantics are ignored by existing fusion and CMR learning strategies. To address these issues, a Hierarchical Fusion and Prediction Network (HFPN) with a Multi-level Cross-modality Relation Learning Framework (MCRLF) is proposed. For challenge (a), MCRLF proposes an audio-adaptive region filter that dynamically filters out event-irrelevant image regions according to the event audio. For challenge (b), MCRLF designs a bilateral locality context attention, which captures cross-modality locality context correlation via convolution windows to guide segment-level CMR learning. For challenge (c), MCRLF introduces a novel dual-track alignment loss to align the overall semantics of the audio and visual tracks of an event. Finally, for challenge (d), HFPN uses MCRLF as a unified fusion framework to hierarchically fuse audio signals with low-, middle-, and high-level visual features, yielding comprehensive semantics for event prediction. With modest model complexity, HFPN achieves state-of-the-art results on the AVE (84.8% and 80.2%) and VGGSound-AVEL100k (67.2% and 62.7%) benchmarks under both fully- and weakly-supervised settings, offering a significant reference for practical application.
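The bilateral locality context attention can be approximated by injecting windowed temporal context before cross-modal gating. The module below is a hypothetical simplification (single direction, sigmoid gate), not HFPN's published design.

```python
import torch
import torch.nn as nn

class LocalityContextAttention(nn.Module):
    """Inject windowed temporal context into segment-level cross-modal gating:
    a 1-D conv over the segment axis lets neighbouring segments influence each
    audio-visual pair, instead of strictly one-to-one segment matching."""
    def __init__(self, dim: int, window: int = 3):
        super().__init__()
        self.context = nn.Conv1d(dim, dim, window, padding=window // 2)
        self.gate = nn.Sigmoid()

    def forward(self, audio: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        # audio, visual: (batch, segments, dim)
        ctx = self.context(audio.transpose(1, 2)).transpose(1, 2)
        return visual * self.gate(ctx)  # audio-context-guided gating of visual segments

out = LocalityContextAttention(128)(torch.randn(2, 10, 128), torch.randn(2, 10, 128))
print(out.shape)  # torch.Size([2, 10, 128])
```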
Citations: 0
Internet meme on social media: A comprehensive review and new perspectives
IF 15.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-29 DOI: 10.1016/j.inffus.2025.104102
Bingbing Wang, Jingjie Lin, Zhixin Bai, Zihan Wang, Shengzhe Sun, Zhengda Jin, Zhiyuan Wen, Geng Tu, Jing Li, Erik Cambria, Ruifeng Xu
Internet memes have become a dominant yet complex form of online communication, spurring a rapid growth of computational research. However, existing surveys remain largely confined to narrow classification tasks and fail to reflect the paradigm shift introduced by Multimodal Large Language Models (MLLMs). To address this gap, we introduce the TriR Framework, comprising Redefinition, Reconsolidation, and Revolution. Within this framework, we redefine the research scope through a taxonomy of higher-order cognitive tasks for meme comprehension, reconsolidate fragmented methodological progress around the unique capabilities of MLLMs, and articulate a trajectory that highlights key challenges and opportunities for advancing compositional and inferential modeling. By offering this structured perspective, the survey anchors the current state of the field while providing a systematic guide for its future development, fostering research that is computationally rigorous, empirically grounded, and ethically responsible.
Citations: 0