FedEGL: Edge-assisted federated graph learning
Pub Date : 2026-01-02 DOI: 10.1016/j.inffus.2025.104118
Haitao Wang , Aojie Luo , Wenchao Xu , Haozhao Wang , Yichen Li , Yining Qi , Rui Zhang , Ruixuan Li
Federated graph learning excels at learning from graph-structured data distributed across multiple clients. However, partitioning the graph leaves each client with only a subgraph, missing neighbor nodes held by other clients, which significantly degrades accuracy. Exchanging the original nodes can address this issue, but it requires interaction with a remote server, causing significant communication delays and leaking data privacy. To tackle this, this paper proposes an edge-server-assisted federated graph learning approach, FedEGL, which aggregates and exchanges intermediate features of approximated nodes through a third-party edge server, performing cross-client feature alignment and dynamic weighted aggregation. To protect the features of approximated nodes, adaptive differential privacy is applied with dynamically allocated privacy budgets. Experimental results show that the method achieves accuracy close to the centralized setting, improving classification accuracy by up to 8% over the latest baseline. By improving model accuracy while preserving privacy, FedEGL offers an effective solution to the subgraph partitioning problem in federated graph learning.
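The abstract describes dynamically allocated privacy budgets with adaptive differential privacy over exchanged node features. As a minimal sketch of how such a mechanism could look, the snippet below splits a total budget across training rounds and applies the Gaussian mechanism to clipped features; the decay schedule, clipping, and noise calibration are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def allocate_budgets(total_epsilon: float, num_rounds: int, decay: float = 0.9):
    """Geometrically decaying split of a total epsilon across rounds:
    later rounds get more budget, hence less noise (an assumed schedule)."""
    weights = torch.tensor([decay ** (num_rounds - 1 - t) for t in range(num_rounds)])
    return (total_epsilon * weights / weights.sum()).tolist()

def privatize_features(features: torch.Tensor, epsilon: float,
                       delta: float = 1e-5, clip_norm: float = 1.0) -> torch.Tensor:
    """Clip each node's feature norm to clip_norm, then add Gaussian noise
    calibrated for (epsilon, delta)-DP with L2 sensitivity clip_norm."""
    norms = features.norm(dim=1, keepdim=True).clamp(min=1e-12)
    clipped = features * (clip_norm / norms).clamp(max=1.0)
    sigma = clip_norm * float(torch.sqrt(2 * torch.log(torch.tensor(1.25 / delta)))) / epsilon
    return clipped + sigma * torch.randn_like(clipped)

budgets = allocate_budgets(total_epsilon=8.0, num_rounds=10)
feats = torch.randn(100, 64)  # intermediate features of 100 approximated nodes
noisy = privatize_features(feats, epsilon=budgets[0])
```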
Regional defeats global: An efficient regional feature fusion via convolutional architecture for multispectral object detection
Pub Date : 2026-01-02 DOI: 10.1016/j.inffus.2025.104110
Zhenhao Wang, Tian Tian
Multispectral object detection continues to face significant challenges in balancing accuracy and efficiency. Most existing approaches rely heavily on global modeling, which, although capable of integrating multi-band information, incurs substantial computational overhead and fails to fully exploit the spatial correlations across spectral bands. To address this issue, this paper introduces a region feature computation mechanism based on a convolutional architecture, leveraging the inherent advantage of convolutional operations in preserving spatial structure so that spatial cues are fully retained during feature representation learning and explicitly incorporated into multispectral feature interaction. Meanwhile, by recasting global attention computation as localized regional modeling, the proposed method markedly reduces computational cost while maintaining effective feature fusion, thereby facilitating a lightweight architectural design. Experimental results demonstrate that the proposed module achieves the lowest computational overhead while improving mAP@50 by 1.97% and 1.66% on the DroneVehicle and VEDAI remote-sensing datasets, respectively, compared with state-of-the-art methods. Moreover, it exhibits strong applicability on the pedestrian detection datasets FLIR and LLVIP. The code is available at https://github.com/wzh326/LMFFM_CARFCOM.git.
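To illustrate the general idea of replacing global attention with region-local convolutional fusion (not the paper's actual LMFFM/CARFCOM modules), here is a minimal PyTorch sketch in which per-pixel fusion gates are computed from a small depthwise-convolution window, keeping the cost linear in image size; the two-branch RGB/IR setup and module names are assumptions:

```python
import torch
import torch.nn as nn

class RegionalFusion(nn.Module):
    """Gate two spectral branches using only a local convolutional window,
    instead of global attention over all spatial positions."""
    def __init__(self, channels: int, window: int = 7):
        super().__init__()
        # depthwise conv = per-channel regional context within a window x window patch
        self.context = nn.Conv2d(2 * channels, 2 * channels, window,
                                 padding=window // 2, groups=2 * channels)
        self.gate = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb, ir], dim=1)
        w = torch.sigmoid(self.gate(self.context(x)))  # per-pixel regional gate
        return w * rgb + (1 - w) * ir                  # convex band-wise blend

fused = RegionalFusion(64)(torch.randn(1, 64, 80, 80), torch.randn(1, 64, 80, 80))
```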
Subgraph-focused biomedical knowledge embedding with bi-semantic encoder for multi-type drug-drug interaction prediction
Pub Date : 2025-12-31 DOI: 10.1016/j.inffus.2025.104109
Xiangpeng Bi , Wenjian Ma , Huasen Jiang , Qing Cai , Jie Nie , Zhiqiang Wei , Shugang Zhang
Identifying multi-type drug-drug interactions (DDIs) enables more precise assessment of drug safety risks and provides targeted guidance for combination therapy, making it a critical task in pharmacology. Because they can directly integrate diverse biomedical information and effectively model the intricate mechanisms underlying drug interactions, knowledge graph (KG)-based approaches have emerged for predicting DDIs. Recent advances have shown great promise in this regard; however, existing solutions still overlook three critical issues: 1) neglect of information sparsity, 2) neglect of polyadic interactions, and 3) lack of a fusion paradigm, which severely hinder the comprehensive identification and understanding of drug interaction patterns. To address these issues, we introduce a Bi-Semantic encoDer-dRiven knowledge sUbGraph representation learning framework (Bi-SemDRUG) for multi-type DDI prediction. Bi-SemDRUG proposes a multi-view knowledge subgraph partitioning strategy to extract refined, drug-related topological structures from large-scale knowledge graphs, thereby reducing the interference of irrelevant information. Furthermore, Bi-SemDRUG incorporates a bi-semantic subgraph encoder that effectively uncovers multi-order semantic relationships embedded within the knowledge subgraphs. Finally, we propose a general paradigm for information fusion to facilitate the integration of multi-level drug-related information. Exhaustive experiments on three benchmark datasets demonstrate that the proposed model achieves state-of-the-art performance compared with other baseline methods and generalizes well to large-scale DDI prediction. Additionally, case studies emphasize its capacity to offer more comprehensive insight into the underlying mechanisms of DDIs.
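As a rough illustration of knowledge subgraph partitioning, i.e. extracting drug-centered, relation-specific subgraphs from a large KG, the following sketch uses networkx; the k-hop ego-graph and the one-relation-per-view split are assumptions for exposition, not Bi-SemDRUG's algorithm:

```python
import networkx as nx

def drug_subgraph(kg: nx.MultiDiGraph, drug: str, hops: int = 2,
                  relation: str | None = None) -> nx.MultiDiGraph:
    """Return the k-hop neighborhood of `drug`; if `relation` is given,
    keep only edges of that type (one 'view' of the subgraph)."""
    nodes = nx.ego_graph(kg.to_undirected(as_view=True), drug, radius=hops).nodes
    sub = kg.subgraph(nodes)
    if relation is None:
        return sub
    keep = [(u, v, k) for u, v, k, r in sub.edges(keys=True, data="relation")
            if r == relation]
    return sub.edge_subgraph(keep)

# Toy KG with hypothetical entities and relation labels.
kg = nx.MultiDiGraph()
kg.add_edge("aspirin", "COX1", relation="inhibits")
kg.add_edge("COX1", "prostaglandin", relation="catalyzes")
view = drug_subgraph(kg, "aspirin", hops=2, relation="inhibits")
```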
Generating vision-language navigation instructions incorporated fine-grained alignment annotations
Pub Date : 2025-12-30 DOI: 10.1016/j.inffus.2025.104107
Yibo Cui , Liang Xie , Yu Zhao , Jiawei Sun , Erwei Yin
Vision-Language Navigation (VLN) enables intelligent agents to navigate environments by integrating visual perception and natural language instructions, yet it faces significant challenges due to the scarcity of fine-grained cross-modal alignment annotations. Existing datasets primarily focus on global instruction-trajectory matching, neglecting the sub-instruction-level and entity-level alignments critical for accurate navigation decision-making. To address this limitation, we propose FCA-NIG, a generative framework that automatically constructs navigation instructions with dual-level fine-grained cross-modal annotations. In this framework, an augmented trajectory is first divided into sub-trajectories, which are then processed through GLIP-based landmark detection, crafted instruction construction, OFA-Speaker-based R2R-like instruction generation, and CLIP-powered entity selection, generating sub-instruction-sub-trajectory pairs with entity-landmark annotations. Finally, these sub-pairs are aggregated to form a complete instruction-trajectory pair. The framework generates the FCA-R2R dataset, the first large-scale augmentation dataset featuring precise sub-instruction-sub-trajectory and entity-landmark alignments. Extensive experiments demonstrate that training with FCA-R2R significantly improves the performance of multiple state-of-the-art VLN agents, including SF, EnvDrop, RecBERT, HAMT, DUET, and BEVBERT. Incorporating sub-instruction-trajectory alignment enhances agents’ state awareness and decision accuracy, while entity-landmark alignment further boosts navigation performance and generalization. These results highlight the effectiveness of FCA-NIG in generating high-quality, scalable training data without manual annotation, advancing fine-grained cross-modal learning in complex navigation tasks.
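As a small illustration of the final aggregation step, in which sub-instruction/sub-trajectory pairs with entity-landmark annotations are merged into one instruction-trajectory pair, consider the sketch below; the field names and data layout are hypothetical, not the actual FCA-R2R schema:

```python
from dataclasses import dataclass, field

@dataclass
class SubPair:
    sub_instruction: str                  # e.g. from the OFA-Speaker generator
    viewpoints: list[str]                 # sub-trajectory viewpoint ids
    entities: dict[str, str] = field(default_factory=dict)  # entity -> landmark box id

def aggregate(sub_pairs: list[SubPair]) -> dict:
    """Merge sub-pairs into a full pair while keeping both alignment levels."""
    return {
        "instruction": " ".join(p.sub_instruction for p in sub_pairs),
        "trajectory": [v for p in sub_pairs for v in p.viewpoints],
        "sub_alignments": [(p.sub_instruction, p.viewpoints) for p in sub_pairs],
        "entity_alignments": {k: v for p in sub_pairs for k, v in p.entities.items()},
    }

pair = aggregate([
    SubPair("walk past the sofa", ["vp1", "vp2"], {"sofa": "box_03"}),
    SubPair("stop at the kitchen door", ["vp3"], {"kitchen door": "box_11"}),
])
```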
Bridging the sim-to-real gap in RF localization with large-scale synthetic pretraining
Pub Date : 2025-12-30 DOI: 10.1016/j.inffus.2025.104104
Armen Manukyan , Rafayel Mkrtchyan , Ararat Saribekyan , Theofanis P. Raptis , Hrant Khachatrian
Radio frequency (RF) fingerprinting is a promising localization technique for GPS-denied environments, yet it suffers from a fundamental limitation: poor generalization to previously unmapped areas. Traditional methods such as k-nearest neighbors (k-NN) perform well where data is available but may fail on unseen streets, limiting real-world deployment. Deep learning (DL) offers potential remedies by learning spatial-RF patterns that generalize, but it requires far more training data than simple real-world measurement campaigns can provide. In this paper, we investigate whether synthetic data can bridge this generalization gap. Using (i) a real-world dataset from Rome and (ii) NVIDIA’s open-source ray-tracing simulator Sionna, we generate synthetic datasets under varying realism and scale conditions. Specifically, we use Dataset A, containing real-world measurements with real base stations (BS) and real signals, and create Dataset B using real BS locations but simulated signals, Dataset C with both simulated BS locations and signals, and Dataset B’, an optimized version of Dataset B in which BS parameters are calibrated via a Gaussian process to maximize signal correlation with Dataset A. Our evaluation reveals a pronounced sim-to-real gap: models achieving 25m error on synthetic data degrade to 184m on real data. Nonetheless, pretraining on synthetic data reduces real-world localization error from 323m to 162m, a 50% improvement over real-only training. Notably, simulation fidelity proves more important than scale: a smaller calibrated dataset (53K samples) outperforms a larger uncalibrated one (274K samples). To further evaluate the generalization capabilities of the models, we conduct experiments on an unseen geographical region using a real-world dataset from Oslo. In the zero-shot setting, the models achieve a root mean square error (RMSE) of 132.2m on the entire dataset, and 61.5m on unseen streets after fine-tuning on Oslo data. While challenges remain before practical localization accuracy is reached, this work provides a systematic study of synthetic-to-real transfer for RF localization in wireless communication and highlights the value of simulation-aware pretraining for generalizing DL models to real-world scenarios.
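A minimal sketch of the pretrain-then-finetune recipe and of the RMSE metric quoted above, assuming a toy feature layout (one received-signal value per base station mapped to a 2-D position); the paper's models and Sionna-derived features are richer:

```python
import torch
import torch.nn as nn

def rmse(pred: torch.Tensor, target: torch.Tensor) -> float:
    """Root mean square of 2-D position errors, in the units of target."""
    return (pred - target).norm(dim=1).pow(2).mean().sqrt().item()

model = nn.Sequential(nn.Linear(16, 128), nn.ReLU(), nn.Linear(128, 2))

def fit(model, X, y, epochs=100, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(X), y)
        loss.backward()
        opt.step()

# Placeholder tensors standing in for RF fingerprints and positions.
X_syn, y_syn = torch.randn(4096, 16), torch.randn(4096, 2)    # e.g. calibrated synthetic data
X_real, y_real = torch.randn(256, 16), torch.randn(256, 2)    # scarce real measurements
fit(model, X_syn, y_syn)             # pretrain on large synthetic set
fit(model, X_real, y_real, lr=1e-4)  # fine-tune on real data
print(rmse(model(X_real), y_real))
```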
Dimensional compensation for small-sample and small-size insulator burn mark via RGB-point cloud fusion in power grid inspection
Pub Date : 2025-12-30 DOI: 10.1016/j.inffus.2025.104105
Junqiu Tang , Zhikang Yuan , Zixiang Wei , Shuojie Gao , Changyong Shen
To address the challenge of scarce burn mark samples in power infrastructure inspection, we introduce the Insulator Burn Mark RGB-Point Cloud (IBMR) dataset, the first publicly available benchmark featuring RGB-point clouds with pixel-level annotations for both insulators and burn marks. To tackle the critical issue of severe class imbalance caused by the vast number of background points and the small size of burn marks, we propose a novel two-stage RGB-point cloud segmentation framework. This framework integrates DCCU-Sampling, an innovative downsampling algorithm that effectively suppresses background points while preserving the critical structures of targets, and BB-Backtracking, a geometric recovery method that reconstructs fine-grained burn mark details lost during the downsampling process. Experimental results validate the framework’s effectiveness, achieving 81.21% mIoU with 32 training samples and 68.37% mIoU with only 14 samples. The dataset is publicly available at https://huggingface.co/datasets/Junqiu-Tang/IBMR.
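To make the imbalance-aware downsampling idea concrete (though this is not DCCU-Sampling itself), the sketch below keeps high-"targetness" points at a much higher rate than background points; the score source and the keep-rates are illustrative assumptions:

```python
import numpy as np

def biased_downsample(points: np.ndarray, scores: np.ndarray,
                      n_out: int, fg_frac: float = 0.7,
                      thresh: float = 0.5) -> np.ndarray:
    """Draw roughly fg_frac of the output from high-score (target) points,
    suppressing the dominant background class."""
    rng = np.random.default_rng(0)
    fg = np.flatnonzero(scores >= thresh)   # candidate target points
    bg = np.flatnonzero(scores < thresh)    # background points
    n_fg = min(len(fg), int(n_out * fg_frac))
    idx_fg = rng.choice(fg, size=n_fg, replace=False)
    idx_bg = rng.choice(bg, size=n_out - n_fg, replace=False)
    return points[np.concatenate([idx_fg, idx_bg])]

pts = np.random.rand(100_000, 6)   # XYZ + RGB per point
scores = np.random.rand(100_000)   # e.g. a cheap burn-mark likelihood map
sampled = biased_downsample(pts, scores, n_out=4096)
```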
Style-augmented large-scale vision model with domain-generalized knowledge fusion for anomaly detection in powder bed additive manufacturing
Pub Date : 2025-12-30 DOI: 10.1016/j.inffus.2025.104108
Kang Wang , Xuan Liang , Jinghua Xu , Shuyou Zhang , Jianrong Tan
Metal Additive Manufacturing (AM) has revolutionized the production of complex parts across various industries, yet ensuring consistent quality remains a significant challenge. Reliable and efficient anomaly detection in metal AM processes is essential for maintaining product quality and reducing costly post-production inspections. In this study, we propose a novel full-life-cycle generalization method, the style-augmented large-scale vision model (SLVM), for anomaly detection in metal additive manufacturing. Our approach leverages the power of large-scale vision models and incorporates style-based augmentation techniques to enhance the detection of anomalies in AM processes. A pre-trained large-scale vision model serves as the backbone of SLVM, providing the robust feature extraction needed to capture intricate details in AM images. Building upon this foundation, a style augmentation module generates diverse stylized versions of input images, significantly improving the model’s generalization across different AM processes and materials. An anomaly detection head then uses these style-augmented features to identify and localize defects, completing the approach to AM quality control. We evaluate SLVM on multiple metal AM datasets, covering laser powder bed fusion and binder jetting processes, and demonstrate its superior performance compared with existing state-of-the-art methods. Our experiments show that SLVM achieves higher detection accuracy, better generalization across different AM processes, and improved robustness to variations in part geometry and material properties. SLVM thus offers a promising solution for enhancing quality control in metal AM, potentially reducing the need for costly post-production inspections and improving overall manufacturing efficiency.
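One common way to realize style-based augmentation at the feature level is to mix channel-wise feature statistics across samples (MixStyle-like); the sketch below shows this as an assumed stand-in, since the abstract does not specify SLVM's exact augmentation:

```python
import torch

def mix_style(feat: torch.Tensor, alpha: float = 0.3) -> torch.Tensor:
    """Blend channel-wise mean/std between random pairs of samples in a
    batch, producing 'stylized' variants of the same content.
    feat: (B, C, H, W) backbone feature maps."""
    B = feat.size(0)
    mu = feat.mean(dim=(2, 3), keepdim=True)
    sigma = feat.std(dim=(2, 3), keepdim=True) + 1e-6
    lam = torch.distributions.Beta(alpha, alpha).sample((B, 1, 1, 1))
    perm = torch.randperm(B)
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sigma_mix = lam * sigma + (1 - lam) * sigma[perm]
    return sigma_mix * (feat - mu) / sigma + mu_mix  # re-normalize, re-style

styled = mix_style(torch.randn(8, 256, 32, 32))
```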
Multi-source information fusion through Tucker tensor decomposition-based transfer learning for handwriting-based Alzheimer's disease detection
Pub Date : 2025-12-30 DOI: 10.1016/j.inffus.2025.104112
Yao Yao , Zhuoxi Yu , Dehui Wang , Chengzhe Wang , Congting Sun
With Alzheimer’s disease affecting approximately 50 million people globally, early detection has emerged as a critical public health priority in aging societies. This paper proposes a novel multi-level information fusion framework for handwriting-based Alzheimer’s disease detection, addressing the fundamental challenges of data scarcity and high-dimensional feature representation. Our approach integrates: (1) structural fusion through a tensor representation that preserves the multi-dimensional nature of handwriting data; (2) feature-level fusion via Tucker decomposition, achieving an 80% parameter reduction while maintaining discriminative information; (3) knowledge fusion through our proposed transferable source domain detection algorithm, which selectively integrates relevant knowledge from related domains; and (4) decision-level fusion with a two-stage transfer-debias mechanism that mitigates negative transfer risks. Experiments on the DARWIN dataset demonstrate that our transfer learning approach achieves 93.33% accuracy and 99.10% sensitivity, substantially outperforming existing handwriting-based AD detection methods (best reported: 88.29% accuracy, 90.28% sensitivity). The framework exhibits exceptional robustness in small-sample scenarios, maintaining 87.50% accuracy with just 10% of the training data. Our analysis reveals that kinematic features carry the greatest importance (35.3%), while temporal features collectively contribute 25.7%, among which total time (9.4%) emerges as a key marker. The proposed framework presents a promising non-invasive approach for early Alzheimer’s detection in aging populations, with the potential to facilitate earlier intervention and substantial healthcare cost reductions.
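The feature-level fusion step can be illustrated with tensorly's tucker decomposition, which is a real API; the tensor shape (subjects x tasks x features) and the ranks below are assumptions chosen only to show a parameter reduction of the order the paper reports:

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

# Hypothetical handwriting tensor: subjects x tasks x per-task features.
X = tl.tensor(np.random.rand(174, 25, 18))
core, factors = tucker(X, rank=[40, 10, 8])  # compressed core + factor matrices

full = X.size
compressed = core.size + sum(f.size for f in factors)
print(f"kept {compressed / full:.1%} of the original entries")  # ~13%, i.e. >80% reduction
```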
HFPN: Hierarchical fusion and prediction network with multi-level cross-modality relation learning for audio-visual event localization
Pub Date : 2025-12-30 DOI: 10.1016/j.inffus.2025.104111
Pufen Zhang , Lei Jia , Jiaxiang Wang , Meng Wan , Sijie Chang , Tianle Zhang , Peng Shi
The audio-visual event localization (AVEL) task requires fusing the audio and visual modalities by mining their cross-modality relation (CMR). However, existing AVEL works face several challenges in CMR learning: (a) event-unrelated visual regions are not filtered when learning the region-level CMR; (b) the segment-level CMR is modeled in a one-to-one way, ignoring the cross-modality locality context correlation; (c) the holistic semantics of the audio and visual tracks of an event are consistent, but such a track-level CMR has not been explored; and (d) low- and middle-level visual semantics are ignored in existing fusion and CMR learning strategies. To address these issues, a Hierarchical Fusion and Prediction Network (HFPN) with a Multi-level Cross-modality Relation Learning Framework (MCRLF) is proposed. Specifically, for challenge (a), MCRLF proposes an audio-adaptive region filter to dynamically filter out event-irrelevant image regions according to the event audio. To deal with challenge (b), MCRLF designs a bilateral locality context attention, which captures the cross-modality locality context correlation via convolution windows to guide segment-level CMR learning. For challenge (c), MCRLF introduces a novel dual-track alignment loss to align the holistic semantics of the audio and visual tracks of an event. Finally, to tackle challenge (d), HFPN uses MCRLF as a unified fusion framework to hierarchically fuse audio signals with low-, middle- and high-level visual features, obtaining comprehensive semantics for event prediction. With modest model complexity, HFPN achieves state-of-the-art results on the AVE (84.8% and 80.2%) and VGGSound-AVEL100k (67.2% and 62.7%) benchmarks under both fully- and weakly-supervised settings, offering a significant reference for practical applications.
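As a hedged example of what a dual-track alignment loss could look like (the paper's formulation may differ), the snippet below pools each track over time and penalizes the cosine distance between the holistic audio and visual embeddings of the same event:

```python
import torch
import torch.nn.functional as F

def dual_track_alignment_loss(audio: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
    """audio, visual: (B, T, D) segment features of the same B events."""
    a = F.normalize(audio.mean(dim=1), dim=-1)   # track-level audio semantics
    v = F.normalize(visual.mean(dim=1), dim=-1)  # track-level visual semantics
    return (1 - (a * v).sum(dim=-1)).mean()      # 1 - cosine similarity

loss = dual_track_alignment_loss(torch.randn(4, 10, 256), torch.randn(4, 10, 256))
```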
Internet meme on social media: A comprehensive review and new perspectives
Pub Date : 2025-12-29 DOI: 10.1016/j.inffus.2025.104102
Bingbing Wang , Jingjie Lin , Zhixin Bai , Zihan Wang , Shengzhe Sun , Zhengda Jin , Zhiyuan Wen , Geng Tu , Jing Li , Erik Cambria , Ruifeng Xu
Internet memes have become a dominant yet complex form of online communication, spurring a rapid growth of computational research. However, existing surveys remain largely confined to narrow classification tasks and fail to reflect the paradigm shift introduced by Multimodal Large Language Models (MLLMs). To address this gap, we introduce the TriR Framework, comprising Redefinition, Reconsolidation, and Revolution. Within this framework, we redefine the research scope through a taxonomy of higher-order cognitive tasks for meme comprehension, reconsolidate fragmented methodological progress around the unique capabilities of MLLMs, and articulate a trajectory that highlights key challenges and opportunities for advancing compositional and inferential modeling. By offering this structured perspective, the survey anchors the current state of the field while providing a systematic guide for its future development, fostering research that is computationally rigorous, empirically grounded, and ethically responsible.