Pub Date: 2026-08-01 | Epub Date: 2026-01-01 | DOI: 10.1016/j.inffus.2025.104120
Qurat Ul Ain , Fatima Khalid , Hafsa Ilyas , Ali Javed , Khalid Mahmood Malik , Khan Muhammad , Aun Irtaza
As the technology behind deepfakes advances, detecting audio-visual deepfakes becomes increasingly crucial, and the rise of traditional adversarial attacks and generative AI-based anti-forensics attacks on deepfake detection technologies is a growing concern. Securing applications against adversarial and generative AI-based attacks is critical for accurate and robust deepfake detection tools. Therefore, this paper provides a comprehensive overview of adversarial and generative AI-based anti-forensic attacks, robustness to which is one of the core elements of trustworthiness alongside transparency, explainability, and fairness, as well as of defensive countermeasures for audio-visual deepfake generation and detection. It covers adversarial attacks on deepfake detection algorithms and defensive methods, including model fusion and decoy-based approaches, to mitigate these threats. Although extensive research has been conducted in recent years on adversarial attacks and defenses for deepfake detection, there have been few attempts to compare existing work qualitatively and quantitatively. This paper aims to help identify and address the key issues that must be considered to advance transferable adversarial attacks and their countermeasures, particularly through techniques such as generative defense, knowledge distillation, and beyond.
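To make the threat model concrete, below is a minimal sketch of one classical evasion attack within the review's scope, the fast gradient sign method (FGSM); the `detector`, input range, and budget `eps` are illustrative assumptions, not a method taken from the paper.

```python
import torch

def fgsm_evasion(detector, x, y_fake, eps=4 / 255):
    """One-step gradient-sign perturbation (FGSM).

    `detector` is any differentiable deepfake classifier returning logits,
    `x` a batch of (audio-)visual inputs in [0, 1], and `y_fake` the true
    'fake' labels the attacker wants the detector to miss.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(detector(x_adv), y_fake)
    loss.backward()
    # Step *up* the loss so the 'fake' class becomes less likely.
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```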
{"title":"Adversarial and generative AI-based anti-forensics in audio-visual deepfake detection: A comprehensive review and analysis","authors":"Qurat Ul Ain , Fatima Khalid , Hafsa Ilyas , Ali Javed , Khalid Mahmood Malik , Khan Muhammad , Aun Irtaza","doi":"10.1016/j.inffus.2025.104120","DOIUrl":"10.1016/j.inffus.2025.104120","url":null,"abstract":"<div><div>As the technology behind deepfakes advances, detecting audio-visual deepfakes becomes more and more crucial, and the rise of traditional and generative AI-based adversarial/anti-forensics attacks and generative AI-based anti-forensics attacks on deepfake detection technologies is a growing concern. Securing applications against adversarial and generative AI-based attacks is critical for accurate and robust deepfake detection tools. Therefore, this paper provides a comprehensive overview of various adversarial and generative AI-based anti-forensic attacks, which represent one of the core elements of trustworthiness alongside transparency, explainability, and fairness, as well as defensive countermeasures for audio-visual deepfake generation and detection. It covers topics such as adversarial attacks on deepfake detection algorithms and defensive methods, including model fusion and decoy-based approaches, to mitigate these threats. Although extensive research has been conducted in recent years on adversarial attacks and defense on deepfake detection, there have been few attempts to compare existing work qualitatively and quantitatively. This paper aims to help identify and address key issues that need to be considered to bring transferable adversarial attacks and their countermeasures particularly through techniques such as generative defense, knowledge distillation, and beyond.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"132 ","pages":"Article 104120"},"PeriodicalIF":15.5,"publicationDate":"2026-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146192948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-08-01 | Epub Date: 2026-02-09 | DOI: 10.1016/j.inffus.2026.104210
Lu Yuan , Zihan Wang , Zhengxuan Zhang , Lei Shi
In the digital era, social media accelerates the spread of misinformation. Existing detection methods often rely on shallow linguistic or propagation features and lack principled multimodal fusion, failing to capture creators’ emotional manipulation and readers’ psychological responses, which limits prediction accuracy. We propose the Dual-Aspect Empathy Framework (DAE), which derives creator and reader perspectives by fusing separately modeled cognitive and emotional empathy. Creators’ cognitive strategies and affective appeals are analyzed, while Large Language Models (LLMs) simulate readers’ judgments and emotional reactions, providing richer and more human-like signals than conventional classifiers, and partially alleviating the analytical challenge posed by insufficient human feedback. An empathy-aware filtering mechanism is further designed to refine outputs, enhancing authenticity and diversity. The pipeline integrates multimodal feature extraction, empathy-oriented representation learning, LLM-based reader simulation, and empathy-aware filtering. Experiments on benchmark datasets such as PolitiFact, GossipCop and Pheme show that the fusion-based DAE consistently outperforms state-of-the-art baselines, offering a novel and human-centric paradigm for misinformation detection.
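As a rough illustration of fusing the two perspectives, the sketch below concatenates creator-side and LLM-simulated reader-side feature vectors in a small classification head; all dimensions, names, and layer choices are assumptions, since the paper's DAE pipeline is considerably richer.

```python
import torch
import torch.nn as nn

class EmpathyFusionHead(nn.Module):
    """Toy late-fusion head: concatenate creator-side (cognitive/affective)
    features with LLM-simulated reader-side features and classify the post.
    Dimensions and layer sizes are illustrative, not the paper's."""

    def __init__(self, d_creator=256, d_reader=256, n_classes=2):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(d_creator + d_reader, 128),
            nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, creator_feat, reader_feat):
        fused = torch.cat([creator_feat, reader_feat], dim=-1)
        return self.classifier(fused)
```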
{"title":"Bridging cognition and emotion: Empathy-driven multimodal misinformation detection","authors":"Lu Yuan , Zihan Wang , Zhengxuan Zhang , Lei Shi","doi":"10.1016/j.inffus.2026.104210","DOIUrl":"10.1016/j.inffus.2026.104210","url":null,"abstract":"<div><div>In the digital era, social media accelerates the spread of misinformation. Existing detection methods often rely on shallow linguistic or propagation features and lack principled multimodal fusion, failing to capture creators’ emotional manipulation and readers’ psychological responses, which limits prediction accuracy. We propose the Dual-Aspect Empathy Framework (DAE), which derives creator and reader perspectives by fusing separately modeled cognitive and emotional empathy. Creators’ cognitive strategies and affective appeals are analyzed, while Large Language Models (LLMs) simulate readers’ judgments and emotional reactions, providing richer and more human-like signals than conventional classifiers, and partially alleviating the analytical challenge posed by insufficient human feedback. An empathy-aware filtering mechanism is further designed to refine outputs, enhancing authenticity and diversity. The pipeline integrates multimodal feature extraction, empathy-oriented representation learning, LLM-based reader simulation, and empathy-aware filtering. Experiments on benchmark datasets such as PolitiFact, GossipCop and Pheme show that the fusion-based DAE consistently outperforms state-of-the-art baselines, offering a novel and human-centric paradigm for misinformation detection.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"132 ","pages":"Article 104210"},"PeriodicalIF":15.5,"publicationDate":"2026-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146146572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-08-01 | Epub Date: 2026-02-07 | DOI: 10.1016/j.inffus.2026.104214
Miguel Campos-Romero , Manuel Carranza-García , Robert-Jan Sips , José C. Riquelme
Visual anomaly detection is a crucial task in industrial manufacturing, enabling early defect identification and minimizing production bottlenecks. Existing methods often struggle to effectively detect both structural anomalies, which appear as unexpected local patterns, and logical anomalies, which arise from violations of global contextual constraints. To address this challenge, we propose MuDeNet, an unsupervised Multi-patch Descriptor Network that performs multi-scale fusion of local structural features and global contextual information for comprehensive anomaly modeling. MuDeNet employs a lightweight teacher-student framework that jointly extracts and fuses local and global patch descriptors across multiple receptive fields within a single forward pass. Knowledge is first distilled from a pre-trained CNN to efficiently obtain semantic representations, which are then processed by two complementary modules: the structural module, targeting fine-grained defects at small receptive fields, and the logical module, modeling long-range contextual dependencies. Their outputs are fused at the decision level, yielding a unified anomaly score that integrates local and global evidence. Extensive experiments on three state-of-the-art datasets position MuDeNet as an efficient and scalable solution for real-time industrial anomaly detection and segmentation, consistently outperforming existing approaches.
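A minimal sketch of the teacher-student idea mentioned above: the student regresses frozen pre-trained teacher descriptors during training, and the regression residual serves as the anomaly score at test time. This is generic distillation-based anomaly detection, not MuDeNet's exact objective or architecture.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_feats, teacher_feats):
    """Feature-regression distillation: the student mimics the frozen
    teacher's patch descriptors (both of shape (B, C, H, W))."""
    return F.mse_loss(student_feats, teacher_feats.detach())

def anomaly_map(student_feats, teacher_feats):
    # Per-location squared residual, averaged over channels -> (B, H, W) map;
    # large residuals flag structural or logical anomalies.
    return ((student_feats - teacher_feats) ** 2).mean(dim=1)
```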
{"title":"MuDeNet: A multi-patch descriptor network for anomaly modeling","authors":"Miguel Campos-Romero , Manuel Carranza-García , Robert-Jan Sips , José C. Riquelme","doi":"10.1016/j.inffus.2026.104214","DOIUrl":"10.1016/j.inffus.2026.104214","url":null,"abstract":"<div><div>Visual anomaly detection is a crucial task in industrial manufacturing, enabling early defect identification and minimizing production bottlenecks. Existing methods often struggle to effectively detect both structural anomalies, which appear as unexpected local patterns, and logical anomalies, which arise from violations of global contextual constraints. To address this challenge, we propose MuDeNet, an unsupervised Multi-patch Descriptor Network that performs multi-scale fusion of local structural features and global contextual information for comprehensive anomaly modeling. MuDeNet employs a lightweight teacher-student framework that jointly extracts and fuses local and global patch descriptors across multiple receptive fields within a single forward pass. Knowledge is first distilled from a pre-trained CNN to efficiently obtain semantic representations, which are then processed by two complementary modules: the structural module, targeting fine-grained defects at small receptive fields, and the logical module, modeling long-range contextual dependencies. Their outputs are fused at the decision level, yielding a unified anomaly score that integrates local and global evidence. Extensive experiments on three state-of-the-art datasets position MuDeNet as an efficient and scalable solution for real-time industrial anomaly detection and segmentation, consistently outperforming existing approaches.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"132 ","pages":"Article 104214"},"PeriodicalIF":15.5,"publicationDate":"2026-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146138679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-08-01 | Epub Date: 2026-02-08 | DOI: 10.1016/j.inffus.2026.104215
Yaxian Wang , Qikan Lin , Jiangbo Shi , Yisheng An , Jun Liu , Bifan Wei , Xudong Jiang
In recent years, visual question answering has become a significant task at the intersection of computer vision and natural language processing, requiring models to jointly understand images and textual queries. It has emerged as a popular benchmark for evaluating multimodal understanding and reasoning. With advancements in VQA accuracy, there is a growing demand for explainability and transparency in VQA models, which is crucial for improving trust in these models and their applicability in critical domains. This survey explores the emerging field of eXplainable Visual Question Answering (XVQA), which aims not only to provide the correct answer but also to generate meaningful explanations that justify the predicted answers. Firstly, we systematically review existing XVQA methods and propose a three-level taxonomy to organize them. The proposed taxonomy primarily categorizes XVQA methods based on the timing of the rationale generation and the forms of the rationales. Secondly, we review the existing VQA datasets annotated with explanations in different forms, including textual, visual and multimodal rationales. Furthermore, we summarize the evaluation metrics of XVQA for different forms of rationales. Finally, we outline the challenges for XVQA and discuss potential future directions. We aim to organize existing research in this domain and inspire future investigations into the explainability of VQA models.
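As a toy illustration of such a taxonomy, surveyed methods could be recorded along the two axes the abstract names; only the rationale forms (textual, visual, multimodal) come from the abstract, while the timing labels and method names below are assumed placeholders.

```python
from dataclasses import dataclass

@dataclass
class XVQAMethod:
    """Toy record for organizing surveyed XVQA methods along two axes."""
    name: str
    rationale_timing: str   # e.g. "intrinsic" vs. "post-hoc" (assumed labels)
    rationale_form: str     # "textual", "visual", or "multimodal" (from the abstract)

catalog = [
    XVQAMethod("ExampleMethodA", "post-hoc", "textual"),
    XVQAMethod("ExampleMethodB", "intrinsic", "multimodal"),
]

# Group methods by the form of rationale they produce.
by_form: dict[str, list[str]] = {}
for m in catalog:
    by_form.setdefault(m.rationale_form, []).append(m.name)
```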
{"title":"Explainable visual question answering: A survey on methods, datasets and evaluation","authors":"Yaxian Wang , Qikan Lin , Jiangbo Shi , Yisheng An , Jun Liu , Bifan Wei , Xudong Jiang","doi":"10.1016/j.inffus.2026.104215","DOIUrl":"10.1016/j.inffus.2026.104215","url":null,"abstract":"<div><div>In recent years, visual question answering has become a significant task at the intersection of computer vision and natural language processing, requiring models to jointly understand images and textual queries. It has emerged as a popular benchmark for evaluating multimodal understanding and reasoning. With advancements in VQA accuracy, there is a growing demand for explainability and transparency for VQA models, which is crucial for improving their trust and applicability in critical domains. This survey explores the emerging field of e<strong>X</strong>plainable <strong>V</strong>isual <strong>Q</strong>uestion <strong>A</strong>nswering (XVQA), which aims not only to provide the correct answer but also to generate meaningful explanations that justify the predicted answers. Firstly, we systematically review existing methods on XVQA, and propose a three-level taxonomy to organize them. The proposed taxonomy primarily categorizes XVQA methods based on the timing of the rationale generation and the forms of the rationales. Secondly, we review the existing VQA datasets annotated with explanations in different forms, including textual, visual and multimodal rationales. Furthermore, we summarize the evaluation metrics of XVQA for different forms of rationales. Finally, we outline the challenges for XVQA and discuss potential future directions. We aim to organize existing research in this domain and inspire future investigations into the explainability of VQA models.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"132 ","pages":"Article 104215"},"PeriodicalIF":15.5,"publicationDate":"2026-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146138677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-08-01 | Epub Date: 2026-02-05 | DOI: 10.1016/j.inffus.2026.104206
Bojia Liu , Conghui Zheng , Li Pan
Heterogeneous graph representation learning seeks to capture the complex structural and semantic properties of heterogeneous graphs. The integration of hyperbolic space, which is well-suited to modeling the intrinsic degree power-law distribution of graphs, has facilitated significant advancements in this area. Recent methods leverage hyperbolic attention mechanisms to fuse semantic information within metapath-induced subgraphs. Despite this progress, a major limitation remains: these methods leverage attention for information aggregation but fail to model the causal relationship between semantic fusion and downstream task performance, leading to spurious semantic associations that reduce robustness to noise and impair cross-task generalization. To address this challenge, we propose a Causal ATtention enhanCed Hyperbolic Heterogeneous Graph Neural Network (CATCH), aiming to achieve sufficient semantic information fusion. To the best of our knowledge, CATCH is the first to integrate hyperbolic space with causal inference for heterogeneous graph representations, directly targeting spurious semantic correlations at the source. Specifically, CATCH explicitly encodes the Euclidean node attributes of different types into a shared semantic hyperbolic space. To capture the underlying semantics, context subgraphs based on first-order and high-order metapaths are constructed to facilitate hyperbolic attention-based intra-level and inter-level information aggregation, thus forming comprehensive representations. Finally, a causal attention enhancement mechanism is implemented with direct supervision on attention learning, leveraging counterfactual causal inference to generate counterfactual representations for computing direct causal effects. By jointly optimizing a task-specific objective alongside a causal loss, CATCH promotes more faithful semantic encoding, leading to improved robustness and generalization. Extensive experiments on four real-world datasets validate the superior performance of CATCH across multiple tasks. The implementation is available at https://github.com/Crystal-LiuBojia/CATCH.
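For readers unfamiliar with the hyperbolic encoding step, the sketch below shows the standard exponential map at the origin of a Poincaré ball, one common way to project Euclidean node attributes into hyperbolic space; the curvature value and tensor shapes are illustrative, and CATCH's full encoder is not reproduced here.

```python
import torch

def expmap0(v, c=1.0, eps=1e-6):
    """Exponential map at the origin of a Poincare ball with curvature -c:
    exp_0(v) = tanh(sqrt(c) * ||v||) * v / (sqrt(c) * ||v||)."""
    sqrt_c = c ** 0.5
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

# Usage: node_attrs is an (N, d) tensor of Euclidean node features.
node_attrs = torch.randn(8, 16)
hyp_feats = expmap0(node_attrs)
assert (hyp_feats.norm(dim=-1) < 1).all()  # points land inside the unit ball (c=1)
```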
[Graphical abstract: Recommendation performance on Amazon-CD and Amazon-Book.]
{"title":"CATCH: Causal attention enhanced meta-path semantic fusion for robust hyperbolic heterogeneous graph embedding","authors":"Bojia Liu , Conghui Zheng , Li Pan","doi":"10.1016/j.inffus.2026.104206","DOIUrl":"10.1016/j.inffus.2026.104206","url":null,"abstract":"<div><div>Heterogeneous graph representation learning seeks to capture the complex structural and semantic properties in heterogeneous graphs. The integration of hyperbolic space, which is well-suited to modeling the intrinsic degree power-law distribution of graphs, has facilitated significant advancements in this area. Recent methods leverage hyperbolic attention mechanisms to fuse semantic information within metapath-induced subgraphs. Despite this progress, a major limitation remains: these methods leverage attention for information aggregation but fail to model the causal relationship between semantic fusion and downstream task performance, leading to spurious semantic associations that reduce robustness to noise and impair cross-task generalization. To address this challenge, we propose a <strong>C</strong>ausal <strong>AT</strong>tention enhan<strong>C</strong>ed <strong>H</strong>yperbolic Heterogeneous Graph Neural Network (<strong>CATCH</strong>), intending to achieve sufficient semantic information fusion. To the best of our knowledge, CATCH is the first to integrate hyperbolic space with causal inference for heterogeneous graph representations, directly targeting spurious semantic correlations at the source. Specifically, CATCH explicitly encodes the Euclidean node attributes of different types into a shared semantic hyperbolic space. To capture the underlying semantics, context subgraphs based on one-order and high-order metapaths are constructed to facilitate hyperbolic attention-based intra-level and inter-level information aggregation, thus forming comprehensive representations. Finally, a causal attention enhancement mechanism is implemented with direct supervision on attention learning, leveraging counterfactual causal inference to generate counterfactual representations for computing direct causal effects. By jointly optimizing a task-specific objective alongside a causal loss, CATCH promotes more faithful semantic encoding, leading to improved robustness and generalization. Extensive experiments on four real-world datasets validate the superior performance of CATCH across multiple tasks. The implementation is available at <span><span>https://github.com/Crystal-LiuBojia/CATCH</span><svg><path></path></svg></span>.</div><div>Recommendation performance on Amazon-CD and Amazon-Book.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"132 ","pages":"Article 104206"},"PeriodicalIF":15.5,"publicationDate":"2026-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146134527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-08-01 | Epub Date: 2026-02-07 | DOI: 10.1016/j.inffus.2026.104205
Al Rafi Aurnob , Sharia Arfin Tanim , Tahmid Enam Shrestha , M.F. Mridha , Durjoy Mistry
Oral cancer represents a considerable global medical problem that requires the development of new technologies offering reliable, advanced therapies. This study introduces FedFusionNet, a fusion-centric model meticulously developed to advance early oral cancer diagnosis while preserving data privacy. The primary objective is to develop a model using federated learning (FL) that can be trained across diverse healthcare facilities globally without compromising patient data confidentiality. The model fuses features from the ResNeXt101 32X8D and InceptionV3 backbones at a single level via feature concatenation, which enhances its effectiveness and stability. Specifically, the federated averaging (FedAvg) technique fosters collaborative model training across multiple hospitals while safeguarding sensitive patient information, ensuring that each participating hospital can contribute to the development of the model without sharing raw data. The proposed model was trained on a dataset of 10,002 images that included both healthy and cancerous oral tissues. Rigorous training and evaluation were conducted in both Independent and Identically Distributed (IID) and Independent and Non-Identically Distributed (Non-IID) settings. FedFusionNet demonstrated superior performance compared with pre-trained and some custom models for oral cancer diagnosis. This scalable and secure framework has profound implications for healthcare analytics. The study is a proof-of-concept demonstration that utilizes publicly available data to establish the technical feasibility of the FedFusionNet framework. Future deployment in actual collaborative environments would demonstrate its security-by-design capabilities across hospitals, where patient data confidentiality is a priority.
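Since the abstract names FedAvg explicitly, a minimal sketch of that aggregation step is given below; the local client training loop is omitted, and the `client_states`/`client_sizes` names and the use of raw state dicts are illustrative assumptions.

```python
import copy
import torch

def fedavg(client_states, client_sizes):
    """Sample-count-weighted average of client model weights (FedAvg).

    `client_states` is a list of state_dicts returned by hospitals after
    local training; `client_sizes` are their local sample counts. Only the
    aggregation step is shown.
    """
    total = float(sum(client_sizes))
    global_state = copy.deepcopy(client_states[0])
    for key in global_state:
        global_state[key] = sum(
            state[key].float() * (n / total)
            for state, n in zip(client_states, client_sizes)
        )
    return global_state  # load into the global model with load_state_dict()
```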
{"title":"FedFusionNet: Advancing oral cancer recurrence prediction through federated fusion modeling","authors":"Al Rafi Aurnob , Sharia Arfin Tanim , Tahmid Enam Shrestha , M.F. Mridha , Durjoy Mistry","doi":"10.1016/j.inffus.2026.104205","DOIUrl":"10.1016/j.inffus.2026.104205","url":null,"abstract":"<div><div>Oral cancer represents a considerable global medical problem that requires the development of new technologies that offer reliable advanced therapies. This study introduced FedFusionNet, a fusion-centric model that was meticulously developed to advance early oral cancer diagnosis while preserving data privacy. The primary objective was to develop a model using federated learning (FL) to train across diverse healthcare facilities globally without compromising patient data confidentiality. This model uses features from the ResNeXt101 32X8D and InceptionV3 models to implement a single-level fusion via feature concatenation. This helps to enhance the effectiveness and stability of the model. Specifically, the federated averaging (FedAvg) technique fosters collaborative model training across multiple hospitals while safeguarding sensitive patient information. This ensured that each participating hospital could contribute to the development of the model without sharing the raw data. The proposed model was trained on a dataset of 10,002 images that included both healthy and cancerous oral tissues. Rigorous training and evaluation were conducted for both Independent and Identically Distributed (IID) and Independent and Non-Identically Distributed (Non-IID) settings. FedFusionNet demonstrated superior performance compared with pre-trained and some custom models for oral cancer diagnosis. This scalable and secure framework has profound implications for healthcare analytics. It is a proof-of-concept demonstration that utilizes publicly available data to establish the technical feasibility of the FedFusionNet framework. Future deployment in actual collaborative environments would demonstrate its security-by-design capabilities across hospitals, where patient data confidentiality is a priority.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"132 ","pages":"Article 104205"},"PeriodicalIF":15.5,"publicationDate":"2026-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146138681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-08-01 | Epub Date: 2026-02-07 | DOI: 10.1016/j.inffus.2026.104211
Xiaoying Huang , Haonan Cheng , Sanyi Zhang , Xiaoxuan Guo , Long Ye
Music recommendation, as a core task of smart speakers, has an important impact on user experience in terms of recommendation speed and accuracy. However, existing music recommendation algorithms face challenges in generating adaptive playlists tailored to the user’s current state. This is primarily because achieving high recommendation accuracy typically necessitates substantial computing overheads. In addition, most existing music recommendation algorithms ignore smooth transitions between tracks, which further hurts the quality of the recommendations. To tackle these issues, we propose a novel Lightweight Music Recommendation (LMR) method via Multi-Physiological feature Fusion (MPF), which can be effectively applied in embedded smart speaker systems. Specifically, our proposed LMR method contains two core modules: an MPF-based music mapping module and a global-local similarity computation (GLSC)-based playlist recommendation module. The lightweight MPF-based music mapping model is designed to solve the track-user adaptation problem. Furthermore, we propose a GLSC-based playlist recommendation algorithm to address incoherence and unsmooth transitions within track sequences. Experiments demonstrate that the proposed method achieves more consistent playlist recommendations aligned with user contextual information, while also enabling smoother transitions between tracks and ensuring long-term content consistency across the entire sequence. Compared with other methods, our approach achieves a favorable balance between accuracy and efficiency.
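The GLSC algorithm itself is not spelled out in the abstract, so the sketch below only illustrates the general idea of blending a global term (fit to the user's current state) with a local term (smooth transition from the previous track); the weighting `alpha` and all embeddings are assumptions.

```python
import numpy as np

def cosine(a, b, eps=1e-9):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def score_track(candidate, user_profile, last_track, alpha=0.7):
    """Blend a global similarity (to the user's state-conditioned profile)
    with a local similarity (to the previous track) for playlist continuity.
    `alpha` and the embeddings are illustrative, not the paper's GLSC."""
    return alpha * cosine(candidate, user_profile) + (1 - alpha) * cosine(candidate, last_track)

# Toy usage: pick the next track greedily from a candidate pool.
rng = np.random.default_rng(0)
pool = rng.normal(size=(5, 32))
profile, last = rng.normal(size=32), rng.normal(size=32)
next_idx = max(range(len(pool)), key=lambda i: score_track(pool[i], profile, last))
```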
{"title":"Lightweight music recommendation via multi-physiological feature fusion","authors":"Xiaoying Huang , Haonan Cheng , Sanyi Zhang , Xiaoxuan Guo , Long Ye","doi":"10.1016/j.inffus.2026.104211","DOIUrl":"10.1016/j.inffus.2026.104211","url":null,"abstract":"<div><div>Music recommendation, as the core task of smart speakers, have an important impact on user experience in terms of recommendation speed and accuracy. However, existing music recommendation algorithms face challenges in generating adaptive playlists tailored to the user’s current state. This is primarily because achieving high recommendation accuracy typically necessitates substantial computing overheads. In addition, most of the existing music recommendation algorithms ignore smooth transitions between tracks, which further hurts the quality of the recommendations. To tackle these issues, we propose a novel Lightweight Music Recommendation (LMR) method via Multi-Physiological feature Fusion (MPF), which can be effectively applied in embedded smart speaker systems. Specifically, our proposed LMR method contains two core modules: a MPF-based music mapping module and a global-local similarity computation (GLSC) based playlist recommendation module. The lightweight MPF-based music mapping model is designed to solve the track-user adaptation problem. Furthermore, we propose a GLSC-based playlist recommendation algorithm to address the incoherence and unsmooth transitions within track sequences. Experiments demonstrate that the proposed method achieves more consistent playlist recommendations aligned with user contextual information, while also enabling smoother transitions between tracks and ensuring long-term content consistency across the entire sequence. Compared with other methods, our approach achieves a favorable balance between accuracy and efficiency.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"132 ","pages":"Article 104211"},"PeriodicalIF":15.5,"publicationDate":"2026-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146138682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-08-01 | DOI: 10.1016/j.inffus.2026.104207
Maxim Markitantov , Elena Ryumina , Anastasia Dvoynikova , Alexey Karpov
Affective state recognition is a challenging task that requires a large amount of input data, such as audio, video, and text. Current multi-modal approaches are often single-task and corpus-specific, resulting in overfitting, poor generalization across corpora, and reduced real-world performance. In this work, we address these limitations by: (1) multi-lingual training on corpora that include Russian (RAMAS) and English (MELD, CMU-MOSEI) speech; (2) multi-task learning for joint emotion and sentiment recognition; and (3) a novel Triple Fusion strategy that employs cross-modal integration at both hierarchical uni-modal and fused multi-modal feature levels, enhancing intra- and inter-modal relationships of different affective states and modalities. Additionally, to optimize the performance of the proposed approach, we compare temporal encoders (Transformer-based, Mamba, xLSTM) and fusion strategies (double and triple fusion, with and without a label encoder) to comprehensively understand their capabilities and limitations. On the Test subset of the CMU-MOSEI corpus, the proposed approach showed a mean weighted F1-score (mWF) of 88.6% for emotion recognition and a weighted F1-score (WF) of 84.8% for sentiment recognition (respectively +9.5% and +6.0% absolute over prior multi-task baselines). On the Test subset of the MELD corpus, the proposed approach showed WF of 49.6% for emotion and 60.0% for sentiment (+8.4% WF for emotion recognition over the strongest multi-task baseline). On the Test subset of the RAMAS corpus, the proposed approach showed competitive performance with WF of 71.8% and 90.0% for emotion and sentiment, respectively. We compare the performance of the proposed approach with that of state-of-the-art methods. The source code and a demo of the developed approach are publicly available at https://smil-spcras.github.io/MASAI/.
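A minimal, assumption-laden sketch of a three-stage fusion with multi-task heads is given below: uni-modal features are fused pairwise, the pairwise features are fused again, and both levels feed joint emotion and sentiment heads. Plain linear layers stand in for the paper's attention-based Triple Fusion, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class TripleFusionSketch(nn.Module):
    """Toy three-stage fusion with multi-task emotion/sentiment heads."""

    def __init__(self, d=128, n_emotions=7, n_sentiments=3):
        super().__init__()
        self.pair = nn.Linear(2 * d, d)    # stage 2: fuse a pair of modalities
        self.joint = nn.Linear(3 * d, d)   # stage 3: fuse the three pair features
        self.emotion_head = nn.Linear(4 * d, n_emotions)
        self.sentiment_head = nn.Linear(4 * d, n_sentiments)

    def forward(self, a, v, t):  # audio, video, text features, each (B, d)
        av = torch.relu(self.pair(torch.cat([a, v], dim=-1)))
        at = torch.relu(self.pair(torch.cat([a, t], dim=-1)))
        vt = torch.relu(self.pair(torch.cat([v, t], dim=-1)))
        joint = torch.relu(self.joint(torch.cat([av, at, vt], dim=-1)))
        final = torch.cat([a, v, t, joint], dim=-1)  # uni-modal + fused levels
        return self.emotion_head(final), self.sentiment_head(final)
```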
{"title":"Multi-lingual approach for multi-modal emotion and sentiment recognition based on triple fusion","authors":"Maxim Markitantov , Elena Ryumina , Anastasia Dvoynikova , Alexey Karpov","doi":"10.1016/j.inffus.2026.104207","DOIUrl":"10.1016/j.inffus.2026.104207","url":null,"abstract":"<div><div>Affective states recognition is a challenging task that requires a large amount of input data, such as audio, video, and text. Current multi-modal approaches are often single-task and corpus-specific, resulting in overfitting, poor generalization across corpora, and reduced real-world performance. In this work, we address these limitations by: (1) multi-lingual training on corpora that include Russian (RAMAS) and English (MELD, CMU-MOSEI) speech; (2) multi-task learning for joint emotion and sentiment recognition; and (3) a novel Triple Fusion strategy that employs cross-modal integration at both hierarchical uni-modal and fused multi-modal feature levels, enhancing intra- and inter-modal relationships of different affective states and modalities. Additionally, to optimize performance of the approach proposed, we compare temporal encoders (Transformer-based, Mamba, xLSTM) and fusion strategies (double and triple fusion strategies with and without a label encoder) to comprehensively understand their capabilities and limitations. On the Test subset of the CMU-MOSEI corpus, the proposed approach showed mean weighted F1-score (mWF) of 88.6% for emotion recognition and weighted F1-score (WF) of 84.8% for sentiment recognition (respectively +9.5% and +6.0% absolute over prior multi-task baselines). On the Test subset of the MELD corpus, the proposed approach showed WF of 49.6% for emotion and 60.0% for sentiment (+8.4% WF for emotion recognition over the strongest multi-task baseline). On the Test subset of the RAMAS corpus, the proposed approach showed a competitive performance with WF of 71.8% and 90.0% for emotion and sentiment, respectively. We compare the performance of the approach proposed with that of the state-of-the-art ones. The source code and demo of the developed approach is publicly available at <span><span>https://smil-spcras.github.io/MASAI/</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"132 ","pages":"Article 104207"},"PeriodicalIF":15.5,"publicationDate":"2026-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146134528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-08-01 | Epub Date: 2026-01-29 | DOI: 10.1016/j.inffus.2026.104196
Yue Zhao , Xinning Chen , Kehan Li , Yifan Lin , Yang Liu , Fan Wang
Accurate 3D dental model segmentation is critical for digital dental treatment, as it provides valuable clinical references. Existing methods fail to adaptively evaluate the importance or contribution of different geometric attributes during heterogeneous feature fusion, hindering the accuracy of end-to-end segmentation. In this paper, we pioneer the description of the geometric attributes of a 3D dental model as views. A multi-view geometry-adaptive fusion network (MGAFNet) is proposed to dynamically seek the optimal combination of views through the exploration of distinctive and sharable features for fine-grained 3D dental model segmentation. Specifically, during distinctive feature extraction, we design a geometry-aware enhancement module (GAE) to improve the learning of topological variations in teeth. After that, a multivariate sharable cross-interaction module (SCIM) is developed to facilitate the flow of information and capture sharable features among views. Subsequently, a multivariate adaptive representation fusion module (MARF) is implemented to adaptively balance the importance or contribution of views by constructing weight matrices for distinctive and sharable features from different feature sources. Compared to eight advanced methods, our MGAFNet achieves state-of-the-art performance on both a public benchmark and a private clinical dataset. It demonstrates robustness in handling various dental conditions (e.g., misaligned, missing and supernumerary teeth), avoiding category confusion and blurry boundary segmentation.
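To illustrate the adaptive view-weighting idea behind MARF, the sketch below gates per-view point features with learned softmax weights before summing them; the gating network, tensor shapes, and the simple weighted sum are stand-ins, not the module's actual construction.

```python
import torch
import torch.nn as nn

class AdaptiveViewFusion(nn.Module):
    """Toy adaptive multi-view fusion: a small gating network scores each view
    (e.g. coordinates, normals, curvature) per point, and views are combined
    by their softmax weights. Dimensions are illustrative."""

    def __init__(self, d=64, n_views=3):
        super().__init__()
        self.gate = nn.Linear(n_views * d, n_views)

    def forward(self, views):                 # views: (B, N_points, n_views, d)
        b, n, k, d = views.shape
        w = torch.softmax(self.gate(views.reshape(b, n, k * d)), dim=-1)  # (B, N, k)
        return (w.unsqueeze(-1) * views).sum(dim=2)                       # (B, N, d)
```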
{"title":"Sharable and discriminative multi-view geometry-adaptive fusion network for 3D dental model segmentation","authors":"Yue Zhao , Xinning Chen , Kehan Li , Yifan Lin , Yang Liu , Fan Wang","doi":"10.1016/j.inffus.2026.104196","DOIUrl":"10.1016/j.inffus.2026.104196","url":null,"abstract":"<div><div>Accurate 3D dental model segmentation is critical for digital dental treatment, as it provides valuable clinical references. <em>Existing methods</em> fail to adaptively evaluate the importance or contribution of different geometric attributes during heterogeneous features fusion, hindering the accuracy of end-to-end segmentation. <em>In this paper</em>, we pioneer the description of geometric attributes of 3D dental model as views. A multi-view geometry-adaptive fusion network (MGAFNet) is proposed to dynamically seek the optimal combination of views through distinctive and sharable features exploration for fine-grained 3D dental model segmentation. <em>Specifically</em>, during distinctive features extraction, we design geometry-aware enhancement module (GAE) to improve topological variations learning in teeth. After that, a multivariate sharable cross-interaction module (SCIM) is developed to facilitate the flow of information and capture sharable features among views. Subsequently, a multivariate adaptive representation fusion module (MARF) is implemented to adaptively balance the importance or contribution of views by constructing weight matrices for distinctive and sharable features from different feature sources. Compared to eight advanced methods, our MGAFNet achieves state-of-the-art performance on both a public benchmark and a private clinical dataset. It demonstrates robustness in handling various dental conditions (e.g., misaligned, missing and supernumerary teeth), avoiding category confusion and blurry boundary segmentation.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"132 ","pages":"Article 104196"},"PeriodicalIF":15.5,"publicationDate":"2026-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146072490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-07-01 | Epub Date: 2026-01-23 | DOI: 10.1016/j.inffus.2026.104130
Xilai Li , Wuyang Liu , Xiaosong Li , Fuqiang Zhou , Huafeng Li , Feiping Nie
Multi-modality image fusion (MMIF) combines complementary information from different image modalities to provide a comprehensive and objective interpretation of scenes. However, existing fusion methods cannot resist diverse weather interference in real-world scenes, limiting their practical applicability. To bridge this gap, we propose an end-to-end, unified all-weather MMIF model. Rather than focusing solely on pixel-level recovery, our method emphasizes maximizing the representation of key scene information through joint feature fusion and restoration. Specifically, we first decompose images into low-rank and sparse components, enabling effective feature separation for enhanced multi-modality perception. During feature recovery, we introduce a physically-aware clear feature prediction module, inferring variations in light transmission via illumination and reflectance. Clear features generated by the network are used to enhance the representation of salient information. We also construct a large-scale MMIF dataset of 100,000 image pairs that comprehensively covers rain, haze, and snow conditions, as well as various degradation levels and diverse scenes. Experimental results in both real-world and synthetic scenes demonstrate that the proposed method excels in image fusion and downstream tasks such as object detection, semantic segmentation, and depth estimation. The source code is available at https://github.com/ixilai/AWFusion.
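As a rough stand-in for the low-rank/sparse decomposition mentioned above, the sketch below keeps a truncated-SVD reconstruction as the low-rank layer and soft-thresholds the residual as the sparse layer; the paper learns this separation end-to-end, so the rank and threshold here are arbitrary illustrative values.

```python
import numpy as np

def lowrank_sparse_split(img, rank=8, thresh=0.05):
    """Heuristic low-rank + sparse separation of a single-channel image:
    top-`rank` singular components form the low-rank layer, and the residual
    is soft-thresholded into the sparse layer. Illustration only."""
    u, s, vt = np.linalg.svd(img.astype(np.float64), full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    residual = img - low_rank
    sparse = np.sign(residual) * np.maximum(np.abs(residual) - thresh, 0.0)
    return low_rank, sparse

# Usage on a grayscale image normalised to [0, 1]:
# L, S = lowrank_sparse_split(gray_image, rank=16, thresh=0.02)
```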
{"title":"All-weather multi-modality image fusion: Unified framework and 100k benchmark","authors":"Xilai Li , Wuyang Liu , Xiaosong Li , Fuqiang Zhou , Huafeng Li , Feiping Nie","doi":"10.1016/j.inffus.2026.104130","DOIUrl":"10.1016/j.inffus.2026.104130","url":null,"abstract":"<div><div>Multi-modality image fusion (MMIF) combines complementary information from different image modalities to provide a comprehensive and objective interpretation of scenes. However, existing fusion methods cannot resist diverse weather interference in real-world scenes, limiting their practical applicability. To bridge this gap, we propose an end-to-end, unified all-weather MMIF model. Rather than focusing solely on pixel-level recovery, our method emphasizes maximizing the representation of key scene information through joint feature fusion and restoration. Specifically, we first decompose images into low-rank and sparse components, enabling effective feature separation for enhanced multi-modality perception. During feature recovery, we introduce a physically-aware clear feature prediction module, inferring variations in light transmission via illumination and reflectance. Clear features generated by the network are used to enhance the representation of salient information. We also construct a large-scale MMIF dataset with 100,000 image pairs comprehensively across rain, haze, and snow conditions, as well as covering various degradation levels and diverse scenes. Experimental results in both real-world and synthetic scenes demonstrate that the proposed method excels in image fusion and downstream tasks such as object detection, semantic segmentation, and depth estimation. The source code is available at <span><span>https://github.com/ixilai/AWFusion</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104130"},"PeriodicalIF":15.5,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146033289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}