Pub Date: 2026-01-11; DOI: 10.1016/j.inffus.2026.104142
He Li, Taiyu Liao, Weihang Kong, Xingchen Zhang
Multi-view pedestrian detection is an important task with many applications in areas such as surveillance and smart cities. Despite the significant performance improvements achieved by recent multi-view pedestrian detection methods, three main challenges remain. 1) In crowded areas, neighboring connected components may merge, resulting in unclear localization of the pixel peak for each pedestrian. 2) The loss functions used in previous multi-view pedestrian detection methods have a high response to the background. 3) The camera parameters have not been fully utilized; they are only used to generate a fixed-value projection matrix. To address these challenges, we propose a novel multi-view pedestrian detection framework with Central Inverse Nearest Neighbor map and View Adaptive Module (MCIVA). A Central Inverse Nearest Neighbor (CINN) map is introduced to generate the ground-truth Probability Occupancy Map (POM) from annotations, providing more precise location information for each pedestrian. To enhance the model’s attention to local structural information, we propose a local structural similarity loss that reduces the influence of false local maxima in background regions. Moreover, a novel plug-and-play View Adaptive Module (VAM) is introduced, which uses the camera parameters to generate learnable weights for multi-view feature fusion. We evaluate the proposed method on three benchmark datasets, and the results show that MCIVA significantly improves the quality of the prediction map and achieves state-of-the-art performance.
Title: MCIVA: A Multi-View Pedestrian Detection Framework with Central Inverse Nearest Neighbor Map and View Adaptive Module. Journal: Information Fusion.
Pub Date: 2026-01-11; DOI: 10.1016/j.inffus.2025.104121
Shengjia Chen, Huihua Hu, Hongfu Zeng, Chenxin Li, Qing Xu, Longfeng Zhang, Haipeng Xu
Leptomeningeal metastasis (LM) diagnosis represents a significant clinical challenge. Existing diagnostic approaches are often limited by their reliance on single-modality data and the inherent difficulties in effectively integrating heterogeneous information from imaging and genomics. To address these challenges, we propose OMD, an Optimal Transport-guided Multimodal Disentangled Learning framework that integrates MRI data with genomic information for enhanced diagnostic accuracy. Our method combines optimal transport-based cross-modal attention to robustly align heterogeneous features, information bottleneck compression to mitigate noise and redundancy, and feature disentanglement to explicitly model shared and modality-specific representations, integrated with hierarchical attention for MRI processing and graph-based cross-modal reasoning. Experimental results show that OMD achieves superior diagnostic accuracy, sensitivity, and specificity on our clinical dataset, substantially outperforming current state-of-the-art methods across all evaluation metrics. The model also provides interpretable insights into the cross-modal biomarkers associated with LM. The proposed OMD framework establishes a new paradigm for multimodal medical diagnosis that effectively addresses the complementary strengths of imaging and genomic data. Beyond its immediate application to LM diagnosis, our approach offers a generalizable methodology for integrating heterogeneous medical data sources while providing clinically relevant interpretability. This work represents an important step toward personalized medicine approaches that combine multiple data modalities for improved diagnostic accuracy and treatment planning.
Title: OMD: optimal transport-guided multimodal disentangled learning for leptomeningeal metastasis diagnosis. Journal: Information Fusion, vol. 130, Article 104121.
Pub Date: 2026-01-10; DOI: 10.1016/j.inffus.2026.104146
Shuo Yang, Raquel Caballero-Águila, Jun Hu, Antonia Oya-Lechuga
In this paper, the secure Tobit filtering (TF) problem is investigated for nonlinear systems subject to measurement censoring under a multi-node random access protocol (MNRAP). A multi-rate sampling framework is considered, which allows the system states and measurement outputs to operate with distinct sampling periods, thus reflecting practical engineering constraints. Furthermore, to mitigate data collisions and improve resource utilization, the MNRAP is adopted to regulate the transmission order of measurement signals over communication networks. In addition, to safeguard the communication confidentiality between the sensor node and the filter, the Paillier encryption-decryption mechanism is incorporated. This protects the transmitted information from being intercepted by unauthorized third parties. This paper concentrates on developing an innovative secure TF scheme that guarantees the existence of an upper bound (UB) on the filtering error second moment. Subsequently, the minimization of the obtained UB is carried out in the trace sense by designing a proper filter gain. Additionally, the uniform boundedness of the filtering error is verified in the mean-square sense by establishing a sufficient criterion. Finally, the efficacy and advantages of the proposed secure TF approach are demonstrated through a simulation example.
Title: Secure Tobit filtering for multi-rate nonlinear systems under multi-node random access protocol: A Paillier encryption-decryption mechanism. Journal: Information Fusion, vol. 130, Article 104146.
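The Paillier encryption-decryption step mentioned in the abstract above can be illustrated with a toy sketch. The code below is not the authors' scheme: the primes are tiny and insecure, the fixed-point `SCALE` used to turn a real-valued measurement into an integer is an assumption, and the names `keygen`, `encrypt`, and `decrypt` are hypothetical. It only shows the textbook Paillier construction with generator g = n + 1 and its additive homomorphism.

```python
import math
import secrets

def keygen(p: int, q: int):
    """Textbook Paillier key generation with g = n + 1 (toy primes only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse of lambda; valid because g = n + 1
    return (n, n + 1), (lam, mu)

def encrypt(pub, m: int) -> int:
    """Encrypt an integer 0 <= m < n as c = g^m * r^n mod n^2."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c: int) -> int:
    """Recover m = L(c^lambda mod n^2) * mu mod n, with L(u) = (u - 1) // n."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

# Assumed fixed-point encoding of a sensor measurement (not from the paper).
SCALE = 100
pub, priv = keygen(61, 53)           # n = 3233, far too small for real use
m1 = round(12.34 * SCALE)            # 1234
m2 = round(10.00 * SCALE)            # 1000
c1, c2 = encrypt(pub, m1), encrypt(pub, m2)
recovered = decrypt(pub, priv, c1) / SCALE
# Multiplying ciphertexts adds the underlying plaintexts (mod n).
summed = decrypt(pub, priv, (c1 * c2) % (pub[0] ** 2))
```

In a deployment the sensor would hold only the public key and the filter the private key; key sizes, encoding range, and handling of negative values would all need to be chosen for the actual system.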
Pub Date: 2026-01-10; DOI: 10.1016/j.inffus.2026.104125
Xia Wang, Jun Liu, Chris D. Nugent, Shaobing Xu, Guanfeng Wu
Multi-agent pathfinding and its reliable execution in stochastic environments represent a critical challenge for real-world applications, demanding both the planning of efficient paths and the formal assurance of safe, conflict-free operation. This paper introduces a novel methodological framework to address this dual requirement. The first component targets operational efficiency: we introduce a strategy for optimal goal allocation for team collaboration and integrate it with the conflict-based search algorithm to minimize the total move count required for mission completion. The second component is an integrated verification process grounded in probabilistic model checking. We model the multi-agent path execution process under stochastic uncertainties using a Markov decision process. By leveraging a probabilistic model checker and probabilistic computation tree logic, the framework formally verifies critical safety properties, ensuring conflict-free and deadlock-free path execution. Furthermore, it evaluates the effectiveness of proposed behavioral constraints designed to mitigate stochastic delays, thereby verifying the overall system safety. By fusing multi-agent planning, probabilistic reasoning, and formal logic-based verification, the proposed framework establishes a foundation that extends naturally to multi-agent decision-making and uncertainty estimation. Case study results demonstrate that our methodology effectively selects the pathfinding solution with the minimum move count while significantly enhancing overall system safety through these formally verified behavioral constraints.
Title: Team collaboration-oriented multi-agent pathfinding and probabilistic verification. Journal: Information Fusion, vol. 130, Article 104125.
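The goal-allocation idea in the abstract above (choose which agent takes which goal so that the total move count is smallest) can be sketched as a small assignment problem. The abstract does not specify the allocation strategy, so everything here is an assumption: Manhattan distance on an obstacle-free grid stands in for the true move count, conflicts between agents are ignored (the conflict-based search stage is not implemented), and `allocate_goals` is a hypothetical name. Brute force over permutations is only practical for a handful of agents.

```python
from itertools import permutations

def manhattan(a, b):
    """Grid distance between two (row, col) cells, ignoring obstacles."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def allocate_goals(starts, goals):
    """Return (assignment, cost): assignment[i] is the goal index for agent i.

    Exhaustively tries every one-to-one assignment and keeps the one with
    the smallest summed distance (an assumed proxy for total move count).
    """
    best_assign, best_cost = None, None
    for perm in permutations(range(len(goals)), len(starts)):
        cost = sum(manhattan(s, goals[g]) for s, g in zip(starts, perm))
        if best_cost is None or cost < best_cost:
            best_assign, best_cost = perm, cost
    return best_assign, best_cost

starts = [(0, 0), (4, 0)]
goals = [(4, 1), (0, 1)]
assignment, total_moves = allocate_goals(starts, goals)
```

Here sending each agent to the goal next to it costs 2 moves in total, whereas the crossed assignment costs 10, so the sketch picks the former. A full pipeline would re-evaluate candidates with path costs from the conflict-aware planner rather than raw distances.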
Pub Date: 2026-01-10; DOI: 10.1016/j.inffus.2026.104144
Kang Wang, Liangliang Wang, Zhiquan Liu, Yiyuan Luo, Kai Zhang, Weiwei Li
Federated Learning (FL) is vulnerable to backdoor attacks, where hidden triggers in model updates can induce malicious behavior on specific inputs, ultimately compromising the reliability of FL. However, existing backdoor detection methods require the server to decrypt the encrypted models uploaded by clients before detection can be performed. In this paper, we propose SHIFT, which comprises three parts: transferring the backdoor detection task to the client side to significantly reduce the computational burden on the server; employing client-side code obfuscation to prevent malicious clients from analyzing or bypassing the detection mechanism; and utilizing a dynamic risk level mapping mechanism to adaptively adjust the backdoor detection output. Because detection runs on the client side, SHIFT can operate directly on unencrypted data. We evaluated the time overhead of SHIFT compared with various backdoor detection schemes based on different encryption methods. Additionally, we assessed its performance in handwritten digit recognition and image classification tasks under single-client and multi-client backdoor attacks, specifically in non-independent and identically distributed (non-IID) scenarios. Experimental results indicate that SHIFT improves backdoor detection efficiency by a factor ranging from 1.28 to 36.65 over existing schemes, while also demonstrating robust performance in detecting and defending against various backdoor attacks, particularly in large-scale, multi-client distributed federated learning systems.
Title: SHIFT: Enhancing federated learning robustness through client-side backdoor detection. Journal: Information Fusion, vol. 130, Article 104144.
Pub Date: 2026-01-09; DOI: 10.1016/j.inffus.2026.104126
Zhonglin Wu, Hongliang Wang, Tongze Zhang, Hongyuan Liu, Jinxia Guo, Qinli Yang, Junming Shao
Class overlap in data streams presents a significant challenge for real-time classification, particularly when confronted with the high dimensionality and evolving distributions inherent in such streams. Traditional classification methods, typically designed for static datasets, struggle to adapt to the dynamic nature of data streams, where both high-dimensional feature spaces and class imbalance exacerbate the complexity of classifying overlapping regions. In this paper, we propose a novel deep metric learning framework specifically tailored to address the challenges of class overlap in high-dimensional data streams. Our approach introduces two key innovations. First, we develop a multi-anchor sample mining mechanism based on neighborhood rough set theory, which partitions the data into non-overlapping and overlapping regions. By utilizing region-specific triplet-margin losses and hinge embedding loss, we construct a more refined discriminative metric space that significantly enhances the separation of overlapping classes. Furthermore, we introduce a dynamic, density-aware real-time label propagation mechanism with class-imbalance compensation. This component integrates real-time distribution estimation with a nonlinear adaptive threshold controller, enabling dual adaptivity: (1) dynamically re-weighting density contributions via inverse-frequency scaling to mitigate the dominance of majority classes and (2) adjusting threshold boundaries for frequent classes while relaxing propagation criteria for rare classes through nonlinear adjustments. Empirical evaluations on both synthetic and real-world data streams demonstrate that our method not only improves balanced accuracy but also enhances robustness in the presence of class overlap and class imbalance, outperforming state-of-the-art techniques.
Title: Region-based deep metric learning for tackling class overlap in online semi-supervised data stream classification. Journal: Information Fusion, vol. 130, Article 104126.
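The inverse-frequency scaling mentioned in the abstract above can be sketched with running class counts over a stream. This is a minimal illustration, not the paper's formula: the weight total / (k * count), the fallback weight of 1.0 for unseen classes, and the class name `InverseFrequencyWeights` are all assumptions.

```python
from collections import Counter

class InverseFrequencyWeights:
    """Running per-class weights that shrink as a class becomes frequent."""

    def __init__(self):
        self.counts = Counter()

    def update(self, label):
        """Record one more labelled sample from the stream."""
        self.counts[label] += 1

    def weight(self, label):
        """Return total / (k * count); 1.0 for a class not seen yet (assumed)."""
        count = self.counts[label]  # Counter returns 0 for a missing key
        if count == 0:
            return 1.0
        total = sum(self.counts.values())
        k = len(self.counts)
        return total / (k * count)

weights = InverseFrequencyWeights()
for label in ["a"] * 8 + ["b"] * 2:   # an 80/20 imbalanced toy stream
    weights.update(label)
majority_w = weights.weight("a")      # 10 / (2 * 8)
minority_w = weights.weight("b")      # 10 / (2 * 2)
```

With this normalisation a perfectly balanced stream gives every class a weight of 1, so the weights can be read as a correction relative to the balanced case.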
Pub Date: 2026-01-07; DOI: 10.1016/j.inffus.2026.104128
Xingchi Chen, Fushen Xie, Fa Zhu, Shuanglong Zhang, Xiaoyang Lu, Qing Li, Rong Chen, Dazhou Li, David Camacho
The detection of epileptic seizures using multi-sensor EEG signals is a challenging task due to the inherent complexity of the signals, the variability in sensor configurations, and the difficulty of distinguishing weak inter-class differences. To address these challenges, we propose a novel multimodal information fusion framework that integrates a large language model (LLM) and a multimodal EEG feature tokenization method for enhanced epilepsy detection. This paper adopts a multimodal feature extraction (MFE) method that generates multimodal feature representations of EEG signals by extracting features from different signal domains. In addition, we design a multimodal EEG feature tokenization method to tokenize EEG signal features and fuse them with semantic information, solving the problem of combining epileptic EEG features with the semantic information in prompts. We use the powerful reasoning and pattern recognition capabilities of pre-trained LLMs to accurately and robustly detect epileptic events. The proposed method is evaluated on a public dataset. Extensive experimental results show that the proposed method outperforms the compared methods on multiple performance metrics.
Title: Tokenized EEG signals with large language models for epilepsy detection via multimodal information fusion. Journal: Information Fusion, vol. 131, Article 104128.
Pub Date: 2026-01-07; DOI: 10.1016/j.inffus.2026.104127
Menglin Yu, Shuxia Lu, Jiacheng Cong
Graph neural networks (GNNs) perform exceptionally well in node classification, but they face severe challenges when the node classes are imbalanced. On the one hand, the model is prone to overfitting due to the small number of minority class samples. The message passing mechanism of GNNs amplifies this problem, causing the model to overfit specific features and local neighborhood structures of minority class nodes rather than learning general patterns, resulting in poor generalization ability. On the other hand, the scarcity of samples leads to high variance in model training. Model performance is highly dependent on specific training samples and local graph structures, and is extremely sensitive to data partitioning, ultimately resulting in severe performance fluctuations and unstable results. In this work, to address the minority class overfitting and high model variance that GNNs face in imbalanced scenarios, we propose a dual-graph framework, the Similarity-Guided Dual-Graph Learning Framework (SG-DGLF). First, to address overfitting on minority classes, the framework introduces a similarity-based random capture mechanism with a dynamic threshold, which supplements minority class samples by generating pseudo labels. Second, we leverage graph diffusion-based propagation and a random edge dropping strategy to create new graphs, thereby increasing node diversity and reducing model variance. Empirically, SG-DGLF significantly outperforms advanced baseline methods on multiple imbalanced datasets. This validates the effectiveness of our framework in mitigating minority class overfitting and high model variance.
Title: SG-DGLF: A similarity-guided dual-graph learning framework. Journal: Information Fusion, vol. 130, Article 104127 (published 2026-01-07).
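The SG-DGLF abstract names random edge dropping as one of the ways it creates new graphs. The abstract gives no implementation details, so the following is only a minimal sketch of generic edge dropping on an edge list. The function name `drop_edges`, the `drop_prob` parameter, and the toy graph are assumptions for illustration, not the authors' code.

```python
import random


def drop_edges(edges, drop_prob, seed=None):
    """Return a new edge list in which each edge is removed
    independently with probability drop_prob (hypothetical helper)."""
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must be in [0, 1]")
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so an edge survives with probability 1 - drop_prob.
    return [edge for edge in edges if rng.random() >= drop_prob]


# Toy undirected graph given as (source, target) pairs.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]

# Two perturbed views of the same graph, drawn with different seeds.
view_a = drop_edges(edges, drop_prob=0.3, seed=0)
view_b = drop_edges(edges, drop_prob=0.3, seed=1)
print(view_a)
print(view_b)
```

Each call yields a subgraph of the original, so training on several such views exposes a node to different neighborhoods, which is the kind of diversity the abstract refers to.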
Pub Date : 2026-01-06 DOI: 10.1016/j.inffus.2025.104116
Jun Lyu, Xunkang Zhao, Jing Qin, Chengyan Wang
Cardiac cine MRI is the clinical gold standard for dynamic cardiac assessment, but reducing k-space sampling to accelerate acquisition results in low-resolution images that fail to depict fine anatomical details. Existing super-resolution methods struggle to preserve spatial details and temporal coherence because they handle non-rigid cardiac deformations poorly and rely on lossy feature downsampling. This paper proposes a Wavelet-based Deformable Attention Super-Resolution Network (WDASR) that addresses these limitations through two key innovations. First, a Frequency Subband Adaptive Alignment (FSAA) module applies deformable convolution to wavelet-decomposed frequency subbands, enabling lossless downsampling that prevents offset over-shifting and allows targeted alignment across neighboring and remote frames. Second, a Cross-Resolution Wavelet Attention (CRWA) module uses temporally aggregated frequency subbands as low-resolution keys and values and the current frame as the high-resolution query, reducing computational complexity by 75% while effectively integrating multi-scale spatiotemporal information for enhanced texture representation. A bidirectional recurrent mechanism further propagates the enhanced features to maintain temporal consistency. Experiments on public and private datasets demonstrate that WDASR achieves 4× super-resolution with state-of-the-art performance and potential for clinical application.
Title: WDASR: A wavelet-based deformable attention network for cardiac cine MRI super-resolution with spatiotemporal motion modeling. Journal: Information Fusion, vol. 130, Article 104116 (published 2026-01-06).
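The WDASR abstract makes two claims that a small worked example can illustrate: that wavelet decomposition gives lossless downsampling, and that low-resolution keys and values cut attention cost by 75%. The sketch below uses a one-level 2D Haar transform, which is an assumption, since the abstract does not say which wavelet is used. It also assumes the four subbands are stacked as channels, so the key/value grid has one quarter of the query positions. The function names are hypothetical, and the LH/HL naming convention varies between libraries.

```python
def haar_dwt2(image):
    """One-level 2D Haar transform of an image with even height and width.

    Returns four subbands (LL, LH, HL, HH), each half the size per axis.
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, height, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for c in range(0, width, 2):
            p00, p01 = image[r][c], image[r][c + 1]
            p10, p11 = image[r + 1][c], image[r + 1][c + 1]
            ll_row.append((p00 + p01 + p10 + p11) / 2)
            lh_row.append((p00 - p01 + p10 - p11) / 2)
            hl_row.append((p00 + p01 - p10 - p11) / 2)
            hh_row.append((p00 - p01 - p10 + p11) / 2)
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Invert haar_dwt2, rebuilding the full-resolution image."""
    rows = []
    for r in range(len(ll)):
        top, bottom = [], []
        for c in range(len(ll[0])):
            s, x, y, z = ll[r][c], lh[r][c], hl[r][c], hh[r][c]
            top += [(s + x + y + z) / 2, (s - x + y - z) / 2]
            bottom += [(s + x - y - z) / 2, (s - x - y + z) / 2]
        rows += [top, bottom]
    return rows


def attention_pairs(num_queries, num_keys):
    """Number of query-key score entries in one attention map."""
    return num_queries * num_keys


frame = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
ll, lh, hl, hh = haar_dwt2(frame)

# Lossless: the four half-resolution subbands rebuild the frame exactly.
reconstructed = haar_idwt2(ll, lh, hl, hh)

# Queries come from the full frame, keys/values from the subband grid.
num_queries = len(frame) * len(frame[0])
num_keys = len(ll) * len(ll[0])
reduction = 1 - attention_pairs(num_queries, num_keys) / attention_pairs(num_queries, num_queries)
print(reconstructed == frame, reduction)
```

Under these assumptions the score matrix shrinks from N × N to N × N/4 entries, which is consistent with the 75% figure in the abstract, although the abstract does not confirm that this is how the figure was derived.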
Pub Date : 2026-01-05 DOI: 10.1016/j.inffus.2026.104123
Yinan Li, Zhi Liu, Jiajun Tang, Binghong Chen, Mingjin Kuai, Jun Long, Zhan Yang
Hashing has been extensively applied in cross-modal retrieval by mapping data from diverse modalities into binary codes. Semantic transfer aims to enhance the relevance of heterogeneous representations by migrating valuable information from one modality to another in the unsupervised paradigm. Combining semantic transfer with hash learning replaces dense vector search with Hamming distance comparison, significantly reducing storage requirements and increasing retrieval efficiency. However, current unsupervised mechanisms achieve only ordinary retrieval precision and would benefit from semantic annotation. In particular, a mediocre information fusion strategy directly degrades the quality of the learned hash codes. In this paper, we propose a novel Semantic Transfer framework for Semi-supervised Cross-modal Hashing, denoted as STSCH. First, we use multiple auto-encoders to learn the high-level semantic representation of each modality. To guarantee the completeness of heterogeneous data, we incorporate these representations via semantic transfer and analyse the feature distribution of the diverse modalities. Furthermore, we construct an asymmetric hash learning framework between the individual modality-specific representations and a small number of semantic labels. Finally, an effective optimization algorithm is proposed. Comprehensive experiments on the Wiki, MIRFlickr, and NUS-WIDE datasets demonstrate the superiority of STSCH over state-of-the-art hashing approaches.
Title: Rethink: reveal the impact of semantic distribution transfer from the cross-modal hashing perspective. Journal: Information Fusion, vol. 130, Article 104123 (published 2026-01-05).
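The STSCH abstract notes that hash learning replaces dense vector search with Hamming distance. The sketch below shows that retrieval step only: real-valued features are binarized by sign, packed into integers, and ranked by XOR plus bit count. It does not reproduce STSCH's learning procedure. The helper names and the toy feature vectors are hypothetical.

```python
def to_hash_code(vector):
    """Binarize a real-valued vector by sign and pack the bits into an int."""
    code = 0
    for value in vector:
        code = (code << 1) | (1 if value >= 0 else 0)
    return code


def hamming_distance(code_a, code_b):
    """Count the bit positions in which two codes differ."""
    return bin(code_a ^ code_b).count("1")


def rank_by_hamming(query_code, database_codes):
    """Return database indices ordered from nearest to farthest code."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )


# Toy image-side features (database) and one text-side feature (query),
# assumed to already live in a shared 4-dimensional space.
database_vectors = [
    [0.9, -0.4, 0.2, -0.7],
    [-0.3, 0.8, -0.1, 0.6],
    [0.5, -0.2, -0.6, -0.1],
]
query_vector = [0.7, -0.9, 0.3, -0.2]

database_codes = [to_hash_code(v) for v in database_vectors]
query_code = to_hash_code(query_vector)
ranking = rank_by_hamming(query_code, database_codes)
print(ranking)
```

Each item is stored as a few bits rather than a float vector, and comparison is a single XOR followed by a bit count, which is where the storage and speed savings mentioned in the abstract come from.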