
Latest Publications in Information Fusion

Arbitrary-scale spatial-spectral fusion using kernel integral and progressive resampling
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-12 | DOI: 10.1016/j.inffus.2026.104143
Wei Li, Honghui Xu, Yueqian Quan, Zhe Chen, Jianwei Zheng
Benefiting from booming deep learning techniques, spatial-spectral fusion (SSF) is considered an ideal alternative to the traditional practice of acquiring hyperspectral images (HSI) with costly devices. Despite this remarkable progress, however, current solutions require training and storing a separate model for each scaling factor. To overcome this dilemma, we propose a spatial-spectral fusion neural operator (SFNO) that performs arbitrary-scale SSF within the operator learning framework. Specifically, SFNO approaches the problem from the perspective of approximation theory by embedding the features of two degraded functions into a high-dimensional latent space through pointwise convolution layers, thereby capturing richer spectral feature information. The mapping between function spaces is then approximated via a Galerkin integral (GI) mechanism, which culminates in a final dimensionality-reduction step that produces a high-resolution HSI. Moreover, we propose a progressive resampling integration (PR) that resamples the integrand's domain in the triple kernel integration to provide non-local multi-scale information. The synergistic action of both integration mechanisms enables SFNO to effortlessly handle magnification factors it never encountered during training. Extensive experiments on the CAVE, Chikusei, Pavia Centre, Harvard, and real-world datasets demonstrate that our SFNO delivers substantial improvements over existing state-of-the-art methods. In particular, under the 8× upsampling setting on the CAVE, Chikusei, and Pavia Centre datasets, SFNO surpasses the second-best model by 0.56 dB, 1.05 dB, and 0.72 dB in PSNR, respectively. Our code is publicly available at https://github.com/weili419/SFNO.
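To make the kernel-integral idea concrete, below is a minimal sketch of a Galerkin-style attention layer in PyTorch, which approximates an integral operator as a softmax-free product of normalized key and value bases. The layer width and normalization choices are illustrative assumptions, not the authors' exact SFNO design.

```python
import torch.nn as nn

class GalerkinIntegral(nn.Module):
    """Softmax-free kernel-integral layer (Galerkin-style linear attention)."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.norm_k = nn.LayerNorm(dim)
        self.norm_v = nn.LayerNorm(dim)

    def forward(self, x):                 # x: (batch, n_points, dim)
        q = self.q(x)
        k = self.norm_k(self.k(x))
        v = self.norm_v(self.v(x))
        # (1/n) * Q (K^T V) acts as a Monte Carlo quadrature of the kernel
        # integral over the n sampled spatial locations.
        return q @ (k.transpose(-2, -1) @ v) / x.shape[1]
```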
Citations: 0
MCIVA: A multi-view pedestrian detection framework with a central inverse nearest neighbor map and a view adaptive module
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-11 | DOI: 10.1016/j.inffus.2026.104142
He Li, Taiyu Liao, Weihang Kong, Xingchen Zhang
Multi-view pedestrian detection is an important task with many applications in areas such as surveillance and smart cities. Despite the significant performance improvements achieved by recent multi-view pedestrian detection methods, three main challenges remain. First, in crowded scenes, neighboring connected components may merge in dense regions, leaving the pixel peak of each pedestrian poorly localized. Second, the loss functions used in previous multi-view pedestrian detection methods respond strongly to background regions. Third, camera parameters have not been fully utilized; they are only used to generate fixed projection matrices. To address these challenges, we propose a novel multi-view pedestrian detection framework (MCIVA) with a central inverse nearest neighbor (CINN) map and a view adaptive module (VAM). The CINN map is introduced to generate the ground-truth probability occupancy map (POM) from annotations, providing more precise location information for each pedestrian. To enhance the model's attention to local structural information, we propose a local structural similarity loss that reduces the influence of false local maxima in background regions. Moreover, the VAM is introduced to exploit camera parameters to generate learnable weights for multi-view feature fusion. We evaluate the proposed method on three benchmark datasets, and the results show that MCIVA improves the quality of prediction maps and achieves state-of-the-art performance.
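As a rough illustration of a distance-adaptive occupancy map, the sketch below assigns each annotated pedestrian a Gaussian whose width shrinks with the distance to its nearest neighboring annotation, keeping nearby peaks separable. This construction is an assumption inspired by the abstract; the paper's actual CINN map may be defined differently.

```python
import numpy as np

def cinn_occupancy_map(centers, shape, base_sigma=4.0):
    """centers: (N, 2) array of (row, col) annotations on the ground plane."""
    h, w = shape
    pom = np.zeros((h, w), dtype=np.float32)
    ys, xs = np.mgrid[0:h, 0:w]
    for i, (cy, cx) in enumerate(centers):
        others = np.delete(centers, i, axis=0)
        if len(others):
            nn_dist = np.min(np.linalg.norm(others - np.array([cy, cx]), axis=1))
            sigma = min(base_sigma, nn_dist / 3.0)  # inverse-NN shrinkage (assumed)
        else:
            sigma = base_sigma
        g = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
        pom = np.maximum(pom, g)                    # keep individual peaks separable
    return pom
```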
Citations: 0
A data fusion approach to synthesize microwave imagery of tropical cyclones from infrared data using vision transformers
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-22 | DOI: 10.1016/j.inffus.2026.104167
Fan Meng, Tao Song, Xianxuan Lin, Kunlin Yang
Microwave images with high spatiotemporal resolution are essential for observing and predicting tropical cyclones (TCs), including TC positioning, intensity estimation, and detection of concentric eyewalls. Nevertheless, the temporal resolution of tropical cyclone microwave (TCMW) images is limited by satellite quantity and orbit constraints, presenting a challenging problem for TC disaster forecasting. This research proposes a multi-sensor data fusion approach that uses high-temporal-resolution tropical cyclone infrared (TCIR) images to generate synthetic TCMW images, offering a solution to this data-scarcity problem. In particular, we introduce a deep learning network based on the Vision Transformer (TCA-ViT) to translate TCIR images into TCMW images. This can be viewed as a form of synthetic data generation that enhances the information available for decision-making. We integrate a phase-based physical guidance mechanism into the training process. Furthermore, we have developed a dataset of TC infrared-to-microwave image conversions (TCIR2MW) for training and testing the model. Experimental results demonstrate the method's capability to rapidly and accurately extract key features of TCs. Leveraging techniques such as masking and transfer learning, it addresses the absence of TCMW images by generating MW images from IR images, thereby aiding downstream tasks such as TC intensity estimation and precipitation forecasting. This study introduces a novel approach to TC image research, with the potential to advance deep learning in this direction and provide vital insights for real-time observation and prediction of global TCs. Our source code and data are publicly available online at https://github.com/kleenY/TCIR2MW.
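A minimal sketch of a ViT-style infrared-to-microwave translator is given below: infrared patches are tokenized, passed through a transformer encoder, and folded back into a microwave image. All sizes, the omission of positional embeddings, and the absence of the phase-based guidance are simplifying assumptions rather than the published TCA-ViT architecture.

```python
import torch.nn as nn
import torch.nn.functional as F

class IR2MW(nn.Module):
    """Toy ViT image-to-image translator; assumes H, W divisible by patch."""
    def __init__(self, patch=8, dim=256, depth=6, heads=8):
        super().__init__()
        self.embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.decode = nn.Linear(dim, patch * patch)  # token -> MW pixel patch
        self.patch = patch

    def forward(self, ir):                           # ir: (B, 1, H, W)
        b, _, h, w = ir.shape
        tokens = self.embed(ir).flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = self.encoder(tokens)                # positional embeddings omitted
        patches = self.decode(tokens).transpose(1, 2)       # (B, p*p, N)
        return F.fold(patches, output_size=(h, w),
                      kernel_size=self.patch, stride=self.patch)  # (B, 1, H, W)
```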
Citations: 0
IDFL: Incentive-driven federated learning with selfish clients
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-24 | DOI: 10.1016/j.inffus.2026.104185
Jin Xu, Hengrun Zhang, Huiqun Yu, Guisheng Fan
Heterogeneity challenges have long been discussed in Federated Learning (FL). Among these challenges, statistical heterogeneity, where non-independent and identically distributed (non-IID) data across clients severely impacts model convergence and performance, remains particularly problematic. While existing batch-size optimization strategies effectively address system-level heterogeneity and resource constraints, they inadequately tackle statistical heterogeneity, often simply increasing batch sizes without theoretical justification. Such approaches overlook a critical convergence-generalization dilemma well established in traditional machine learning: larger batch sizes accelerate convergence but may degrade generalization performance beyond critical thresholds, a phenomenon usually termed the "generalization gap". To bridge this gap in FL, we propose a comprehensive framework with three key contributions. First, we establish a batch-size optimization mechanism that balances convergence and generalization objectives through a penalty function, providing mathematically derived closed-form solutions for optimal batch sizes. Second, we design a Stackelberg game-based incentive mechanism that coordinates batch-size assignments with resource contributions while ensuring fair reward allocation to maximize individual client utility (defined as the difference between rewards and costs). Third, we develop a two-step verification strategy that detects and mitigates free-riding behaviors while monitoring convergence patterns to terminate ineffective training processes. Extensive experiments on real-world datasets validate our approach, demonstrating significant improvements in both convergence performance and fairness compared to state-of-the-art algorithms. Ablation studies confirm the effectiveness of each component.
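The convergence-generalization trade-off behind the batch-size mechanism can be illustrated numerically. In the toy objective below, a gradient-noise term that decays with batch size is combined with a quadratic penalty past a critical threshold, and a brute-force search recovers the threshold as the optimum. The coefficients and functional forms are assumptions for demonstration, not the paper's derived closed-form solution.

```python
def batch_objective(b, c_conv=1.0, c_gen=1e-4, b_crit=128):
    convergence = c_conv / b                             # gradient-noise term ~ 1/b
    generalization = c_gen * max(0, b - b_crit) ** 2     # penalty past threshold
    return convergence + generalization

best = min(range(1, 1025), key=batch_objective)
print(best, batch_objective(best))   # optimum lands at the critical batch size
```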
Citations: 0
Graph-guided cross-image correlation learning with adaptive global-local feature fusion for fine-grained visual representation
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-02-03 | DOI: 10.1016/j.inffus.2026.104204
Hongxing You, Yangtao Wang, Xiaocui Li, Yanzhao Xie, Da Chen, Xinyu Zhang, Wensheng Zhang
Fine-grained visual classification (FGVC) is challenging due to the difficulty of distinguishing between highly similar local regions. Recent studies leverage graph neural networks (GNNs) to learn local representations, but they focus solely on patch interactions within each image, failing to capture semantic relationships across different samples and leaving fine-grained features semantically disconnected from one another. To address these challenges, we propose Graph-guided Cross-image Correlation Learning with Adaptive Global-local Feature Fusion for Fine-grained Visual Representation (GCCR). We design a Cross-image Correlation Learning (CCL) module in which spatially corresponding patches across images are connected as graph nodes, enabling inter-image interactions that capture semantically rich local features. Within this CCL module, we introduce a Ranking Loss to address the limitation of traditional classification losses, which focus solely on maximizing individual sample confidence without explicitly constraining feature discriminability among visually similar categories. In addition, GCCR constructs a lightweight fusion module that dynamically balances the contributions of global and local features, leading to unbiased image representations. We conduct extensive experiments on four popular FGVC datasets: CUB-200-2011, Stanford Cars, FGVC-Aircraft, and iNaturalist 2017. Experimental results verify that GCCR achieves much higher performance than state-of-the-art (SOTA) FGVC methods while maintaining lower model complexity. On the most challenging benchmark, iNaturalist 2017, GCCR gains at least 7.51% in accuracy while using more than 4.42M fewer parameters and 80M fewer FLOPs than the best competing method. We release the pretrained model and code on GitHub: https://github.com/dislie/GCCR.
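One plausible form of such a Ranking Loss is a margin constraint that pushes each sample's target-class score above its hardest non-target score, sketched below in PyTorch. The margin value and exact formulation are assumptions, not necessarily GCCR's.

```python
import torch
import torch.nn.functional as F

def ranking_loss(logits, labels, margin=0.5):
    """logits: (B, C); labels: (B,). Penalize small target-vs-hardest gaps."""
    target = logits.gather(1, labels.unsqueeze(1)).squeeze(1)         # (B,)
    masked = logits.scatter(1, labels.unsqueeze(1), float("-inf"))    # hide target
    hardest_other = masked.max(dim=1).values                          # (B,)
    return F.relu(margin - (target - hardest_other)).mean()
```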
Citations: 0
Stain-aware domain alignment for imbalance blood cell classification
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-20 | DOI: 10.1016/j.inffus.2026.104166
Yongcheng Li, Lingcong Cai, Ying Lu, Xiao Han, Ma Li, Wenxing Lai, Xiangzhong Zhang, Xiaomao Fan
Blood cell identification is critical for hematological analysis, as it aids physicians in diagnosing various blood-related diseases. In real-world scenarios, blood cell image datasets often exhibit domain shift and data imbalance, posing challenges for accurate blood cell identification. To address these issues, we propose a novel blood cell classification method, termed SADA, based on stain-aware domain alignment. The primary objective of this work is to mine domain-invariant features in the presence of domain shift and data imbalance. To accomplish this objective, we propose a stain-based augmentation approach and a local alignment constraint to learn domain-invariant features. Furthermore, we propose a domain-invariant supervised contrastive learning strategy to capture discriminative features. We decouple the training process into two stages, domain-invariant feature learning and classification training, alleviating the problem of data imbalance. Experimental results on four public blood cell datasets and a private real-world dataset collected from the Third Affiliated Hospital of Sun Yat-sen University demonstrate that SADA establishes a new state-of-the-art baseline, outperforming existing cutting-edge methods. The source code is available at https://github.com/AnoK3111/SADA.
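For intuition, the sketch below perturbs an image channel-wise in optical-density space, a common recipe for simulating stain variation; SADA's actual stain-based augmentation may operate in a learned or stain-matrix space.

```python
import numpy as np

def stain_jitter(img, alpha=0.05, beta=0.01, rng=np.random):
    """img: float32 RGB in [0, 1], shape (H, W, 3)."""
    od = -np.log(np.clip(img, 1e-6, 1.0))            # to optical density
    scale = 1.0 + rng.uniform(-alpha, alpha, size=3)  # per-channel gain
    shift = rng.uniform(-beta, beta, size=3)          # per-channel offset
    od = od * scale + shift                           # jitter each stain channel
    return np.clip(np.exp(-od), 0.0, 1.0)             # back to RGB
```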
Citations: 0
Multimodal fusion of 3D point cloud and intraoperative imaging to enhance surgical robot navigation
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-21 | DOI: 10.1016/j.inffus.2026.104171
Yiheng Wang, Tianlun Wang, Tao Liu, Yihao Huang
To address the insufficient navigation accuracy of surgical robots in dynamic, complex, and non-rigid intraoperative environments, this paper proposes an enhanced multimodal fusion framework, EMF-RSN. The framework achieves spatial consistency between point clouds and intraoperative images through a depth-guided Geometry-Vision Alignment Module (GVAN), performs dynamic weighted fusion of geometric and visual features through a cross-modal attention fusion module (CAFM), and constructs a closed-loop optimization mechanism from perception to decision through a Task Feedback Optimization (TFO) module, thereby improving navigation accuracy and stability.
Experiments on the public dataset (Hamlyn) and the self-built simulation dataset (Sim-Surgical Fusion) demonstrate that EMF-RSN significantly outperforms existing methods in geometric accuracy, semantic consistency, and task robustness. Compared with traditional registration algorithms, point cloud errors are reduced by approximately 50% and trajectory errors by over 20%, while real-time performance of 44 FPS is maintained even under complex deformation and occlusion. This research provides a new technical approach and model foundation for realizing intelligent surgical navigation that integrates virtual and real elements, and is of great significance for the perception and autonomous control of surgical robots.
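A minimal sketch of cross-modal attention fusion is shown below: point-cloud (geometry) tokens query image (vision) tokens, and a learned gate performs the dynamic weighting. The dimensions and gating design are illustrative assumptions, not the published CAFM.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Geometry tokens attend to vision tokens, then gate the two streams."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, geom, vis):           # (B, Ng, dim), (B, Nv, dim)
        ctx, _ = self.attn(query=geom, key=vis, value=vis)
        g = self.gate(torch.cat([geom, ctx], dim=-1))   # dynamic weights in [0, 1]
        return g * ctx + (1 - g) * geom
```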
Citations: 0
Generating vision-language navigation instructions incorporated fine-grained alignment annotations
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-06-01 | Epub Date: 2025-12-30 | DOI: 10.1016/j.inffus.2025.104107
Yibo Cui, Liang Xie, Yu Zhao, Jiawei Sun, Erwei Yin
Vision-Language Navigation (VLN) enables intelligent agents to navigate environments by integrating visual perception and natural language instructions, yet it faces significant challenges due to the scarcity of fine-grained cross-modal alignment annotations. Existing datasets primarily focus on global instruction-trajectory matching, neglecting the sub-instruction-level and entity-level alignments critical for accurate navigation action decision-making. To address this limitation, we propose FCA-NIG, a generative framework that automatically constructs navigation instructions with dual-level fine-grained cross-modal annotations. In this framework, an augmented trajectory is first divided into sub-trajectories, which are then processed through GLIP-based landmark detection, crafted instruction construction, OFA-Speaker-based R2R-like instruction generation, and CLIP-powered entity selection, generating sub-instruction-trajectory pairs with entity-landmark annotations. Finally, these sub-pairs are aggregated into a complete instruction-trajectory pair. The framework generates the FCA-R2R dataset, the first large-scale augmentation dataset featuring precise sub-instruction-sub-trajectory and entity-landmark alignments. Extensive experiments demonstrate that training with FCA-R2R significantly improves the performance of multiple state-of-the-art VLN agents, including SF, EnvDrop, RecBERT, HAMT, DUET, and BEVBERT. Incorporating sub-instruction-trajectory alignment enhances agents' state awareness and decision accuracy, while entity-landmark alignment further boosts navigation performance and generalization. These results highlight the effectiveness of FCA-NIG in generating high-quality, scalable training data without manual annotation, advancing fine-grained cross-modal learning in complex navigation tasks.
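The data flow of the framework can be sketched as follows. Every helper here is a hypothetical stand-in for the GLIP, OFA-Speaker, and CLIP components named in the abstract, implemented as a trivial stub so the sketch runs; only the pipeline shape reflects the abstract.

```python
# Hypothetical stubs standing in for the real components (not the paper's code).
split_into_subtrajectories = lambda traj: [traj[i:i + 3] for i in range(0, len(traj), 3)]
detect_landmarks = lambda sub: ["chair"]                          # GLIP stand-in
generate_instruction = lambda sub, lm: f"walk past the {lm[0]}"   # OFA-Speaker stand-in
select_entities = lambda instr, lm: [(instr.split()[-1], lm[0])]  # CLIP stand-in

def build_annotated_pair(trajectory):
    """Aggregate sub-instruction-trajectory pairs with entity-landmark tags."""
    pairs = []
    for sub in split_into_subtrajectories(trajectory):
        landmarks = detect_landmarks(sub)                  # landmark detection
        sub_instr = generate_instruction(sub, landmarks)   # R2R-like sub-instruction
        entities = select_entities(sub_instr, landmarks)   # entity-landmark alignment
        pairs.append((sub_instr, sub, entities))
    return " ".join(p[0] for p in pairs), trajectory, pairs

instruction, traj, annotated = build_annotated_pair(list(range(7)))
```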
Citations: 0
Secure Tobit filtering for multi-rate nonlinear systems under multi-node random access protocol: A Paillier encryption-decryption mechanism
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-06-01 | Epub Date: 2026-01-10 | DOI: 10.1016/j.inffus.2026.104146
Shuo Yang, Raquel Caballero-Águila, Jun Hu, Antonia Oya-Lechuga
In this paper, the secure Tobit filtering (TF) problem is investigated for nonlinear systems subject to measurement censoring under a multi-node random access protocol (MNRAP). A multi-rate sampling framework is considered, which allows the system states and measurement outputs to operate with distinct sampling periods, thereby reflecting practical engineering constraints. Furthermore, to mitigate data collisions and improve resource utilization, the MNRAP is adopted to regulate the transmission order of measurement signals over communication networks. In addition, to safeguard the confidentiality of communication between the sensor node and the filter, the Paillier encryption-decryption mechanism is incorporated, protecting the transmitted information from interception by unauthorized third parties. This paper concentrates on developing an innovative secure TF scheme that guarantees the existence of an upper bound (UB) on the second moment of the filtering error. Subsequently, the obtained UB is minimized in the trace sense by designing a proper filter gain. Additionally, the uniform boundedness of the filtering error is verified in the mean-square sense by establishing a sufficient criterion. Finally, the efficacy and advantages of the proposed secure TF approach are demonstrated through a simulation example.
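For reference, the textbook Paillier scheme underlying the encryption-decryption mechanism can be sketched in a few lines: with g = n + 1, encryption is c = (1 + n)^m r^n mod n^2, decryption recovers m via L(c^lam mod n^2) * lam^(-1) mod n with L(x) = (x - 1)/n, and multiplying ciphertexts adds plaintexts. The demo below uses tiny primes and no hardening; a real deployment would use a vetted library.

```python
import math, random

p, q = 293, 433                        # toy primes; real keys use ~2048-bit n
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)           # Carmichael function of n
mu = pow(lam, -1, n)                   # valid precisely because g = n + 1

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2   # c = g^m * r^n mod n^2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n      # m = L(c^lam) * mu mod n

c1, c2 = encrypt(1234), encrypt(42)
assert decrypt(c1) == 1234
assert decrypt((c1 * c2) % n2) == 1276                # additive homomorphism
```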
Citations: 0
Attention-driven contrastive learning for cross-modal hashing with prototypical separation
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-06-01 | Epub Date: 2025-12-25 | DOI: 10.1016/j.inffus.2025.104078
Zhipeng He, Wenzhe Liu, Lian Wu, Jinrong Cui, Jie Wen
Effective retrieval and structuring of heterogeneous data have grown more difficult with the exponential growth of multimedia data. This surge in data volume underscores the importance of efficient cross-modal hashing techniques, which have recently attracted attention for their rapid retrieval speed and minimal storage requirements. However, existing unsupervised cross-modal hashing methods often fail to capture latent semantic structures and meaningful modality interactions, which limits their retrieval performance. To address these challenges, we propose Attention-driven Contrastive Learning for Cross-Modal Hashing via Prototypical Separation (ACoPSe). The method introduces a modality-aware fusion mechanism to enhance cross-modal feature interaction, and a prototype alignment strategy that reduces heterogeneity at the cluster level by leveraging pseudo-labels derived from clustering. Extensive experiments demonstrate that our method achieves performance comparable to state-of-the-art approaches.
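A hedged sketch of cluster-level alignment is given below: image and text features are pulled toward shared prototypes indexed by clustering-derived pseudo-labels via a softmax contrastive objective. The temperature and symmetric averaging are illustrative assumptions, not ACoPSe's exact loss.

```python
import torch
import torch.nn.functional as F

def prototype_alignment_loss(img_f, txt_f, prototypes, pseudo_labels, tau=0.1):
    """img_f, txt_f: (B, d); prototypes: (K, d); pseudo_labels: (B,)"""
    loss = 0.0
    for feats in (img_f, txt_f):                 # align both modalities
        feats = F.normalize(feats, dim=1)
        protos = F.normalize(prototypes, dim=1)
        logits = feats @ protos.t() / tau        # (B, K) cosine similarities
        loss = loss + F.cross_entropy(logits, pseudo_labels)
    return loss / 2
```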
Citations: 0