Cross-modal retrieval aims to establish semantic associations between heterogeneous modalities, among which image-text retrieval is a key application that seeks efficient semantic alignment between images and texts. Existing approaches often rely on fixed patch selection strategies for fine-grained alignment; however, such static strategies struggle to adapt to complex scene variations. Moreover, fine-grained alignment methods tend to fall into local optima by overemphasizing local feature details while neglecting global semantic context. These limitations significantly hinder both retrieval accuracy and generalization performance. To address these challenges, we propose a Dynamic Patch Selection and Dual-Granularity Alignment (DPSDGA) framework that jointly enhances global semantic consistency and local feature interactions for robust cross-modal alignment. Specifically, we introduce a dynamic sparse module that adaptively adjusts the number of retained visual patches based on scene complexity, effectively filtering redundant information while preserving critical semantic features. Furthermore, we design a dual-granularity alignment mechanism that combines global contrastive learning with local fine-grained alignment to enhance semantic consistency across modalities. Extensive experiments on two benchmark datasets, Flickr30k and MS-COCO, demonstrate that our method significantly outperforms existing approaches in image-text retrieval.
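
For concreteness, the sketch below illustrates one possible realization of the two components named above: an adaptive top-k patch selector whose kept count varies with a scene-complexity proxy, and a loss that combines global contrastive learning with local patch-word alignment. This is a minimal PyTorch sketch under assumptions of our own; the function names `dynamic_patch_selection` and `dual_granularity_loss`, the entropy-based complexity proxy, and the weighting `alpha` are illustrative and are not specified by the abstract.

```python
# Minimal sketch (PyTorch) of adaptive patch selection + dual-granularity alignment.
# The complexity proxy, scoring rule, and loss weighting are assumptions for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F


def dynamic_patch_selection(patch_feats, min_keep=16, max_keep=None):
    """Keep an adaptive number of visual patches per image.

    patch_feats: (B, N, D) patch embeddings. The kept count is driven by a
    simple scene-complexity proxy (entropy of patch importance scores).
    """
    B, N, D = patch_feats.shape
    max_keep = max_keep or N
    scores = patch_feats.norm(dim=-1)                       # (B, N) importance per patch (assumed)
    probs = F.softmax(scores, dim=-1)
    # Normalized entropy in [0, 1] as a proxy for scene complexity.
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1) / torch.log(torch.tensor(float(N)))
    keep = (min_keep + (max_keep - min_keep) * entropy).round().long()   # (B,) patches to keep
    ranks = scores.argsort(dim=-1, descending=True).argsort(dim=-1)      # rank of each patch
    mask = ranks < keep.unsqueeze(-1)                       # (B, N), True = kept
    return patch_feats * mask.unsqueeze(-1), mask


def dual_granularity_loss(img_global, txt_global, patch_feats, word_feats,
                          patch_mask, temperature=0.07, alpha=0.5):
    """Global contrastive (InfoNCE) loss plus a local patch-word alignment term."""
    # Global granularity: symmetric InfoNCE over the batch.
    img_g = F.normalize(img_global, dim=-1)
    txt_g = F.normalize(txt_global, dim=-1)
    logits = img_g @ txt_g.t() / temperature                # (B, B)
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_global = 0.5 * (F.cross_entropy(logits, targets) +
                         F.cross_entropy(logits.t(), targets))

    # Local granularity: each word attends to its best-matching kept patch.
    p = F.normalize(patch_feats, dim=-1)                    # (B, N, D)
    w = F.normalize(word_feats, dim=-1)                     # (B, L, D)
    sim = torch.einsum('bld,bnd->bln', w, p)                # (B, L, N) word-patch similarity
    sim = sim.masked_fill(~patch_mask.unsqueeze(1), -1e4)   # ignore dropped patches
    loss_local = (1.0 - sim.max(dim=-1).values).mean()      # reward strong best matches

    return loss_global + alpha * loss_local


if __name__ == "__main__":
    B, N, L, D = 4, 49, 12, 256
    patches, words = torch.randn(B, N, D), torch.randn(B, L, D)
    img_g, txt_g = torch.randn(B, D), torch.randn(B, D)
    kept, mask = dynamic_patch_selection(patches)
    print("patches kept per image:", mask.sum(-1).tolist())
    print("loss:", dual_granularity_loss(img_g, txt_g, kept, words, mask).item())
```

In this sketch the number of retained patches grows with the entropy of the patch-score distribution, so cluttered scenes keep more patches than simple ones, while the combined loss couples batch-level contrastive alignment with token-level matching over only the retained patches.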