
Latest Articles from Neurocomputing

Joint Conditional Diffusion Model for image restoration with mixed degradations
IF 5.5 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-27 DOI: 10.1016/j.neucom.2025.129512
Yufeng Yue, Meng Yu, Luojie Yang, Tong Liu
Image restoration is rather challenging in adverse weather conditions, especially when multiple degradations occur simultaneously. Blind image decomposition was proposed to tackle this issue; however, its effectiveness heavily relies on the accurate estimation of each component. Although diffusion-based models exhibit strong generative abilities in image restoration tasks, they may generate irrelevant content when the degraded images are severely corrupted. To address these issues, we leverage physical constraints to guide the whole restoration process, constructing a mixed degradation model based on the atmospheric scattering model. We then formulate our Joint Conditional Diffusion Model (JCDM), which incorporates the degraded image and a degradation mask to provide precise guidance. To achieve better color and detail recovery, we further integrate a refinement network to reconstruct the restored image, in which an Uncertainty Estimation Block (UEB) is employed to enhance the features. Extensive experiments on both multi-weather and weather-specific datasets demonstrate the superiority of our method over state-of-the-art competing methods. The code will be available at https://github.com/mengyu212/JCDM.
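For readers unfamiliar with the physical prior named in the abstract, the sketch below illustrates the classic atmospheric scattering equation, I = J·t + A(1 − t), extended with a hypothetical additive occlusion layer to mimic mixed degradations. The rain layer and all variable names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def atmospheric_scattering(clean, transmission, airlight, occlusion=None):
    """Classic atmospheric scattering model I = J*t + A*(1 - t).

    `occlusion` is a hypothetical additive layer (e.g. rain or snow
    streaks) used here to illustrate *mixed* degradations; the paper's
    exact composition of weather effects may differ.
    """
    degraded = clean * transmission + airlight * (1.0 - transmission)
    if occlusion is not None:
        degraded = np.clip(degraded + occlusion, 0.0, 1.0)
    return degraded

# toy usage: a 64x64 RGB image with uniform haze plus sparse rain streaks
J = np.random.rand(64, 64, 3)          # clean scene radiance in [0, 1]
t = np.full((64, 64, 1), 0.6)          # transmission map
A = np.array([0.9, 0.9, 0.9])          # global airlight
rain = (np.random.rand(64, 64, 3) > 0.98).astype(float) * 0.5
I = atmospheric_scattering(J, t, A, occlusion=rain)
```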
{"title":"Joint Conditional Diffusion Model for image restoration with mixed degradations","authors":"Yufeng Yue,&nbsp;Meng Yu,&nbsp;Luojie Yang,&nbsp;Tong Liu","doi":"10.1016/j.neucom.2025.129512","DOIUrl":"10.1016/j.neucom.2025.129512","url":null,"abstract":"<div><div>Image restoration is rather challenging in adverse weather conditions, especially when multiple degradations occur simultaneously. Blind image decomposition was proposed to tackle this issue, however, its effectiveness heavily relies on the accurate estimation of each component. Although diffusion-based models exhibit strong generative abilities in image restoration tasks, they may generate irrelevant contents when the degraded images are severely corrupted. To address these issues, we leverage physical constraints to guide the whole restoration process, where a mixed degradation model based on atmosphere scattering model is constructed. Then we formulate our Joint Conditional Diffusion Model (JCDM) by incorporating the degraded image and degradation mask to provide precise guidance. To achieve better color and detail recovery results, we further integrate a refinement network to reconstruct the restored image, where Uncertainty Estimation Block (UEB) is employed to enhance the features. Extensive experiments performed on both multi-weather and weather-specific datasets demonstrate the superiority of our method over state-of-the-art competing methods. The code will be available at <span><span>https://github.com/mengyu212/JCDM</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"626 ","pages":"Article 129512"},"PeriodicalIF":5.5,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143314289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
PseudoNeuronGAN: Unpaired synthetic image to pseudo-neuron image translation for label-free neuron instance segmentation
IF 5.5 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-27 DOI: 10.1016/j.neucom.2025.129559
Zhenzhen You , Ming Jiang , Zhenghao Shi , Cheng Shi , Shuangli Du , Minghua Zhao , Anne-Sophie Hérard , Nicolas Souedet , Thierry Delzescaux
Accurate neuron instance segmentation is of great significance in the field of neuroscience. The prerequisite for obtaining high-precision segmentation results with deep learning models is a large amount of labeled data. However, in areas such as the dentate gyrus of the hippocampus, where tens of thousands of neurons are aggregated, neuroscientists are unable to label neuron pixels. In this paper, we propose a pipeline for label-free neuron instance segmentation. First, PseudoNeuronGAN, an unpaired synthetic-image-to-pseudo-neuron-image translation network, is proposed. Without requiring any manual labeling, synthetic cell images with known centroid labels and real neuron images are sufficient to generate a pseudo-neuron dataset. Since centroid labels act as constraints that prevent neuron loss during translation, they remain consistent between the synthetic dataset and the generated pseudo-neuron dataset, and can serve as labels for pseudo-neuron images to train deep learning networks that predict the centroids of real neurons. Finally, based on the detected neuron centroids, neuron instance segmentation is obtained with a competitive region-growing algorithm. Experiments show that our pipeline succeeds in performing neuron instance segmentation without the need for manual annotations. Using PseudoNeuronGAN to generate a labeled pseudo-neuron dataset greatly reduces the tedious labeling work of neuroscientists, and the accuracy of centroid labels is no longer biased by subjective factors. In terms of instance segmentation performance, the average F-score of classical deep learning models trained on the pseudo-neuron dataset exceeds that of models trained on a limited real-neuron dataset, reflecting the high quality of the generated pseudo-neuron dataset. The core code of PseudoNeuronGAN is available at https://github.com/zhenzhen89/PseudoNeuronGAN.
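As a rough illustration of the final step, here is a minimal competitive region-growing sketch that expands labels breadth-first from detected centroids. The growth criterion (a simple intensity threshold) is an assumption, since the abstract does not specify the paper's rule.

```python
import numpy as np
from collections import deque

def competitive_region_growing(image, centroids, intensity_thresh=0.3):
    """Toy competitive region growing from detected centroids.

    Seeds expand breadth-first in parallel; each foreground pixel is
    claimed by whichever seed reaches it first.
    """
    h, w = image.shape
    labels = np.zeros((h, w), dtype=np.int32)   # 0 = unassigned
    queue = deque()
    for idx, (r, c) in enumerate(centroids, start=1):
        labels[r, c] = idx
        queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w and labels[nr, nc] == 0
                    and image[nr, nc] >= intensity_thresh):
                labels[nr, nc] = labels[r, c]   # claimed by first arriving seed
                queue.append((nr, nc))
    return labels
```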
{"title":"PseudoNeuronGAN: Unpaired synthetic image to pseudo-neuron image translation for label-free neuron instance segmentation","authors":"Zhenzhen You ,&nbsp;Ming Jiang ,&nbsp;Zhenghao Shi ,&nbsp;Cheng Shi ,&nbsp;Shuangli Du ,&nbsp;Minghua Zhao ,&nbsp;Anne-Sophie Hérard ,&nbsp;Nicolas Souedet ,&nbsp;Thierry Delzescaux","doi":"10.1016/j.neucom.2025.129559","DOIUrl":"10.1016/j.neucom.2025.129559","url":null,"abstract":"<div><div>Accurate neuron instance segmentation is of great significance in the field of neuroscience. The prerequisite for obtaining high-precision segmentation results using deep learning models is to have a large number of labeled datasets. However, in areas such as the dentate gyrus of the hippocampus where tens of thousands of neurons are aggregated, neuroscientists are unable to label neuron pixels. In this paper, we propose a pipeline for label-free neuron instance segmentation. Firstly, PseudoNeuronGAN, an unpaired synthetic image to pseudo-neuron image translation network, is proposed. Without requiring any manual labeling, synthetic cell images with known centroid labels and real neuron images are sufficient to generate a pseudo-neuron dataset. Since centroid labels are constraints to prevent neuron loss during the translation process, they are consistent in both the synthetic dataset and the generated pseudo-neuron dataset, and can be set as labels for pseudo-neuron images to train deep learning networks to predict the centroids of real neurons. Finally, based on the detected neuron centroids, neuron instance segmentation can be obtained by using competitive region growing algorithm. Experiments show that our pipeline succeeds in performing neuron instance segmentation without the need for manual annotations. PseudoNeuronGAN to generate a labeled pseudo-neuron dataset will greatly reduce the tedious labeling work by neuroscientists, and the accuracy of centroid labels is no longer biased by subjective factors. In terms of instance segmentation performance, the average F-score calculated by classical deep learning models trained on the pseudo-neuron dataset exceeds the average F-score trained on a limited number of real neuron dataset, reflecting the high quality of the generated pseudo-neuron dataset. Our critical code of PseudoNeuronGAN is available at <span><span>https://github.com/zhenzhen89/PseudoNeuronGAN</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"626 ","pages":"Article 129559"},"PeriodicalIF":5.5,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143314458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Advances in instance segmentation: Technologies, metrics and applications in computer vision
IF 5.5 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-27 DOI: 10.1016/j.neucom.2025.129584
José M. Molina , Juan P. Llerena , Luis Usero , Miguel A. Patricio
Instance segmentation is an advanced technique in computer vision that focuses on identifying and classifying each individual object in an image at the pixel level. Unlike semantic segmentation, which groups pixels of similar objects without distinguishing between different instances, instance segmentation assigns unique labels to each object, even if they are of the same class. This makes it possible not only to detect the presence and category of objects in an image but also to locate each specific instance and clearly distinguish them from each other. Work on this problem not only advances the technical and theoretical understanding of how machines see and process digital images, but also has a direct impact on the many industries and sectors where computer vision is an essential part of the system. In this paper, we present the current deep learning-based technologies, the metrics used for their evaluation, and a review of datasets in both general and drone-specific contexts. The results of this study provide a compendium of easily deployable deep learning-based technologies. This review aims to accelerate the reader's understanding and use of instance segmentation technologies.
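As a concrete anchor for the metrics such surveys review, a minimal mask-IoU computation, the building block of standard instance-segmentation scores such as COCO mask AP, might look like this:

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection-over-Union between two boolean instance masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 0.0

a = np.zeros((8, 8), bool); a[2:6, 2:6] = True
b = np.zeros((8, 8), bool); b[3:7, 3:7] = True
print(mask_iou(a, b))   # 9 / 23 ≈ 0.391
```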
{"title":"Advances in instance segmentation: Technologies, metrics and applications in computer vision","authors":"José M. Molina ,&nbsp;Juan P. Llerena ,&nbsp;Luis Usero ,&nbsp;Miguel A. Patricio","doi":"10.1016/j.neucom.2025.129584","DOIUrl":"10.1016/j.neucom.2025.129584","url":null,"abstract":"<div><div>Instance segmentation is an advanced technique in computer vision that focuses on identifying and classifying each individual object in an image at the pixel level. Unlike semantic segmentation, which groups pixels of similar objects without distinguishing between different instances, instance segmentation assigns unique labels to each object, even if they are of the same class. This makes it possible not only to detect the presence and category of objects in an image but also to locate each specific instance and clearly distinguish them from each other. This problem not only advances the technical and theoretical understanding of how machines see and process digital images, but also has a direct impact on various industries and sectors where computer vision is an essential part of the system. In this paper, we present the current deep learning-based technologies, the metrics used for their evaluation, and a review of general and concrete datasets in general and drone-specific contexts. The results of this study provide a compendium of easily deployable deep learning-based technologies. This review paper aims to accelerate the process of understanding and using instance segmentation technologies for the reader.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"625 ","pages":"Article 129584"},"PeriodicalIF":5.5,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143152022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Hierarchical Task-aware Temporal Modeling and Matching for few-shot action recognition
IF 5.5 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-27 DOI: 10.1016/j.neucom.2025.129467
Yucheng Zhan , Yijun Pan , Siying Wu , Yueyi Zhang , Xiaoyan Sun
Few-shot action recognition seeks to classify new action categories using only a few labeled video samples as reference. Due to the lack of sufficient training samples and the complex structure of video data, it is difficult to extract global features that can be directly used for classification. Therefore, most previous works adopt image encoders to extract the features of each frame individually, and then perform temporal fusion and alignment of the query and support features. However, they neglect the importance of sufficient spatiotemporal modeling, and of relationships with other categories in the few-shot task, when extracting features, rendering the features less effective at distinguishing between the given action classes, especially those that require the perception of local motion. In this paper, we present Hierarchical Task-aware Temporal Modeling and Matching (HTTMM) to better perceive critical motion patterns and extract task-relevant discriminative features for matching. Specifically, we propose a task guidance generator and a hierarchical task-aware temporal module. The former collects samples across all categories to gather task-level contextual information and generate task guidance. The latter performs hierarchical spatiotemporal modeling in a task-aware manner by incorporating the task guidance and leveraging the multi-stage visual features of the image encoder. This design makes the extracted features rich in motion cues and more discriminative among the new classes in the task. Based on the global and local features obtained, we further propose a simple matching strategy that takes multi-level relationships into consideration to improve the robustness of matching. To demonstrate the effectiveness of our method, we evaluate it on four commonly used datasets, i.e., Kinetics, UCF101, HMDB51, and Something–Something v2. The experimental results show that our method outperforms the existing state-of-the-art methods.
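For context, the generic metric-based matching template that few-shot recognizers build on can be sketched as below. HTTMM's hierarchical, multi-level matching is considerably richer; treat this purely as the baseline idea, with shapes and names as assumptions.

```python
import torch
import torch.nn.functional as F

def prototype_match(query_feats, support_feats):
    """Minimal metric-based few-shot matching, assuming per-class
    support features of shape [n_way, n_shot, d] and query features
    of shape [n_query, d]."""
    prototypes = support_feats.mean(dim=1)        # [n_way, d] class prototypes
    q = F.normalize(query_feats, dim=-1)
    p = F.normalize(prototypes, dim=-1)
    logits = q @ p.t()                            # cosine similarity per class
    return logits.argmax(dim=-1)                  # predicted class per query

preds = prototype_match(torch.randn(10, 256), torch.randn(5, 3, 256))
```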
{"title":"Hierarchical Task-aware Temporal Modeling and Matching for few-shot action recognition","authors":"Yucheng Zhan ,&nbsp;Yijun Pan ,&nbsp;Siying Wu ,&nbsp;Yueyi Zhang ,&nbsp;Xiaoyan Sun","doi":"10.1016/j.neucom.2025.129467","DOIUrl":"10.1016/j.neucom.2025.129467","url":null,"abstract":"<div><div>Few-shot action recognition seeks to classify new action categories using only a few labeled video samples as reference. Due to the lack of sufficient training samples and the complex structure of video data, it is difficult to extract global features that can be directly used for classification. Therefore, most previous works adopt image encoders to extract the features of each frame individually, and then perform temporal fusion and alignment for the query and support features. However, they neglect the importance of sufficient spatiotemporal modeling and relationships with other categories in the few-shot task when extracting features, rendering them less effective at distinguishing between the given action classes, especially those that require the perception of local motion. In this paper, we present Hierarchical Task-aware Temporal Modeling and Matching (HTTMM) to better perceive critical motion patterns and extract task-relevant discriminative features for matching. Specifically, we propose a task guidance generator and a hierarchical task-aware temporal module. The former collects samples across all categories to get task contextual information and generate task guidance. The latter performs hierarchical spatiotemporal modeling in a task-aware manner by incorporating the task guidance and leveraging the multi-stage visual features of the image encoder. This design makes the extracted features rich in motion cues and more discriminative among the new classes in the task. Based on the global and local features obtained, we further propose a simple matching strategy that takes multi-level relationships into consideration to improve the robustness of matching. To demonstrate the effectiveness of our method, we evaluate it on four commonly used datasets, <em>i.e.</em>, Kinetics, UCF101, HMDB51, and Something–Something v2. The experimental results show that our method outperforms the existing state-of-the-art methods.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"624 ","pages":"Article 129467"},"PeriodicalIF":5.5,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143147392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ATPTrack: Visual tracking with alternating token pruning of dynamic templates and search region
IF 5.5 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-27 DOI: 10.1016/j.neucom.2025.129534
Shuo Zhang , Dan Zhang , Qi Zou
The constantly varying appearance of targets poses tremendous challenges for visual object tracking, especially in long-term tracking and background-interference scenarios. Current leading trackers introduce multi-level fixed and dynamic templates to encode changing target information and abundant spatiotemporal context, achieving astonishing performance improvements. However, dynamic templates are obtained from intermediate frames with high tracking confidence scores. These frames are not manually annotated, so the accuracy of dynamic templates relies entirely on the tracking results, and the templates may contain a large amount of uninformative and irrelevant background noise due to imprecise tracking. Additionally, multiple templates increase computational complexity. To tackle these problems, a novel tracker dubbed ATPTrack is proposed for efficient end-to-end tracking. ATPTrack combines the initial target information of fixed templates with the updated target representations of dynamic templates. In particular, ATPTrack develops an alternating token trimming method that progressively prunes the dynamic templates and the search region. After token simplification, target-related information is highlighted in both the dynamic templates and the search region, making the tracker more computationally efficient. Furthermore, a novel similarity ranking module is incorporated into the dynamic template pruning (DTP) block and the search region pruning (SRP) block to select the most discriminative target tokens; the DTP and SRP blocks are responsible for efficiently trimming the dynamic templates and the search region, respectively. The proposed ATPTrack achieves competitive performance on multiple tracking benchmarks. Compared to trimming only the search region, alternately pruning the dynamic templates and the search region further reduces the multiply-accumulate operations (MACs) by 11.5% with a negligible performance drop of 0.3%.
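The core of similarity-ranking-based token pruning can be sketched as follows; the scoring rule and keep ratio are illustrative assumptions rather than ATPTrack's exact design.

```python
import torch

def prune_tokens(tokens, reference, keep_ratio=0.7):
    """Keep the tokens most similar to a reference target descriptor,
    discarding the rest.

    tokens:    [B, N, d] search-region (or dynamic-template) tokens
    reference: [B, d]    target descriptor (e.g. mean template token)
    """
    scores = torch.einsum('bnd,bd->bn', tokens, reference)   # similarity per token
    k = max(1, int(tokens.shape[1] * keep_ratio))
    idx = scores.topk(k, dim=1).indices                      # indices of kept tokens
    idx = idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
    return tokens.gather(1, idx)                             # [B, k, d]

kept = prune_tokens(torch.randn(2, 256, 192), torch.randn(2, 192))
```

Applied alternately to the dynamic templates and the search region, each pruning pass shrinks the token set the next attention stage must process, which is where the MACs reduction comes from.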
{"title":"ATPTrack: Visual tracking with alternating token pruning of dynamic templates and search region","authors":"Shuo Zhang ,&nbsp;Dan Zhang ,&nbsp;Qi Zou","doi":"10.1016/j.neucom.2025.129534","DOIUrl":"10.1016/j.neucom.2025.129534","url":null,"abstract":"<div><div>Constantly varying appearance of targets brings tremendous challenges for visual object tracking, especially in long-term tracking and background interference scenarios. Current leading trackers attempt to introduce multi-level fixed and dynamic templates to encode changing target information and abundant spatiotemporal contexts. These methods achieve astonishing performance improvements. However, dynamic templates are obtained from intermediate frames with high tracking confidence scores. These frames are not manually annotated. Therefore, the accuracy of dynamic templates relies entirely on tracking results. Dynamic templates may contain a large amount of uninformative and irrelevant background noise due to imprecise tracking. Additionally, multiple templates result in increased computational complexity as well. In order to tackle the above problems, a novel tracker dubbed ATPTrack is proposed for efficient end-to-end tracking. ATPTrack combines the initial target information of fixed templates and updated target representations of dynamic templates. Particularly, ATPTrack develops an alternating token trimming method that prunes dynamic templates and search region progressively. After token simplification, target-related information is highlighted in both dynamic templates and search region, which makes it more computationally efficient. Furthermore, a novel similarity ranking module is incorporated into the dynamic template pruning (DTP) block and the search region pruning (SRP) block for selecting the most discriminative target tokens. The DTP and SRP blocks are responsible for efficient trimming of dynamic templates and the search region, separately. The proposed ATPTrack achieves competitive performance on multiple tracking benchmarks. Compared to merely trimming the search region, ATPTrack further reduces the multiply-accumulate operations (MACs) by 11.5 % with negligible performance drop of 0.3 % by alternately pruning dynamic templates and search region.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"625 ","pages":"Article 129534"},"PeriodicalIF":5.5,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143152638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
GPT4TFP: Spatio-temporal fusion large language model for traffic flow prediction
IF 5.5 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-27 DOI: 10.1016/j.neucom.2025.129562
Yiwu Xu, Mengchi Liu
Traffic flow prediction aims to anticipate the future usage levels of transportation and is a pivotal component of intelligent transportation systems. Previous studies have mainly employed deep learning technologies to decode traffic flow data, processing the spatial and temporal embeddings in a sequential, parallel, or single-feature manner. Although the structures of these models have become more and more complex, their accuracy has not improved. Recently, large language models (LLMs) have made significant progress in traffic flow prediction tasks due to their superior performance. However, although LLMs can capture the spatio-temporal dependencies of traffic flow prediction, they ignore the cross-relationships between spatio-temporal embeddings. To this end, we propose a spatio-temporal fusion large language model (GPT4TFP) for traffic flow prediction, which comprises four components: the spatio-temporal embedding layer, the spatio-temporal fusion layer, the frozen pre-trained LLM layer, and the output linear layer. The spatio-temporal embedding layer embeds traffic flow data into the spatio-temporal representations required for traffic flow prediction. In the spatio-temporal fusion layer, we propose a spatio-temporal fusion strategy based on multi-head cross-attention to capture the cross-relationships between spatio-temporal embeddings. In addition, we introduce a frozen pre-trained LLM fine-tuning strategy to improve the accuracy of traffic flow prediction. Experimental results on two traffic flow datasets show that the proposed model outperforms a set of state-of-the-art baseline models.
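A minimal sketch of the multi-head cross-attention fusion described above, assuming PyTorch and placeholder dimensions; the paper's actual fusion layer may attend in both directions and use different sizes.

```python
import torch
import torch.nn as nn

class SpatioTemporalFusion(nn.Module):
    """Temporal tokens attend to spatial tokens via multi-head
    cross-attention, followed by a residual connection and LayerNorm.
    Layer sizes are illustrative assumptions."""

    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, spatial_emb, temporal_emb):
        # queries = temporal embeddings; keys/values = spatial embeddings
        fused, _ = self.cross_attn(temporal_emb, spatial_emb, spatial_emb)
        return self.norm(temporal_emb + fused)

fusion = SpatioTemporalFusion()
s = torch.randn(8, 207, 64)   # e.g. one token per sensor/road segment
t = torch.randn(8, 12, 64)    # e.g. one token per historical time step
out = fusion(s, t)            # [8, 12, 64] fused representation
```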
{"title":"GPT4TFP: Spatio-temporal fusion large language model for traffic flow prediction","authors":"Yiwu Xu,&nbsp;Mengchi Liu","doi":"10.1016/j.neucom.2025.129562","DOIUrl":"10.1016/j.neucom.2025.129562","url":null,"abstract":"<div><div>Traffic flow prediction aims to anticipate the future usage levels of transportation, and is a pivotal component of intelligent transportation systems. Previous studies have mainly employed deep learning technologies to decode traffic flow data. These methods process the spatial and temporal embeddings of traffic flow data in a sequential, parallel, or single-feature manner. Although the structures of these models are becoming more and more complex, their accuracy has not improved. Recently, large language models (LLMs) have made significant progress in traffic flow prediction tasks due to their superior performance. However, although the spatio-temporal dependencies of traffic flow prediction can be captured by LLMs, they ignore the cross-relationships between spatio-temporal embeddings. To this end, we propose a spatio-temporal fusion large language model (GPT4TFP) for traffic flow prediction, which is divided into four components: the spatio-temporal embedding layer, the spatio-temporal fusion layer, the frozen pre-trained LLM layer, and the output linear layer. The spatio-temporal embedding layer embeds traffic flow data into the spatio-temporal representations required by traffic flow prediction. In the spatio-temporal fusion layer, we propose a spatio-temporal fusion strategy based on multi-head cross-attention to capture the cross-relationships between spatio-temporal embeddings. In addition, we introduce a frozen pre-trained strategy to fine-tune the LLM to improve the accuracy of traffic flow prediction. The experimental results on two traffic flow datasets show that the proposed model outperforms a set of state-of-the-art baseline models.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"625 ","pages":"Article 129562"},"PeriodicalIF":5.5,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143152458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Dual-attention enhanced variational encoding for interpretable remaining useful life prediction
IF 5.5 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-27 DOI: 10.1016/j.neucom.2025.129487
Wen Liu , Jyun-You Chiang , Guojun Liu , Haobo Zhang
In Prognostics and Health Management (PHM), predicting Remaining Useful Life (RUL) is a key technique for equipment health evaluation. The use of deep learning methods has improved prediction accuracy. However, these approaches often fail to provide the transparency and interpretability that maintenance personnel require to diagnose equipment degradation effectively. To address this challenge, a Transformer-based Dual-Attention Enhanced Variational Encoding (DAEVE) approach is developed for more interpretable RUL prediction. The framework integrates sensor and time-step encoders, a latent space with inductive bias, and a regression model: the fusion encoder compresses input data into a three-dimensional (3-D) latent space, facilitating both the prediction and the interpretation of the equipment degradation process. Four turbofan aircraft engine datasets are used in extensive experiments to evaluate the efficacy of the proposed method. The results demonstrate that DAEVE outperforms most state-of-the-art methods in prediction accuracy. Furthermore, the proposed method reveals latent degradation trajectories and identifies the more informative sensors at diverse degradation stages. This research could enhance maintenance decision-making and reduce operational risks, contributing to the advancement of predictive maintenance in the aerospace and related industries.
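The variational encoding into a 3-D latent space can be sketched with a standard reparameterized head, as below; the feature dimension and KL weighting are assumptions, and the dual-attention encoders feeding it are omitted.

```python
import torch
import torch.nn as nn

class VariationalHead(nn.Module):
    """Standard VAE-style head: project encoder features to a mean and
    log-variance, sample via the reparameterization trick, and return
    the KL term against a unit Gaussian prior."""

    def __init__(self, feat_dim=128, latent_dim=3):
        super().__init__()
        self.mu = nn.Linear(feat_dim, latent_dim)
        self.logvar = nn.Linear(feat_dim, latent_dim)

    def forward(self, h):
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return z, kl

z, kl = VariationalHead()(torch.randn(32, 128))   # z: [32, 3] latent codes
```

A 3-D latent has the practical benefit that degradation trajectories can be plotted directly, which is presumably what makes the interpretation in the paper possible.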
{"title":"Dual-attention enhanced variational encoding for interpretable remaining useful life prediction","authors":"Wen Liu ,&nbsp;Jyun-You Chiang ,&nbsp;Guojun Liu ,&nbsp;Haobo Zhang","doi":"10.1016/j.neucom.2025.129487","DOIUrl":"10.1016/j.neucom.2025.129487","url":null,"abstract":"<div><div>In Prognostics Health Management (PHM), predicting Remaining Useful Life (RUL) is a key technique for equipment health evaluation. The utilization of deep learning methods has improved prediction accuracy. However, these approaches often fail to provide the transparency and interpretability that maintenance personnel require to diagnose equipment degradation effectively. To address this challenge, a Dual-Attention Enhanced Variational Encoding (DAEVE) approach based on Transformer is developed for more interpretable RUL prediction. This framework integrates both sensor and time step encoders, a latent space with inductive bias and a regression model: the fusion encoder compresses input data into a three-dimension(3-D) latent space, facilitating both the prediction and interpretation of the equipment degradation process. Four turbofan aircraft engine datasets are applied in extensive experiments to evaluate the efficacy of proposed method. The results demonstrate that DAEVE outperforms most state-of-the-art methods in prediction accuracy. Furthermore, the proposed method exhibits the latent degradation trajectories and more informative sensors in diverse stages. This research could enhance maintenance decision-making processes and reduce operational risks, contributing to the advancement of predictive maintenance in the aerospace and related industries.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"624 ","pages":"Article 129487"},"PeriodicalIF":5.5,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143147386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Attention-modulated frequency-aware pooling via spatial guidance
IF 5.5 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-27 DOI: 10.1016/j.neucom.2025.129507
Yunzhong Si , Huiying Xu , Xinzhong Zhu , Rihao Liu , Hongbo Li
Pooling is widely used in computer vision to expand the receptive field and enhance semantic understanding by reducing spatial resolution. However, current mainstream downsampling methods rely primarily on local spatial aggregation. While they effectively reduce the spatial resolution of feature maps and extract discriminative features, they remain limited by the constraints of the receptive field and the inadequacy of single-domain information, making it challenging to capture fine details while suppressing noise. To address these limitations, we propose a Dual-Domain Downsampling (D3) method, which leverages the complementarity of the spatial and frequency domains. We employ an invertible local two-dimensional Discrete Cosine Transform (2D DCT) to construct a frequency-domain pooling window. In the spatial domain, we design an Inverted Multiform Attention Modulator (IMAM) that expands the receptive field through multiform convolutions while adaptively constructing dynamic frequency weights guided by rich spatial information. This allows fine-grained modulation of different frequency components, amplifying or attenuating them in different spatial regions, effectively reducing noise while preserving detail. Extensive experiments on ImageNet-1K, MSCOCO, and complex scene detection datasets across various benchmark models consistently validate the effectiveness of our approach. On the ImageNet-1K classification task, our method achieves up to a 1.95% accuracy improvement, with significant performance gains over state-of-the-art methods on MSCOCO and other challenging detection scenarios. The code will be made publicly available at: https://github.com/HZAI-ZJNU/D3.
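To make the frequency-domain pooling window concrete, here is a toy local 2-D DCT pooling routine. Keeping only the DC coefficient reduces to (scaled) average pooling; D3 instead adaptively re-weights all coefficients with spatially guided attention, which this sketch omits.

```python
import numpy as np
from scipy.fft import dctn

def dct_pool(x, win=2):
    """Frequency-domain pooling sketch: apply an orthonormal 2-D DCT
    inside each non-overlapping win x win window and keep the DC term.

    x: [H, W] feature map with H, W divisible by `win`.
    """
    h, w = x.shape
    # tile into [H/win, W/win, win, win] blocks
    blocks = x.reshape(h // win, win, w // win, win).transpose(0, 2, 1, 3)
    coeffs = dctn(blocks, axes=(-2, -1), norm='ortho')  # per-window DCT
    return coeffs[..., 0, 0]                            # [H/win, W/win] DC coefficients

pooled = dct_pool(np.random.rand(32, 32))   # -> (16, 16)
```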
{"title":"Attention-modulated frequency-aware pooling via spatial guidance","authors":"Yunzhong Si ,&nbsp;Huiying Xu ,&nbsp;Xinzhong Zhu ,&nbsp;Rihao Liu ,&nbsp;Hongbo Li","doi":"10.1016/j.neucom.2025.129507","DOIUrl":"10.1016/j.neucom.2025.129507","url":null,"abstract":"<div><div>Pooling is widely used in computer vision to expand the receptive field and enhance semantic understanding by reducing spatial resolution. However, current mainstream downsampling methods primarily rely on local spatial aggregation. While they effectively reduce the spatial resolution of feature maps and extract discriminative features, they are still limited by the constraints of the receptive field and the inadequacy of single-domain information, making it challenging to effectively capture fine details while suppressing noise. To address these limitations, we propose a Dual-Domain Downsampling (D3) method, which leverages the complementarity of spatial and frequency domains. We employ an invertible local two-dimensional Discrete Cosine Transform (2D DCT) transformation to construct a frequency domain pooling window. In the spatial domain, we design an Inverted Multiform Attention Modulator (IMAM) that expands the receptive field through multiform convolutions, while adaptively constructing dynamic frequency weights guided by rich spatial information. This allows for fine-grained modulation of different frequency components, either amplifying or attenuating them in different spatial regions, effectively reducing noise while preserving detail. Extensive experiments on ImageNet-1K, MSCOCO, and complex scene detection datasets across various benchmark models consistently validate the effectiveness of our approach. On the ImageNet-1K classification task, our method achieve up to a 1.95% accuracy improvement, with significant performance gains over state-of-the-art methods on MSCOCO and other challenging detection scenarios. The code will be made publicly available at: <span><span>https://github.com/HZAI-ZJNU/D3</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"625 ","pages":"Article 129507"},"PeriodicalIF":5.5,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143152461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Sparse Bayesian based NARX modeling of cortical response: Introducing information entropy for enhancing the stability
IF 5.5 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-27 DOI: 10.1016/j.neucom.2025.129569
Nan Zheng , Yurong Li , Wuxiang Shi , Qiurong Xie
In this paper, an innovative Sparse Bayesian Learning (SBL)-based modeling approach that incorporates Information Entropy (IE) to enhance stability is developed for building Nonlinear Auto-Regressive with eXogenous input (NARX) models, aiming to address the low estimation accuracy, limited computational efficiency, and insufficient sparsity of existing methods. This development helps capture the key features of cortical responses when focusing on neural activity, providing more accurate results for studying brain mechanisms. By introducing identity transformations and optimizing the parameter-update and stopping strategies, both the computational efficiency and the estimation accuracy of the SBL algorithm are effectively improved; the iterative matrix within ISBL is refined by the introduced IE, which further strengthens performance at low signal-to-noise ratios. Extensive evaluation demonstrates that the proposed method reduces the error by 48%, decreases the traditional SBL method's runtime by 70%, and achieves the sparsest result while maintaining structural accuracy, showing significant competitiveness in accuracy, efficiency, and sparsity compared to other state-of-the-art methods. Moreover, the analysis of real EEG signals indicates that the brain's response follows a fundamental rhythm pattern of adaptation to both active and passive tasks, and this adaptive process can be effectively captured by the proposed sparse model through the combination of linear and nonlinear terms, each serving distinct roles. These findings offer a novel insight into the human sensorimotor system and indicate the great potential of the proposed method for assessing sensorimotor impairments and exploring effective clinical intervention methods.
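A NARX model regresses the current output on lagged outputs and inputs. The sketch below builds a degree-2 polynomial regressor matrix and fits it with scikit-learn's ARDRegression as a generic sparse-Bayesian stand-in; the IE-based stability modification and the paper's exact model class are not reproduced.

```python
import numpy as np
from sklearn.linear_model import ARDRegression

def narx_design(y, u, ny=2, nu=2):
    """Build a polynomial NARX regressor matrix from lagged outputs y
    and inputs u (linear plus pairwise product terms only)."""
    n = len(y)
    lag = max(ny, nu)
    cols = [y[lag - i:n - i] for i in range(1, ny + 1)]   # y[t-1], ..., y[t-ny]
    cols += [u[lag - j:n - j] for j in range(1, nu + 1)]  # u[t-1], ..., u[t-nu]
    X = np.column_stack(cols)
    # degree-2 nonlinearity: pairwise products of the lagged terms
    quad = [X[:, a] * X[:, b] for a in range(X.shape[1]) for b in range(a, X.shape[1])]
    return np.column_stack([X] + quad), y[lag:]

# simulate a toy nonlinear system, then recover a sparse model
u = np.random.randn(500)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.5 * y[t-1] - 0.3 * y[t-1] * u[t-1] + u[t-1] + 0.01 * np.random.randn()
X, target = narx_design(y, u)
model = ARDRegression().fit(X, target)   # ARD prior prunes irrelevant regressors
```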
{"title":"Sparse Bayesian based NARX modeling of cortical response: Introducing information entropy for enhancing the stability","authors":"Nan Zheng ,&nbsp;Yurong Li ,&nbsp;Wuxiang Shi ,&nbsp;Qiurong Xie","doi":"10.1016/j.neucom.2025.129569","DOIUrl":"10.1016/j.neucom.2025.129569","url":null,"abstract":"<div><div>In this paper, an innovative Sparse Bayesian Learning (SBL)-based modeling approach incorporating Information Entropy (IE) to enhance the stability is developed to create Nonlinear Auto-Regressive model with eXogenous input, aiming to address the challenges of low estimation accuracy, limited computational efficiency and insufficient sparsity in existing methods. This development is conducive to capture the key features of cortical responses when focusing on neural activity, providing more accurate results for studying brain mechanisms. By introducing identity transformations and optimizing parameter update and stopping strategies, both computational efficiency and estimation accuracy of the SBL algorithm are effectively improved, where the iterative matrix within ISBL is refined by the introduced IE, which further strengthens the algorithm performance at low Signal-to-Noise Ratio levels. Extensive evaluation demonstrates the proposed method reduces the error by 48 %, decreases the traditional SBL method's runtime by 70 %, and achieves the sparsest result while maintaining structural accuracy, which shows significant competitiveness in accuracy, efficiency and sparsity as compared to other state-of-the-art methods. Moreover, the analysis of real EEG signals indicates that the brain's response follows a fundamental rhythm pattern of adaptation to both active and passive tasks, and such adaptive process can be effectively captured by the proposed sparse model through the combination of linear and nonlinear terms, each serving distinct roles. These findings offer a novel insight into the human sensorimotor system, which indicates the great potential of the proposed method in assessing sensorimotor impairments and exploring effective clinical intervention method.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"626 ","pages":"Article 129569"},"PeriodicalIF":5.5,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143314284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Hybrid Lyrebird Red Panda Optimization_Shepard Convolutional Neural Network for recognition of speech emotion in audio signals
IF 5.5 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-26 DOI: 10.1016/j.neucom.2025.129506
Kanimozhi N. , Devi Priya R.
Speech serves as the primary mode of human communication, where semantic meaning is conveyed through the combination and arrangement of words. Recent research in Speech Emotion Recognition (SER) has helped maintain and improve social relationships and behaviors among individuals. In recent times, several advancements have been made in SER systems through the incorporation of deep learning (DL) models. However, conventional techniques often require large, well-annotated datasets for effective training, which are resource-intensive to collect and label. Moreover, models may struggle to generalize across diverse speakers, emotional expressions, and recording conditions, potentially limiting their real-world applicability. Therefore, this paper presents a Lyrebird Red Panda Optimization_Shepard Convolutional Neural Network (LRPO_ShCNN) for SER. Initially, the input speech signal is preprocessed with an adaptive Gaussian filter. After that, significant features are extracted from the preprocessed signal. A data augmentation step then generates new data points from the existing data, and feature selection is performed with LRPO. Finally, SER is accomplished by utilizing ShCNN, which classifies the emotions. Moreover, the hyperparameters of ShCNN are tuned with LRPO, which is developed by integrating the Lyrebird Optimization Algorithm (LOA) and Red Panda Optimization (RPO). The evaluation results show that LRPO_ShCNN obtains an Accuracy, Positive Predictive Value (PPV), Negative Predictive Value (NPV), True Positive Rate (TPR), and True Negative Rate (TNR) of 91.092%, 90.552%, 90.876%, 91.230%, and 91.818%, respectively.
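For reference, the four reported rates are standard confusion-matrix quantities, typically computed one-vs-rest per emotion class; a minimal sketch:

```python
def binary_rates(tp, fp, tn, fn):
    """The four rates the abstract reports, from confusion counts."""
    ppv = tp / (tp + fp)   # positive predictive value (precision)
    npv = tn / (tn + fn)   # negative predictive value
    tpr = tp / (tp + fn)   # true positive rate (recall / sensitivity)
    tnr = tn / (tn + fp)   # true negative rate (specificity)
    return ppv, npv, tpr, tnr

print(binary_rates(tp=91, fp=10, tn=92, fn=9))
```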
{"title":"Hybrid Lyrebird Red Panda Optimization_Shepard Convolutional Neural Network for recognition of speech emotion in audio signals","authors":"Kanimozhi N. ,&nbsp;Devi Priya R.","doi":"10.1016/j.neucom.2025.129506","DOIUrl":"10.1016/j.neucom.2025.129506","url":null,"abstract":"<div><div>Speech serves as the primary mode of human communication, where semantic meaning is conveyed through the combination and arrangement of words. Recent research in Speech Emotion Recognition (SER) has assisted in maintaining and improving social relationships and behaviors among individuals. In recent times, several advancements have been attained in SER systems due to the incorporation of Deep learning (DL) models. However, the conventional techniques often require large, well-annotated datasets for effective training, which was resource-intensive to collect and label. Moreover, models may struggle to generalize across diverse speakers, emotional expressions, and recording conditions, potentially limiting their real-world applicability. Therefore, this paper presents a Lyrebird Red Panda Optimization _Shepard Convolutional Neural Network (LRPO_ShCNN) for SER. Initially, the input speech signal is preprocessed by using Adaptive Gaussian filter. After that, the significant features from the preprocessed image are extracted. Further, data augmentation process is carried out and it generates new data points from existing data. After that, feature selection is done with LRPO. Finally, SER is accomplished by utilizing ShCNN, where the emotions are classified. Moreover, the hyperparameters of ShCNN are tuned with LRPO, which is developed by the integration of Lyrebird Optimization Algorithm (LOA) and Red Panda Optimization (RPO). The evaluation results shows that the LRPO_ShCNN obtained Accuracy, Positive Predictive Value (PPV), Negative Predictive Value (NPV), True Positive Rate (TPR), and True Negative Rate (TNR) as 91.092 %, 90.552 %, 90.876 %, 91.230 %, and 91.818 % respectively.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"625 ","pages":"Article 129506"},"PeriodicalIF":5.5,"publicationDate":"2025-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143210299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0