Pub Date : 2024-11-15 | DOI: 10.1016/j.neucom.2024.128877 | Neurocomputing 615, Article 128877
Adaptive selection of spectral–spatial features for hyperspectral image classification using a modified-CBAM-based network
He Fu, Cailing Wang, Zhanlong Chen
Convolutional neural networks (CNNs) have demonstrated strong capabilities in hyperspectral image (HSI) classification. However, it remains a challenge to adaptively adjust the size of the receptive fields (RFs) of CNNs based on the information at different scales in HSI so as to achieve adaptive selection of spectral–spatial features. In this paper, we modify the convolutional block attention module (CBAM) and propose a modified-CBAM-based network (MCNet) to adaptively select spectral–spatial features for HSI classification. In particular, the modified CBAM not only enables the model to adjust its RF size according to the information at different scales in HSI, but also enables it to jointly focus on important spectral and spatial features. This is essential for adaptively selecting more descriptive and discriminative spectral–spatial features. The proposed MCNet is compared with currently popular methods on the Indian Pines, Kennedy Space Center, University of Pavia, and Botswana HSI datasets. The results show that MCNet outperforms the other methods in terms of overall accuracy, average accuracy, and the Kappa coefficient.
{"title":"Adaptive selection of spectral–spatial features for hyperspectral image classification using a modified-CBAM-based network","authors":"He Fu , Cailing Wang , Zhanlong Chen","doi":"10.1016/j.neucom.2024.128877","DOIUrl":"10.1016/j.neucom.2024.128877","url":null,"abstract":"<div><div>Convolutional neural networks (CNNs) have demonstrated strong capabilities in hyperspectral image (HSI) classification. However, it is still a challenge to adaptively adjust the size of the receptive fields (RFs) of CNNs base on the information of different scales in HSI to achieve adaptive selection of spectral–spatial features. In the paper, we modify the convolutional block attention module (CBAM) and propose a modified-CBAM-based network (MCNet) to adaptively select spectral–spatial features for HSI classification. In particular, the modified CBAM not only enables the model to adjust its RF size according to the information of different scales in HSI, but also enables the model to achieve a joint focus on important spectral and spatial features. This is very important to adaptively select more descriptive and discriminative spectral–spatial features. The proposed MCNet is compared with currently popular methods on Indian Pines, Kennedy Space Center, University of Pavia, and Botswana HSI datasets. The results show that MCNet has better classification results than other methods on overall accuracy, average accuracy, and Kappa.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"615 ","pages":"Article 128877"},"PeriodicalIF":5.5,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-15 | DOI: 10.1016/j.neucom.2024.128934 | Neurocomputing 615, Article 128934
Virtual sample generation for small sample learning: A survey, recent developments and future prospects
Jianming Wen, Ao Su, Xiaolin Wang, Hao Xu, Jijie Ma, Kang Chen, Xinyang Ge, Zisheng Xu, Zhong Lv
Virtual sample generation (VSG) technology aims to generate virtual samples from real samples in order to expand the size of a dataset and improve model performance. However, there is limited research summarizing VSG technology, which motivates this paper. In recent years, VSG has grown into a crucial tool for augmenting datasets and enhancing model performance, particularly in fields such as image recognition, medicine, and quality control, where small datasets are a common issue. This paper provides an updated review of VSG technology, focusing on three families of techniques that are important for small-sample analysis: sampling-based, information diffusion-based, and Generative Adversarial Network (GAN)-based methods. In this review, we seek to identify the key trends in the field and to provide insights into the opportunities and challenges.
{"title":"Virtual sample generation for small sample learning: A survey, recent developments and future prospects","authors":"Jianming Wen , Ao Su , Xiaolin Wang , Hao Xu , Jijie Ma , Kang Chen , Xinyang Ge , Zisheng Xu , Zhong Lv","doi":"10.1016/j.neucom.2024.128934","DOIUrl":"10.1016/j.neucom.2024.128934","url":null,"abstract":"<div><div>Virtual sample generation (VSG) technology aims to generate virtual samples based on real samples, in order to expand the size of the datasets and improve model performance. However, there is limited research summarizing VSG technology, which motivates this paper. In recent years, VSG technology has grown as a crucial tool for augmenting datasets and enhancing model performance, particularly in the fields like image recognition, medicine, and quality control where small datasets are common issues. This paper aims to provide an updated review of VSG technology, focusing on three key techniques which are important for small sample analysis studies, including sampling-based, information diffusion-based, and Generative Adversarial Networks (GANs)-based technology. In this review, we seek to identify the key trends in this field and to provide insights regarding the opportunities and challenges.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"615 ","pages":"Article 128934"},"PeriodicalIF":5.5,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-13 | DOI: 10.1016/j.neucom.2024.128871 | Neurocomputing 615, Article 128871
FPGA-based component-wise LSTM training accelerator for neural Granger causality analysis
Chuliang Guo, Yufei Chen, Yu Fu
A component-wise LSTM (cLSTM) consists of multiple LSTM cells with distinct parameters, which is particularly beneficial for functional Magnetic Resonance Imaging (fMRI)-based neural Granger causality (NGC) analysis of the human brain. Back-propagation-through-time training on CPUs and GPUs suffers from low utilization due to the inherent data dependencies within the LSTM cell. Moreover, batch-size-1 cLSTM training and little weight reuse across input feature maps worsen this utilization problem. To this end, this study provides an FPGA-based training solution for cLSTM-based NGC analysis. The proposed cLSTM training accelerator identifies different data dependencies in the forward and backward paths, and features two key components: (1) a fine-grained pipeline within the LSTM cell that achieves the lowest initiation interval, and (2) a coarse-grained pipeline that trains input feature sequences across different LSTM cells in parallel. Experiments on the DAN sub-brain network from the COBRE dataset demonstrate the efficacy of FPGA-based cLSTM training, which achieves microsecond-level iteration latency compared with milliseconds on general-purpose platforms, e.g., 465× and 216× faster than an Intel Core 13900K CPU and an Nvidia RTX 2080Ti GPU, respectively. To the best of our knowledge, this work is the first to demonstrate LSTM training on an FPGA, significantly accelerating the analysis and modeling of complex brain networks and offering valuable advancements for neuroscience research at the edge.
{"title":"FPGA-based component-wise LSTM training accelerator for neural granger causality analysis","authors":"Chuliang Guo , Yufei Chen , Yu Fu","doi":"10.1016/j.neucom.2024.128871","DOIUrl":"10.1016/j.neucom.2024.128871","url":null,"abstract":"<div><div>Component-wise LSTM (cLSTM) constitutes multiple LSTM cells of distinct parameters, which has particular benefits of functional Magnetic Resonance Imaging (fMRI)-based neural Granger causality (NGC) analysis for the human brain. Back-propagation through time training on CPU and GPU suffers from low utilization due to inherent data dependencies within the LSTM cell. Moreover, batch 1 cLSTM training and few weight reuses across input feature maps worsen such a utilization problem. To this end, this study provides an FPGA-based training solution for cLSTM-based NGC analysis. The proposed cLSTM training accelerator identifies different data dependencies in forward and backward paths, and features two key components: (1) a fine-grained pipeline within the LSTM cell that achieves the lowest initial interval, and (2) a coarse-grained pipeline that trains input feature sequences across different LSTM cells in parallel. Experiments on the DAN sub-brain network from the COBRE dataset demonstrate the efficacy of FPGA-based cLSTM training, which achieves microseconds iteration latency compared with milliseconds on general-purpose platforms, <em>e.g.,</em> 465<span><math><mo>×</mo></math></span> and 216<span><math><mo>×</mo></math></span> faster than Intel Core 13900K CPU and Nvidia RTX 2080Ti respectively. To the best of our knowledge, this work is the first to demonstrate LSTM training on FPGA, significantly accelerating the analysis and modeling of complex brain networks, and offering valuable advancements for neuroscience research at the edge.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"615 ","pages":"Article 128871"},"PeriodicalIF":5.5,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-12 | DOI: 10.1016/j.neucom.2024.128886 | Neurocomputing 614, Article 128886
Multi-sensor information fusion in Internet of Vehicles based on deep learning: A review
Di Tian, Jiabo Li, Jingyuan Lei
Environmental perception is a crucial component of intelligent driving technology, providing the informational foundation for intelligent decision-making and collaborative control. Due to the limitations of single sensors and the continuous advancements in deep learning and sensor technologies, multi-sensor information fusion in the Internet of Vehicles (IoV) has emerged as a major research hotspot. This approach is also a primary solution for achieving full self-driving. However, given the complexity of the technology, there are still many challenges in achieving accurate and reliable real-time multi-source information perception. Current discussions often focus on specific aspects of multi-sensor fusion in intelligent driving, while detailed discussions on sensor fusion in the context of the IoV are relatively scarce. To provide a comprehensive discussion and analysis of multi-sensor information fusion in IoV, this paper first provides a detailed introduction to its developmental background and the commonly involved sensors. Subsequently, a detailed analysis of the strategies, deep learning architectures, and methods for multi-sensor information fusion in the IoV is presented. Finally, the specific applications and key issues related to multi-sensor information fusion in IoV are discussed from multiple perspectives, along with an analysis of future development trends. This paper aims to serve as a valuable reference for advancing multi-sensor information fusion technology in IoV environments and supporting the realization of full self-driving.
{"title":"Multi-sensor information fusion in Internet of Vehicles based on deep learning: A review","authors":"Di Tian, Jiabo Li, Jingyuan Lei","doi":"10.1016/j.neucom.2024.128886","DOIUrl":"10.1016/j.neucom.2024.128886","url":null,"abstract":"<div><div>Environmental perception is a crucial component of intelligent driving technology, providing the informational foundation for intelligent decision-making and collaborative control. Due to the limitations of single sensors and the continuous advancements in deep learning and sensor technologies, multi-sensor information fusion in the Internet of Vehicles (IoV) has emerged as a major research hotspot. This approach is also a primary solution for achieving full self-driving. However, given the complexity of the technology, there are still many challenges in achieving accurate and reliable real-time multi-source information perception. Current discussions often focus on specific aspects of multi-sensor fusion in intelligent driving, while detailed discussions on sensor fusion in the context of the IoV are relatively scarce. To provide a comprehensive discussion and analysis of multi-sensor information fusion in IoV, this paper first provides a detailed introduction to its developmental background and the commonly involved sensors. Subsequently, a detailed analysis of the strategies, deep learning architectures, and methods for multi-sensor information fusion in the IoV is presented. Finally, the specific applications and key issues related to multi-sensor information fusion in IoV are discussed from multiple perspectives, along with an analysis of future development trends. This paper aims to serve as a valuable reference for advancing multi-sensor information fusion technology in IoV environments and supporting the realization of full self-driving.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"614 ","pages":"Article 128886"},"PeriodicalIF":5.5,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142660668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-10 | DOI: 10.1016/j.neucom.2024.128814 | Neurocomputing 614, Article 128814
Unveiling the potential of graph coloring in feature set partitioning: A study on high-dimensional datasets
Aditya Kumar, Jainath Yadav
The branch of machine learning known as multi-view ensemble learning (MEL) is young and evolving quickly. The learning procedure makes use of subsets of different features from the same dataset, and the resulting predictions are then combined. A vertical partition of the dataset, i.e., a portion of the feature set of a single source dataset, is referred to as a view. View construction is a crucial step in MEL because an adequate number of good-quality views improves MEL's performance. A well-known method of partitioning the nodes of a graph is "graph coloring", which assigns each vertex a color so that no two adjacent vertices share the same color. This approach can be utilized in a number of fields, including clustering. In this study, high-dimensional features are partitioned using graph coloring, which is used to perform heterogeneous feature grouping. To automatically create views in MEL over high-dimensional datasets, the graph coloring-based feature set partitioning (GC-FSP) technique is used. A support vector machine and an artificial neural network have been used with 15 high-dimensional datasets to demonstrate the efficacy of the GC-FSP-based MEL framework. Compared with single-view learning and other state-of-the-art FSP-based MEL techniques, the results show that it is successful in enhancing classification performance. The outcomes have been subjected to non-parametric statistical analysis, and the proposed MEL framework achieves improved classification accuracy.
{"title":"Unveiling the potential of graph coloring in feature set partitioning: A study on high-dimensional datasets","authors":"Aditya Kumar , Jainath Yadav","doi":"10.1016/j.neucom.2024.128814","DOIUrl":"10.1016/j.neucom.2024.128814","url":null,"abstract":"<div><div>The branch of machine learning known as multi-view ensemble learning (MEL) is young and evolving quickly. The learning procedure in this case makes use of subsets of different features from the same dataset, and the prediction produced is then combined. The vertical partition of the dataset in regard to the portion of the feature set in a single source dataset is referred to as the view. View construction is a crucial job in MEL because an adequate number of good-quality views improves MEL’s performance. A well-known method of dividing up the nodes of a graph is called “graph coloring”, which involves giving each vertex a unique color so that no two neighboring vertex pairs share the same color. This approach can be utilized in a number of diverse fields including clustering. In this study, high-dimension features are partitioned using graph coloring, which is used to perform heterogeneous feature grouping. In order to automatically create views in MEL over high-dimensional datasets, the Graph coloring-based feature set partitioning (<span><math><mi>GC</mi></math></span>-FSP) technique is used. A support vector machine and artificial neural network have been used with 15 high-dimensional data sets to demonstrate the efficacy of the <span><math><mi>GC</mi></math></span>-FSP based MEL framework. Compared to single-view learning and other cutting-edge FSP-based MEL techniques, the results show that it is successful in enhancing classification performance. The outcomes have undergone non-parametric statistical study and the intended MEL framework has produced improved classification accuracy that is both acceptable and accurate.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"614 ","pages":"Article 128814"},"PeriodicalIF":5.5,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142660660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-09 | DOI: 10.1016/j.neucom.2024.128827 | Neurocomputing 615, Article 128827
CSSANet: A channel shuffle slice-aware network for pulmonary nodule detection
Muwei Jian, Huihui Huang, Haoran Zhang, Rui Wang, Xiaoguang Li, Hui Yu
Lung cancer stands as the leading cause of cancer-related mortality worldwide. Precise and automated identification of lung nodules in 3D Computed Tomography (CT) scans is an essential part of effective lung cancer screening. Due to the small size of pulmonary nodules and the close correlation between neighboring slices of 3D CT images, most existing methods only consider the characteristics of a single slice and thus easily lead to insufficient detection accuracy. To solve this problem, this paper proposes a Channel Shuffle Slice-Aware Network (CSSANet), which aims to fully exploit the spatial correlation between slices and effectively utilize intra-slice features and inter-slice contextual information to achieve accurate detection of lung nodules. Specifically, we design a Group Shuffle Attention (GSA) module to fuse inter-slice features in order to enhance the discrimination and extraction of the shape information of distinct nodules in the same group of slices. Experiments and an ablation study on the publicly available LUNA16 dataset demonstrate that the proposed method can effectively enhance detection sensitivity. The Competition Performance Metric (CPM) score of 89.8% is superior to that of other representative detection models.
{"title":"CSSANet: A channel shuffle slice-aware network for pulmonary nodule detection","authors":"Muwei Jian , Huihui Huang , Haoran Zhang , Rui Wang , Xiaoguang Li , Hui Yu","doi":"10.1016/j.neucom.2024.128827","DOIUrl":"10.1016/j.neucom.2024.128827","url":null,"abstract":"<div><div>Lung cancer stands as the leading cause of cancer related mortality worldwide. Precise and automated identification of lung nodules through 3D Computed Tomography (CT) scans is an essential part of screening for lung cancer effectively. Due to the small size of pulmonary nodules and the close correlation between neighboring slices of 3D CT images, most of the existing methods only consider the characteristics of a single slice, thus easily lead to insufficient detection accuracy of pulmonary nodules. To solve this problem, this paper proposes a Channel Shuffle Slice-Aware Network (CSSANet), which aims to fully exploit the spatial correlation between slices and effectively utilize the intra-slice features and inter-slice contextual information to achieve accurate detection of lung nodules. Specifically, we design a Group Shuffle Attention module (GSA module) to fuse the inter-slice feature in order to enhance the discrimination and extraction of corresponding shape information of distinct nodules in the same group of slices. Experiments and ablation study on a publicly available LUNA16 dataset demonstrate that the proposed method can enhance the detection sensitivity effectively. The Competition Performance Metric (CPM) score of 89.8 % is superior over other representative detection models.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"615 ","pages":"Article 128827"},"PeriodicalIF":5.5,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-09 | DOI: 10.1016/j.neucom.2024.128836 | Neurocomputing 614, Article 128836
Improved exploration–exploitation trade-off through adaptive prioritized experience replay
Hossein Hassani, Soodeh Nikan, Abdallah Shami
Experience replay is an indispensable part of deep reinforcement learning algorithms that enables the agent to revisit and reuse its past and recent experiences to update the network parameters. In many baseline off-policy algorithms, such as deep Q-networks (DQN), transitions in the replay buffer are typically sampled uniformly. This uniform sampling is not optimal for accelerating the agent’s training towards learning the optimal policy. A more selective and prioritized approach to experience sampling can yield improved learning efficiency and performance. In this regard, this work is devoted to the design of a novel prioritizing strategy to adaptively adjust the sampling probabilities of stored transitions in the replay buffer. Unlike existing sampling methods, the proposed algorithm takes into consideration the exploration–exploitation trade-off (EET) to rank transitions, which is of utmost importance in learning an optimal policy. Specifically, this approach utilizes the temporal difference and Bellman errors as criteria for sampling priorities. To maintain balance in the EET throughout training, the weights associated with both criteria are dynamically adjusted when constructing the sampling priorities. Additionally, any bias introduced by this sample prioritization is mitigated by assigning an importance-sampling weight to each transition in the buffer. The efficacy of this prioritization scheme is assessed by training the DQN algorithm across various OpenAI Gym environments. The results underscore the significance and superiority of our proposed algorithm over state-of-the-art methods, as evidenced by its accelerated learning pace, greater cumulative reward, and higher success rate.
{"title":"Improved exploration–exploitation trade-off through adaptive prioritized experience replay","authors":"Hossein Hassani, Soodeh Nikan, Abdallah Shami","doi":"10.1016/j.neucom.2024.128836","DOIUrl":"10.1016/j.neucom.2024.128836","url":null,"abstract":"<div><div>Experience replay is an indispensable part of deep reinforcement learning algorithms that enables the agent to revisit and reuse its past and recent experiences to update the network parameters. In many baseline off-policy algorithms, such as deep Q-networks (DQN), transitions in the replay buffer are typically sampled uniformly. This uniform sampling is not optimal for accelerating the agent’s training towards learning the optimal policy. A more selective and prioritized approach to experience sampling can yield improved learning efficiency and performance. In this regard, this work is devoted to the design of a novel prioritizing strategy to adaptively adjust the sampling probabilities of stored transitions in the replay buffer. Unlike existing sampling methods, the proposed algorithm takes into consideration the exploration–exploitation trade-off (EET) to rank transitions, which is of utmost importance in learning an optimal policy. Specifically, this approach utilizes temporal difference and Bellman errors as criteria for sampling priorities. To maintain balance in EET throughout training, the weights associated with both criteria are dynamically adjusted when constructing the sampling priorities. Additionally, any bias introduced by this sample prioritization is mitigated through assigning importance-sampling weight to each transition in the buffer. The efficacy of this prioritization scheme is assessed through training the DQN algorithm across various OpenAI Gym environments. The results obtained underscore the significance and superiority of our proposed algorithm over state-of-the-art methods. This is evidenced by its accelerated learning pace, greater cumulative reward, and higher success rate.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"614 ","pages":"Article 128836"},"PeriodicalIF":5.5,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142660664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-09 | DOI: 10.1016/j.neucom.2024.128811 | Neurocomputing 614, Article 128811
Totipotent neural controllers for modular soft robots: Achieving specialization in body–brain co-evolution through Hebbian learning
Andrea Ferigo, Giovanni Iacca, Eric Medvet, Giorgia Nadizar
Multi-cellular organisms typically originate from a single cell, the zygote, that then develops into a multitude of structurally and functionally specialized cells. The potential of generating all the specialized cells that make up an organism is referred to as cellular “totipotency”, a concept introduced by the German plant physiologist Haberlandt in the early 1900s. In an attempt to reproduce this mechanism in synthetic organisms, we present a model based on a kind of modular robot called Voxel-based Soft Robot (VSR), where both the body, i.e., the arrangement of voxels, and the brain, i.e., the Artificial Neural Network (ANN) controlling each module, are subject to an evolutionary process aimed at optimizing the locomotion capabilities of the robot. In an analogy between totipotent cells and totipotent ANN-controlled modules, we then include in our model an additional level of adaptation provided by Hebbian learning, which allows the ANNs to adapt their weights during the execution of the locomotion task. Our in silico experiments reveal two main findings. Firstly, we confirm the common intuition that Hebbian plasticity effectively allows better performance and adaptation. Secondly and more importantly, we verify for the first time that the performance improvements yielded by plasticity are in essence due to a form of specialization at the level of single modules (and their associated ANNs): thanks to plasticity, modules specialize to react in different ways to the same set of stimuli, i.e., they become functionally and behaviorally different even though their ANNs are initialized in the same way. This mechanism, which can be seen as a form of totipotency at the level of ANNs, can have, in our view, profound implications in various areas of Artificial Intelligence (AI) and applications thereof, such as modular robotics and multi-agent systems.
{"title":"Totipotent neural controllers for modular soft robots: Achieving specialization in body–brain co-evolution through Hebbian learning","authors":"Andrea Ferigo , Giovanni Iacca , Eric Medvet , Giorgia Nadizar","doi":"10.1016/j.neucom.2024.128811","DOIUrl":"10.1016/j.neucom.2024.128811","url":null,"abstract":"<div><div>Multi-cellular organisms typically originate from a single cell, the zygote, that then develops into a multitude of structurally and functionally specialized cells. The potential of generating all the specialized cells that make up an organism is referred to as cellular “totipotency”, a concept introduced by the German plant physiologist Haberlandt in the early 1900s. In an attempt to reproduce this mechanism in synthetic organisms, we present a model based on a kind of modular robot called Voxel-based Soft Robot (VSR), where both the body, <em>i.e</em>., the arrangement of voxels, and the brain, <em>i.e</em>., the Artificial Neural Network (ANN) controlling each module, are subject to an evolutionary process aimed at optimizing the locomotion capabilities of the robot. In an analogy between totipotent cells and totipotent ANN-controlled modules, we then include in our model an additional level of adaptation provided by Hebbian learning, which allows the ANNs to adapt their weights during the execution of the locomotion task. Our in silico experiments reveal two main findings. Firstly, we confirm the common intuition that Hebbian plasticity effectively allows better performance and adaptation. Secondly and more importantly, we verify for the first time that the performance improvements yielded by plasticity are in essence due to a form of <em>specialization</em> at the level of single modules (and their associated ANNs): thanks to plasticity, modules specialize to react in different ways to the same set of stimuli, <em>i.e</em>., they become functionally and behaviorally different even though their ANNs are initialized in the same way. This mechanism, which can be seen as a form of totipotency at the level of ANNs, can have, in our view, profound implications in various areas of Artificial Intelligence (AI) and applications thereof, such as modular robotics and multi-agent systems.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"614 ","pages":"Article 128811"},"PeriodicalIF":5.5,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142660751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-09 | DOI: 10.1016/j.neucom.2024.128844 | Neurocomputing 614, Article 128844
Revising similarity relationship hashing for unsupervised cross-modal retrieval
You Wu, Bo Li, Zhixin Li
Previous unsupervised cross-modal hashing methods have made promising progress, but there are still limitations in narrowing the gap between modalities and in exploring and preserving intrinsic multimodal semantics. Furthermore, existing methods fail to effectively incorporate the hash codes to correct poorly trained instance pairs during the training process. To overcome these issues, we propose a novel unsupervised hash learning framework, Revising Similarity Relationship Hashing (RSRH). First, we construct a feature cross-reconstruction module to narrow the gap between modalities. In addition, we build a multimodal fusion similarity map that nonlinearly combines intra- and inter-modal similarity maps to generate multimodal representations with complementary relationships. Finally, we propose a multimodal fusion graph update module for updating poorly trained instance pairs, improving retrieval performance. Experimental results show that our method outperforms many current mainstream hashing methods, validating its effectiveness and superiority.
{"title":"Revising similarity relationship hashing for unsupervised cross-modal retrieval","authors":"You Wu, Bo Li, Zhixin Li","doi":"10.1016/j.neucom.2024.128844","DOIUrl":"10.1016/j.neucom.2024.128844","url":null,"abstract":"<div><div>Previous methods have made promising progress, but there are still some limitations in narrowing the gap between modalities and exploring and preserving intrinsic multimodal semantics. Furthermore, there has been a failure to effectively incorporate the hash codes to correct poorly trained instance pairs during the training process. To overcome the above-mentioned issues, we propose a novel unsupervised hash learning framework, Revising Similarity Relationship Hashing (RSRH). Firstly, we constructed a feature cross-reconstruction module to narrow the gap between modalities. In addition, we build a multimodal fusion similarity map that nonlinearly combines intra- and inter-modal similarity maps to generate multimodal representations with complementary relationships. Finally, we propose a multimodal fusion graph update module for updating poorly trained instance pairs, improving retrieval performance. Experimental data show that our method outperforms many current mainstream hashing methods in performance, and its effectiveness and superiority have been fully validated.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"614 ","pages":"Article 128844"},"PeriodicalIF":5.5,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142660759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-09 | DOI: 10.1016/j.neucom.2024.128835 | Neurocomputing 615, Article 128835
Memory-enhanced hierarchical transformer for video paragraph captioning
Benhui Zhang, Junyu Gao, Yuan Yuan
Video paragraph captioning aims to describe a video that contains multiple events with a paragraph of coherent generated sentences. Such a captioning task is full of challenges owing to the high requirements for visual–textual relevance and semantic coherence across the generated paragraph. In this work, we introduce a memory-enhanced hierarchical transformer for video paragraph captioning. Our model adopts a hierarchical structure, where the outer-layer transformer extracts visual information from a global perspective and captures the relevance between event segments throughout the entire video, while the inner-layer transformer further mines local details within each event segment. By thoroughly exploring both global and local visual information at the video and event levels, our model can provide comprehensive visual feature cues for promising paragraph caption generation. Additionally, we design a memory module to capture similar patterns among event segments within a video, which preserves contextual information across event segments and updates its memory state accordingly. Experimental results on two popular datasets, ActivityNet Captions and YouCook2, demonstrate that our proposed model achieves superior performance, generating higher-quality captions while maintaining consistency with the video content.
{"title":"Memory-enhanced hierarchical transformer for video paragraph captioning","authors":"Benhui Zhang , Junyu Gao , Yuan Yuan","doi":"10.1016/j.neucom.2024.128835","DOIUrl":"10.1016/j.neucom.2024.128835","url":null,"abstract":"<div><div>Video paragraph captioning aims to describe a video that contains multiple events with a paragraph of generated coherent sentences. Such a captioning task is full of challenges since the high requirements for visual–textual relevance and semantic coherence across the captioning paragraph of a video. In this work, we introduce a memory-enhanced hierarchical transformer for video paragraph captioning. Our model adopts a hierarchical structure, where the outer layer transformer extracts visual information from a global perspective and captures the relevancy between event segments throughout the entire video, while the inner layer transformer further mines local details within each event segment. By thoroughly exploring both global and local visual information at the video and event levels, our model can provide comprehensive visual feature cues for promising paragraph caption generation. Additionally, we design a memory module to capture similar patterns among event segments within a video, which preserves contextual information across event segments and updates its memory state accordingly. Experimental results on two popular datasets, ActivityNet Captions and YouCook2, demonstrate that our proposed model can achieve superior performance, generating higher quality caption while maintaining consistency in the content of video.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"615 ","pages":"Article 128835"},"PeriodicalIF":5.5,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}