Pub Date: 2024-09-06 | DOI: 10.1016/j.neucom.2024.128514
The study of cooperation within social dilemmas has long been a fundamental topic across various disciplines, including computer science and social science. Recent advancements in Artificial Intelligence (AI) have significantly reshaped this field, offering fresh insights into understanding and enhancing cooperation. This survey examines three key areas at the intersection of AI and cooperation in social dilemmas. First, focusing on multi-agent cooperation, we review the intrinsic and external motivations that support cooperation among rational agents, and the methods employed to develop effective strategies against diverse opponents. Second, looking into human–agent cooperation, we discuss the current AI algorithms for cooperating with humans and the human biases towards AI agents. Third, we review the emergent field of leveraging AI agents to enhance cooperation among humans. We conclude by discussing future research avenues, such as using large language models, establishing unified theoretical frameworks, revisiting existing theories of human cooperation, and exploring multiple real-world applications.
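To ground the setting, the canonical social dilemma studied in this literature is the iterated Prisoner's Dilemma. The sketch below uses textbook payoffs (T=5, R=3, P=1, S=0) and two classic strategies; it illustrates the problem setting only and is not taken from the survey itself:

```python
# Illustrative only: a minimal iterated Prisoner's Dilemma with the
# standard T > R > P > S payoff ordering.
PAYOFF = {  # (my_move, their_move) -> my payoff; 'C' = cooperate, 'D' = defect
    ('C', 'C'): 3, ('C', 'D'): 0,
    ('D', 'C'): 5, ('D', 'D'): 1,
}

def tit_for_tat(history):
    """Cooperate first, then mirror the opponent's previous move."""
    return 'C' if not history else history[-1]

def always_defect(history):
    return 'D'

def play(strategy_a, strategy_b, rounds=10):
    """Return cumulative payoffs of two strategies over repeated rounds."""
    hist_a, hist_b = [], []   # each records the *opponent's* past moves
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = strategy_a(hist_a), strategy_b(hist_b)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_b)
        hist_b.append(move_a)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # (30, 30)
print(play(tit_for_tat, always_defect))  # (9, 14)
```

Mutual reciprocity sustains cooperation (30 points each over ten rounds), while tit-for-tat concedes only the first round to a pure defector; the intrinsic and external motivations the survey reviews are, in effect, mechanisms for making the cooperative outcome stable among rational agents.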
Title: Multi-agent, human–agent and beyond: A survey on cooperation in social dilemmas
Journal: Neurocomputing
Pub Date: 2024-09-06 | DOI: 10.1016/j.neucom.2024.128501
This paper presents a novel approach for solving the Complex Word Identification (CWI) task using a text-to-text generative model. The CWI task involves identifying complex words in text, which is a challenging Natural Language Processing task. To our knowledge, this is the first attempt to address the CWI problem in a text-to-text setting. In this work, we propose a new methodology that leverages the power of the Transformer model to evaluate the complexity of words in binary and probabilistic settings. We also propose a novel CWI dataset, which consists of 62,200 phrases, both complex and simple. We train and fine-tune our proposed model on this dataset and evaluate its performance on separate test sets across three different domains. Our experimental results demonstrate the effectiveness of the proposed approach compared to state-of-the-art methods.
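The abstract does not spell out the input-output format, but a text-to-text framing of CWI typically serializes the target word and its context into a prompt and has the model generate the label as text. The template below is a hypothetical illustration: the function name and prompt wording are assumptions, not the authors' format:

```python
# Hypothetical sketch of casting CWI as text-to-text (seq2seq) pairs,
# e.g. for a T5-style model. Prompt wording is assumed, not the paper's.
def make_example(sentence, word, label, probabilistic=False):
    """Build a (source, target) text pair for a seq2seq model."""
    source = f"identify complexity: word: {word} context: {sentence}"
    if probabilistic:
        target = f"{label:.2f}"          # e.g. annotator agreement in [0, 1]
    else:
        target = "complex" if label else "simple"
    return source, target

src, tgt = make_example("The cat sat on the mat.", "mat", 0)
print(src)
print(tgt)  # simple
```

The binary and probabilistic settings mentioned in the abstract then differ only in the generated target string, which is one attraction of the text-to-text framing.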
Title: Text-to-text generative approach for enhanced complex word identification
Pub Date: 2024-09-05 | DOI: 10.1016/j.neucom.2024.128525
Deep learning is being increasingly applied to image and video compression in a new paradigm known as neural video compression. While achieving impressive rate–distortion (RD) performance, neural video codecs (NVCs) require heavy neural networks, which in turn have large memory and computational costs and often lack important functionalities such as variable rate. These are significant limitations to their practical application. Addressing these problems, recent slimmable image codecs can dynamically adjust their model capacity to elegantly reduce memory and computation requirements without harming RD performance. However, the extension to video is not straightforward due to the non-trivial interplay with the complex motion estimation and compensation modules in most NVC architectures. In this paper we propose the slimmable video codec framework (SlimVC), which integrates a slimmable autoencoder and a motion-free conditional entropy model. We show that the slimming mechanism is also applicable to the more complex case of video architectures, providing SlimVC with simultaneous control of computational cost, memory and rate, all of which are important requirements in practice. We further provide detailed experimental analysis and describe application scenarios that can benefit from slimmable video codecs.
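The core "slimming" mechanism can be sketched independently of the codec: one over-parameterized weight tensor is trained so that leading channel slices also work on their own, letting inference pick a width on the fly. The toy layer below (a plain linear map, not the authors' autoencoder) illustrates only this slicing idea:

```python
import numpy as np

# Assumption-level illustration of a slimmable layer: a single weight
# tensor, with a runtime `width` fraction selecting a leading slice of
# output channels, so memory and FLOPs shrink without separate models.
class SlimmableLinear:
    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((out_dim, in_dim))

    def __call__(self, x, width=1.0):
        """Run with only the first `width` fraction of output channels."""
        k = max(1, int(self.W.shape[0] * width))
        return self.W[:k] @ x

layer = SlimmableLinear(8, 16)
x = np.ones(8)
print(layer(x, width=1.0).shape)   # (16,)
print(layer(x, width=0.25).shape)  # (4,)
```

In a real slimmable network all widths are trained jointly so every slice remains a usable sub-model; SlimVC's contribution is making this work alongside video-specific entropy modeling.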
Title: A slimmable framework for practical neural video compression
Pub Date: 2024-09-04 | DOI: 10.1016/j.neucom.2024.128527
Object detection in optical remote sensing (RS) images is crucial for both military and civilian applications. However, a major challenge in RS object detection lies in the complexity of texture details within the images, which makes it difficult to accurately identify objects. Currently, many deep-learning-based object detection methods focus primarily on network architecture and label assignment design. These methods often employ an end-to-end training approach in which the loss function directly constrains only the final output layer. This approach, however, gives each module within the network a significant amount of freedom during optimization, which can hinder the network’s ability to focus effectively on the object and limit detection accuracy. To address these limitations, this paper proposes a novel approach called the Attention Feature Guided Network (AFGN). In this approach, an Attention Feature Guided Branch (AFGB) is introduced during the training phase of the CNN-based end-to-end detection network. The AFGB provides additional shallow supervision outside the detector’s output layer, guiding the backbone to focus effectively on the object amidst complex backgrounds. Additionally, a new operation called Background Blur Mask (BBM) is proposed and embedded in the AFGB to achieve image-level attention. Experiments conducted on the DIOR dataset demonstrate the effectiveness and efficiency of the proposed method. Our method achieves an mAP (mean average precision) of 0.777, surpassing many state-of-the-art object detection methods.
Title: AFGN: Attention Feature Guided Network for object detection in optical remote sensing image
Pub Date: 2024-09-04 | DOI: 10.1016/j.neucom.2024.128526
In recent years, the capsule network has significantly impacted deep learning with its unique structure, which robustly handles spatial relationships and image deformations such as rotation and scaling. While previous research has primarily focused on enhancing the structural design of capsule networks to process complex images, little attention has been given to the rich semantic information contained within the capsules themselves. We recognize this gap and propose the Multi-Order Descartes Expansion Capsule Network (MODE-CapsNet). By introducing the Multi-Order Descartes Expansion Transformation (MODET), this architecture enhances the expressiveness of a single capsule by projecting it into a higher-dimensional space. To the best of our knowledge, this is the first significant enhancement at the single-capsule granularity, providing a new perspective for improving capsule networks. Additionally, we propose a hierarchical routing algorithm designed explicitly for MODE capsules, significantly improving computational efficiency and performance. Experimental results on the MNIST, Fashion-MNIST, SVHN, CIFAR-10 and tiny-ImageNet datasets show that MODE capsules exhibit improved separability and expressiveness, contributing to overall network accuracy, robustness, and computational efficiency.
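The abstract does not give MODET's exact form, but a multi-order Cartesian ("Descartes") expansion can be illustrated as concatenating flattened tensor powers of a capsule vector, lifting it into a higher-dimensional space. The function below is an assumed sketch of that dimensionality lift, not the paper's definition:

```python
import numpy as np

# Hedged sketch: lift capsule vector u into a higher-dimensional space by
# concatenating flattened outer powers u, u(x)u, u(x)u(x)u, ...
def multi_order_expand(u, order=2):
    parts, cur = [u], u
    for _ in range(order - 1):
        cur = np.outer(cur, u).ravel()   # next tensor power, flattened
        parts.append(cur)
    return np.concatenate(parts)

u = np.array([1.0, 2.0])
v = multi_order_expand(u, order=3)
print(v.shape)  # (14,) = 2 + 4 + 8 dimensions
```

For a d-dimensional capsule the order-n expansion has d + d^2 + ... + d^n components, which is why such a lift can increase a single capsule's expressiveness without touching the network's overall structure.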
Title: A novel capsule network based on Multi-Order Descartes Extension Transformation
Pub Date: 2024-09-03 | DOI: 10.1016/j.neucom.2024.128461
Deep Convolutional Neural Networks (CNNs) have been widely used across domains due to their impressive capabilities. These models are typically composed of a large number of 2D convolutional (Conv2D) layers with numerous trainable parameters. To manage the complexity of such networks, compression techniques can be applied, which typically rely on analyzing trained deep learning models. However, in certain situations, training a new CNN from scratch may be infeasible due to resource limitations. In this paper, we propose an alternative parameterization of Conv2D filters with significantly fewer parameters that does not rely on compressing a pre-trained CNN. Our analysis reveals that the effective rank of the vectorized Conv2D filters decreases with increasing depth in the network. This leads to the Depthwise Convolutional Eigen-Filter (DeCEF) layer, a low-rank version of the Conv2D layer with significantly fewer trainable parameters and floating point operations (FLOPs). Our definition of effective rank differs from previous work and is easy to implement and interpret. Applying the technique is straightforward: one can simply replace any standard convolutional layer in a CNN with a DeCEF layer. To evaluate the effectiveness of DeCEF layers, experiments are conducted on the benchmark datasets CIFAR-10 and ImageNet for various network architectures. The results show similar or higher accuracy using about 2/3 of the original parameters and 2/3 of the FLOPs of the base network. Additionally, analyzing the patterns in the effective rank provides insights into the inner workings of CNNs and highlights opportunities for future research.
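The observation driving DeCEF, that vectorized filter banks are effectively low-rank, can be checked numerically with an SVD. The energy-threshold definition of effective rank used below is a common stand-in and may differ from the paper's own definition:

```python
import numpy as np

# Vectorize a bank of k x k filters into rows of a matrix and inspect its
# singular value spectrum; the effective rank is the smallest number of
# singular directions capturing a given fraction of spectral energy.
def effective_rank(filters, energy=0.99):
    """filters: (num_filters, k, k) -> smallest r capturing `energy`."""
    F = filters.reshape(filters.shape[0], -1)
    s = np.linalg.svd(F, compute_uv=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cum, energy) + 1)

rng = np.random.default_rng(0)
# 64 filters built from only 3 shared "eigen-filters": true rank is 3,
# so the effective rank is small despite there being 64 filters.
basis = rng.standard_normal((3, 9))
coeff = rng.standard_normal((64, 3))
filters = (coeff @ basis).reshape(64, 3, 3)
print(effective_rank(filters))
```

When such structure holds, the filter bank can be re-parameterized as a small set of shared basis filters plus per-filter coefficients, which is the essence of the parameter and FLOP savings reported for DeCEF layers.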
Title: Building efficient CNNs using Depthwise Convolutional Eigen-Filters (DeCEF)
Pub Date: 2024-09-03 | DOI: 10.1016/j.neucom.2024.128538
Imbalanced datasets pose challenges to standard classification algorithms. Although oversampling techniques can balance the number of samples across classes, the difficulty of imbalanced classification stems not solely from the imbalance itself but also from other factors, such as small disjuncts and overlapping regions, especially in the presence of noise. Traditional oversampling techniques do not effectively address these intricacies. To this end, we propose a novel oversampling method called Newton’s Cooling Law-Based Weighted Oversampling (NCLWO). The proposed method first calculates the weight of each minority class sample based on density and closeness factors to identify hard-to-learn samples, assigning them higher heat. Subsequently, Newton’s Cooling Law is applied to each minority class sample, using it as the center and expanding the sampling region outward while gradually decreasing the heat until a balanced state is reached. Finally, majority class samples within the sampling region are translated to eliminate overlapping areas, and a weighted oversampling approach is employed to synthesize informative minority class samples. An experimental study carried out on a set of benchmark datasets confirms that the proposed method not only outperforms state-of-the-art oversampling approaches but also shows greater robustness in the presence of feature noise.
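For reference, Newton's cooling law itself is T(t) = T_env + (T0 - T_env) * exp(-k t): a hot body's temperature decays exponentially toward its environment. How NCLWO maps t and k onto its region-growing steps is not given in the abstract, so the snippet below only illustrates the decay that the "heat" metaphor borrows:

```python
import math

# Newton's cooling law: exponential decay of heat toward the ambient
# temperature. In NCLWO's metaphor, a "hot" (hard-to-learn) minority
# sample's influence decays as the sampling region expands; the exact
# mapping of t and k to the algorithm is an assumption here.
def cooling(t, T0=1.0, T_env=0.0, k=0.5):
    return T_env + (T0 - T_env) * math.exp(-k * t)

weights = [round(cooling(t), 3) for t in range(4)]
print(weights)  # [1.0, 0.607, 0.368, 0.223]
```

The exponential profile means nearby synthetic samples are weighted strongly while distant ones fade smoothly, rather than being cut off at a hard radius.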
Title: NCLWO: Newton’s cooling law-based weighted oversampling algorithm for imbalanced datasets with feature noise
Pub Date: 2024-09-03 | DOI: 10.1016/j.neucom.2024.128537
The study of modulating neuronal signals with electrical stimuli is important for intervening in abnormal neuronal firing and restoring it to a normal state. Spike trains, among the highest-quality brain signals, have been insufficiently explored and analyzed owing to the difficulty of obtaining them in practice. This paper therefore investigates and analyzes the effect of electrical stimuli on the spiking response of neurons through the following work. The relationships between the spiking response and three parameters (namely, the amplitude of the electrode current (EC), the angular velocity of the electric field current (EFC), and the signal-to-noise ratio (SNR)) are examined on a neuronal model with spatial length and multiple active properties. When specific currents with different SNRs are imposed on the neurons, their influence on the spiking response is further explored. With regard to the spiking response, the main focus is on three characteristics: the spiking pattern, the spike count (SC), and the spiking arrangement. An algorithm called the return map distance (RMD) algorithm is proposed, giving the classification of spiking patterns a quantitative criterion. Based on it, spiking patterns are classified as bursting spike train, regular spike train (RST), and meager spike train (MST). Simulation results indicate that both the amplitude of the EC and the angular velocity of the EFC change the neuronal spiking patterns. As the amplitude (angular velocity) of the EC (EFC) increases, the spiking pattern of the Soldado-Magraner model (SMM) eventually tends to RST (MST). In addition, the SC increases with the amplitude of the EC, whereas no such monotonic relationship holds for the angular velocity of the EFC.
Furthermore, the spiking arrangement and the SC are severely degraded for the EC at low SNRs, while all three spiking features of the SMM under the EFC are robust to different SNRs, implying that, compared with the EC, the spiking responses of the SMM under the EFC are more stable. These findings may provide theoretical guidance for fields related to neuronal firing, such as brain–computer interfaces and electrotherapy. The RMD algorithm proposed here can be applied to other individual neurons, and the spiking arrangement discussed here could be regarded as an effective way to encode spike trains.
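The return-map construction underlying the proposed RMD criterion can be sketched directly: from spike times, form interspike intervals (ISIs) and pair each ISI with the next. A regular train collapses to a single point in this plane, while a bursting train spreads out. The dispersion statistic below is an illustrative stand-in, not the paper's exact RMD definition:

```python
import numpy as np

# Sketch of a spike-train return map: points (ISI_n, ISI_{n+1}) built
# from consecutive interspike intervals. The dispersion measure is an
# assumed, simplified proxy for a return-map-distance criterion.
def return_map(spike_times):
    isi = np.diff(np.asarray(spike_times, dtype=float))
    return np.column_stack([isi[:-1], isi[1:]])   # (ISI_n, ISI_{n+1}) pairs

def dispersion(points):
    """Mean distance of return-map points from their centroid."""
    return float(np.mean(np.linalg.norm(points - points.mean(axis=0), axis=1)))

regular = return_map([0, 10, 20, 30, 40, 50])   # constant ISI of 10
bursty  = return_map([0, 2, 4, 30, 32, 34, 60]) # short bursts, long gaps
print(dispersion(regular))        # 0.0
print(dispersion(bursty) > 1.0)   # True
```

A threshold on such a distance gives a quantitative rule for separating regular, bursting, and meager trains, which is the role the RMD algorithm plays in the paper.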
Title: Exploration on the spiking response of a single compartment neuron with multiple active properties under electrical stimuli
Pub Date: 2024-09-03 | DOI: 10.1016/j.neucom.2024.128529
One-shot Network Pruning at Initialization (OPaI) is acknowledged as a highly cost-effective strategy for network pruning. However, it has been observed that OPaI models tend to suffer from reduced accuracy stability as target sparsity increases. This study introduces a novel approach by incorporating Discriminative Data (DD) into OPaI, significantly improving performance at higher sparsity levels while maintaining the “one-shot” nature. Our approach achieves state-of-the-art (SOTA) performance, challenging the previously held belief of OPaI’s data independence. Through detailed ablation studies, we thoroughly investigate the influence of data on OPaI, particularly focusing on how DD addresses a common failure in OPaI known as “layer collapse”. Furthermore, our experiments demonstrate that leveraging DD from various pre-trained models can markedly boost pruning performance across different models without requiring changes to the existing model architectures or pruning methodologies. These significant improvements highlight our method’s high generalizability and stability, paving new paths for advancing pruning strategies. Our code is publicly available at: https://github.com/Nonac/DDOPaI.
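A minimal one-shot pruning-at-initialization step, in the style of connection-sensitivity methods such as SNIP (a standard OPaI baseline), scores each weight by |weight x gradient| on a single data batch and keeps the top fraction in one shot. The sketch below is a generic illustration, not the authors' code; the discriminative-data idea would change only which batch the gradient is computed on:

```python
import numpy as np

# SNIP-style saliency sketch on a toy linear model with a least-squares
# loss: one forward/backward pass, then a one-shot top-k weight mask.
def snip_mask(W, x, y, keep=0.5):
    """Score |W * dL/dW| for L = 0.5*||Wx - y||^2, keep top fraction."""
    pred = W @ x                      # forward pass
    grad = np.outer(pred - y, x)      # gradient of the loss w.r.t. W
    saliency = np.abs(W * grad)       # connection sensitivity
    k = int(keep * W.size)
    thresh = np.sort(saliency.ravel())[::-1][k - 1]
    return (saliency >= thresh).astype(float)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))
x, y = rng.standard_normal(6), rng.standard_normal(4)
mask = snip_mask(W, x, y, keep=0.5)
print(int(mask.sum()))  # 12  (half of the 24 weights kept)
```

Because the mask depends entirely on one batch's gradients, the choice of that batch matters, which is precisely the lever the discriminative-data approach exploits at high sparsity.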
{"title":"Leveraging discriminative data: A pathway to high-performance, stable One-shot Network Pruning at Initialization","authors":"","doi":"10.1016/j.neucom.2024.128529","DOIUrl":"10.1016/j.neucom.2024.128529","url":null,"abstract":"<div><p>One-shot Network Pruning at Initialization (OPaI) is acknowledged as a highly cost-effective strategy for network pruning. However, it has been observed that OPaI models tend to suffer from reduced accuracy stability as target sparsity increases. This study introduces a novel approach by incorporating Discriminative Data (DD) into OPaI, significantly improving performance at higher sparsity levels while maintaining the “one-shot” nature. Our approach achieves state-of-the-art (SOTA) performance, challenging the previously held belief of OPaI’s data independence. Through detailed ablation studies, we thoroughly investigate the influence of data on OPaI, particularly focusing on how DD addresses a common failure in OPaI known as “layer collapse”. Furthermore, our experiments demonstrate that leveraging DD from various pre-trained models can markedly boost pruning performance across different models without requiring changes to the existing model architectures or pruning methodologies. These significant improvements highlight our method’s high generalizability and stability, paving new paths for advancing pruning strategies. 
Our code is publicly available at: <span><span>https://github.com/Nonac/DDOPaI</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":null,"pages":null},"PeriodicalIF":5.5,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142164301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
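The one-shot scoring step that pruning-at-initialization methods build on can be sketched on a toy model: score each weight at initialization by the magnitude of weight times gradient (the SNIP-style connection sensitivity), then keep the top-k scores. The data, layer size, and keep ratio below are illustrative assumptions, not details of the paper's DD method, which lives in the linked repository.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-layer "network": logistic regression on random data.
n, d, keep_ratio = 64, 20, 0.25
X = rng.normal(size=(n, d))
y = rng.integers(0, 2, size=n).astype(float)
W = rng.normal(scale=0.1, size=d)        # weights at initialization

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient of binary cross-entropy w.r.t. W on one batch of data.
p = sigmoid(X @ W)
grad = X.T @ (p - y) / n

# SNIP-style connection sensitivity: |gradient * weight|.
scores = np.abs(grad * W)
k = int(keep_ratio * d)
thresh = np.sort(scores)[-k]             # k-th largest score
mask = (scores >= thresh).astype(float)  # one-shot pruning mask

W_pruned = W * mask
print(int(mask.sum()), "of", d, "weights kept")
```

With discriminative rather than random data, the gradients (and hence the scores) concentrate on task-relevant connections, which is the lever the paper investigates.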
Pub Date : 2024-09-03 DOI: 10.1016/j.neucom.2024.128533
The broad learning system (BLS) is an efficient incremental learning algorithm. However, it has some drawbacks: the number of hidden-layer nodes must be adjusted manually during training, and its two random mappings introduce large uncertainty. To address these problems, a double-kernel broad learning system (DKBLS) is proposed that exploits the optimization ability of kernel functions, eliminating the uncertainty of random mapping through an additive-kernel strategy. To further reduce the computational cost and training time of DKBLS, a double-kernel based Bayesian approximation broad learning system with dropout (Dropout-DKBLS) is also proposed. Ablation experiments show that the output accuracy of Dropout-DKBLS does not decrease even when nodes are dropped. In addition, function approximation experiments show that DKBLS and Dropout-DKBLS are robust and can accurately predict noisy data. Regression and classification experiments on multiple datasets are compared against the latest kernel-based learning methods; the results show that both DKBLS and Dropout-DKBLS perform well on regression and classification tasks. By further comparing the training times of these kernel-based learning methods, we show that Dropout-DKBLS reduces computational cost while maintaining output accuracy.
{"title":"Double-kernel based Bayesian approximation broad learning system with dropout","authors":"","doi":"10.1016/j.neucom.2024.128533","DOIUrl":"10.1016/j.neucom.2024.128533","url":null,"abstract":"<div><p>Broad learning system (BLS) is an efficient incremental learning machine algorithm. However, there are some disadvantages in such an algorithm. For example, the number of hidden layer nodes needs to be manually adjusted during the training process, meanwhile the large uncertainty will be caused by two random mappings. To solve these problems, based on the optimization ability of the kernel function, a double-kernel broad learning system (DKBLS) is proposed to eliminate the uncertainty of random mapping by using additive kernel strategy. Meanwhile, to reduce the computing costs and training time of DKBLS, a double-kernel based bayesian approximation broad learning system with dropout (Dropout-DKBLS) is further proposed. Ablation experiments show that the output accuracy of Dropout-DKBLS does not decrease even if the node is dropped. In addition, function approximation experiments show that DKBLS and Dropout-DKBLS have good robustness and can accurately predict noise data. The regression and classification experiments on multiple datasets are compared with the latest kernel-based learning methods. The comparison results show that both DKBLS and Dropout-DKBLS have good regression and classification performance. 
By further comparing the training time of these kernel-based learning methods, we prove that the Dropout-DKBLS can reduce the computational cost while ensuring the output accuracy.</p></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":null,"pages":null},"PeriodicalIF":5.5,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142149104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
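The additive-kernel idea — replacing random mappings with a weighted sum of base kernels and solving a regularized linear system — can be illustrated with a generic kernel ridge regressor. The kernel choices, mixing weights, and regularizer below are assumptions for the sketch, not the paper's DKBLS specification.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression target.
X = rng.uniform(-1, 1, size=(80, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def linear_kernel(A, B):
    return A @ B.T

# Additive "double" kernel: a weighted sum of two base kernels,
# replacing the uncertainty of random feature mappings.
def double_kernel(A, B, w1=0.7, w2=0.3):
    return w1 * rbf_kernel(A, B) + w2 * linear_kernel(A, B)

lam = 1e-3                                # ridge regularizer
K = double_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(Xq):
    return double_kernel(Xq, X) @ alpha

mse = np.mean((predict(X) - y) ** 2)
print(f"train MSE: {mse:.2e}")
```

Because the kernels are fixed functions of the data, two runs of this fit are identical, which is the determinism the additive-kernel strategy buys over random mappings; the dropout-based Bayesian approximation in Dropout-DKBLS is a further refinement not shown here.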