Dayena Jeong, Jaewoo Park, Jeonghee Jo, Jongkil Park, Jaewook Kim, Hyun Jae Jang, Suyoun Lee, Seongsik Park
Recent deep neural networks (DNNs), such as diffusion models [1], face high computational demands, so spiking neural networks (SNNs) have attracted considerable attention as energy-efficient alternatives. However, conventional spiking neurons, such as leaky integrate-and-fire neurons, cannot accurately represent complex non-linear activation functions such as Swish [2]. Few spikes (FS) neurons were proposed to approximate activation functions with spiking neurons [3], but their approximation performance was limited by the lack of training methods tailored to these neurons. We therefore propose tendency-based parameter initialization (TBPI), which enhances the approximation of activation functions with FS neurons by exploiting the temporal dependencies among the training parameters when initializing them.
{"title":"A More Accurate Approximation of Activation Function with Few Spikes Neurons","authors":"Dayena Jeong, Jaewoo Park, Jeonghee Jo, Jongkil Park, Jaewook Kim, Hyun Jae Jang, Suyoun Lee, Seongsik Park","doi":"arxiv-2409.00044","DOIUrl":"https://doi.org/arxiv-2409.00044","url":null,"abstract":"Recent deep neural networks (DNNs), such as diffusion models [1], have faced\u0000high computational demands. Thus, spiking neural networks (SNNs) have attracted\u0000lots of attention as energy-efficient neural networks. However, conventional\u0000spiking neurons, such as leaky integrate-and-fire neurons, cannot accurately\u0000represent complex non-linear activation functions, such as Swish [2]. To\u0000approximate activation functions with spiking neurons, few spikes (FS) neurons\u0000were proposed [3], but the approximation performance was limited due to the\u0000lack of training methods considering the neurons. Thus, we propose\u0000tendency-based parameter initialization (TBPI) to enhance the approximation of\u0000activation function with FS neurons, exploiting temporal dependencies\u0000initializing the training parameters.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kun Wu, Jeongmin Brian Park, Xiaofan Zhang, Mert Hidayetoğlu, Vikram Sharma Mailthody, Sitao Huang, Steven Sam Lumetta, Wen-mei Hwu
The growth rate of GPU memory capacity has not kept up with that of the size of large language models (LLMs), hindering model training. In particular, activations -- the intermediate tensors produced during forward propagation and reused in backward propagation -- dominate GPU memory use. To address this challenge, we propose TBA, which efficiently offloads activations to high-capacity NVMe SSDs. This approach reduces GPU memory usage without impacting performance by adaptively overlapping data transfers with computation. TBA is compatible with popular deep learning frameworks such as PyTorch, Megatron, and DeepSpeed, and it employs techniques such as tensor deduplication, forwarding, and adaptive offloading to further enhance efficiency. We conduct extensive experiments on GPT, BERT, and T5. The results demonstrate that TBA reduces peak activation memory usage by 47%. At the same time, TBA fully overlaps I/O with computation and incurs negligible performance overhead. We introduce the recompute-offload-keep (ROK) curve to compare TBA offloading with two other tensor placement strategies: keeping activations in memory and layerwise full recomputation. We find that TBA achieves better memory savings than layerwise full recomputation while retaining the performance of keeping the activations in memory.
{"title":"TBA: Faster Large Language Model Training Using SSD-Based Activation Offloading","authors":"Kun Wu, Jeongmin Brian Park, Xiaofan Zhang, Mert Hidayetoğlu, Vikram Sharma Mailthody, Sitao Huang, Steven Sam Lumetta, Wen-mei Hwu","doi":"arxiv-2408.10013","DOIUrl":"https://doi.org/arxiv-2408.10013","url":null,"abstract":"The growth rate of the GPU memory capacity has not been able to keep up with\u0000that of the size of large language models (LLMs), hindering the model training\u0000process. In particular, activations -- the intermediate tensors produced during\u0000forward propagation and reused in backward propagation -- dominate the GPU\u0000memory use. To address this challenge, we propose TBA to efficiently offload\u0000activations to high-capacity NVMe SSDs. This approach reduces GPU memory usage\u0000without impacting performance by adaptively overlapping data transfers with\u0000computation. TBA is compatible with popular deep learning frameworks like\u0000PyTorch, Megatron, and DeepSpeed, and it employs techniques such as tensor\u0000deduplication, forwarding, and adaptive offloading to further enhance\u0000efficiency. We conduct extensive experiments on GPT, BERT, and T5. Results\u0000demonstrate that TBA effectively reduces 47% of the activation peak memory\u0000usage. At the same time, TBA perfectly overlaps the I/O with the computation\u0000and incurs negligible performance overhead. We introduce the\u0000recompute-offload-keep (ROK) curve to compare the TBA offloading with other two\u0000tensor placement strategies, keeping activations in memory and layerwise full\u0000recomputation. We find that TBA achieves better memory savings than layerwise\u0000full recomputation while retaining the performance of keeping the activations\u0000in memory.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A continual learning agent builds on previous experiences to develop increasingly complex behaviors by adapting to non-stationary and dynamic environments while preserving previously acquired knowledge. However, scaling these systems presents significant challenges, particularly in balancing the preservation of previous policies with the adaptation of new ones to current environments. This balance, known as the stability-plasticity dilemma, is especially pronounced in complex multi-agent domains such as the train scheduling problem, where environmental and agent behaviors are constantly changing and the search space is vast. In this work, we address these challenges in the train scheduling problem using curriculum learning. We design a curriculum of adjacent skills that build on each other to improve generalization performance. Such a curriculum of distinct tasks introduces non-stationarity, which we address by proposing a new algorithm: Continual Deep Q-Network (DQN) Expansion (CDE). Our approach dynamically generates and adjusts Q-function subspaces to handle environmental changes and task requirements. CDE mitigates catastrophic forgetting through elastic weight consolidation (EWC) while ensuring high plasticity using adaptive rational activation functions. Experimental results demonstrate significant improvements in learning efficiency and adaptability compared to RL baselines and other methods adapted for continual learning, highlighting the potential of our method for managing the stability-plasticity dilemma in the adaptive train scheduling setting.
{"title":"Mitigating the Stability-Plasticity Dilemma in Adaptive Train Scheduling with Curriculum-Driven Continual DQN Expansion","authors":"Achref Jaziri, Etienne Künzel, Visvanathan Ramesh","doi":"arxiv-2408.09838","DOIUrl":"https://doi.org/arxiv-2408.09838","url":null,"abstract":"A continual learning agent builds on previous experiences to develop\u0000increasingly complex behaviors by adapting to non-stationary and dynamic\u0000environments while preserving previously acquired knowledge. However, scaling\u0000these systems presents significant challenges, particularly in balancing the\u0000preservation of previous policies with the adaptation of new ones to current\u0000environments. This balance, known as the stability-plasticity dilemma, is\u0000especially pronounced in complex multi-agent domains such as the train\u0000scheduling problem, where environmental and agent behaviors are constantly\u0000changing, and the search space is vast. In this work, we propose addressing\u0000these challenges in the train scheduling problem using curriculum learning. We\u0000design a curriculum with adjacent skills that build on each other to improve\u0000generalization performance. Introducing a curriculum with distinct tasks\u0000introduces non-stationarity, which we address by proposing a new algorithm:\u0000Continual Deep Q-Network (DQN) Expansion (CDE). Our approach dynamically\u0000generates and adjusts Q-function subspaces to handle environmental changes and\u0000task requirements. CDE mitigates catastrophic forgetting through EWC while\u0000ensuring high plasticity using adaptive rational activation functions.\u0000Experimental results demonstrate significant improvements in learning\u0000efficiency and adaptability compared to RL baselines and other adapted methods\u0000for continual learning, highlighting the potential of our method in managing\u0000the stability-plasticity dilemma in the adaptive train scheduling setting.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"113 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiao Wang, Shiao Wang, Pengpeng Shao, Bo Jiang, Lin Zhu, Yonghong Tian
Human Action Recognition (HAR) stands as a pivotal research domain in both computer vision and artificial intelligence, with RGB cameras dominating as the preferred tool for investigation and innovation in this field. However, in real-world applications, RGB cameras encounter numerous challenges, including difficult lighting conditions, fast motion, and privacy concerns. Consequently, bio-inspired event cameras have garnered increasing attention due to their advantages of low energy consumption, high dynamic range, etc. Nevertheless, most existing event-based HAR datasets are low resolution ($346 \times 260$). In this paper, we propose a large-scale, high-definition ($1280 \times 800$) human action recognition dataset based on the CeleX-V event camera, termed CeleX-HAR. It encompasses 150 commonly occurring action categories, comprising a total of 124,625 video sequences. Various factors such as multi-view, illumination, action speed, and occlusion are considered when recording these data. To build a more comprehensive benchmark, we report results for over 20 mainstream HAR models as baselines for future comparison. In addition, we propose a novel Mamba vision backbone network for event-stream-based HAR, termed EVMamba, which combines multi-directional scanning of the spatial plane with a novel voxel temporal scanning mechanism. By encoding and mining the spatio-temporal information of event streams, EVMamba achieves favorable results across multiple datasets. Both the dataset and source code will be released at https://github.com/Event-AHU/CeleX-HAR.
{"title":"Event Stream based Human Action Recognition: A High-Definition Benchmark Dataset and Algorithms","authors":"Xiao Wang, Shiao Wang, Pengpeng Shao, Bo Jiang, Lin Zhu, Yonghong Tian","doi":"arxiv-2408.09764","DOIUrl":"https://doi.org/arxiv-2408.09764","url":null,"abstract":"Human Action Recognition (HAR) stands as a pivotal research domain in both\u0000computer vision and artificial intelligence, with RGB cameras dominating as the\u0000preferred tool for investigation and innovation in this field. However, in\u0000real-world applications, RGB cameras encounter numerous challenges, including\u0000light conditions, fast motion, and privacy concerns. Consequently, bio-inspired\u0000event cameras have garnered increasing attention due to their advantages of low\u0000energy consumption, high dynamic range, etc. Nevertheless, most existing\u0000event-based HAR datasets are low resolution ($346 times 260$). In this paper,\u0000we propose a large-scale, high-definition ($1280 times 800$) human action\u0000recognition dataset based on the CeleX-V event camera, termed CeleX-HAR. It\u0000encompasses 150 commonly occurring action categories, comprising a total of\u0000124,625 video sequences. Various factors such as multi-view, illumination,\u0000action speed, and occlusion are considered when recording these data. To build\u0000a more comprehensive benchmark dataset, we report over 20 mainstream HAR models\u0000for future works to compare. In addition, we also propose a novel Mamba vision\u0000backbone network for event stream based HAR, termed EVMamba, which equips the\u0000spatial plane multi-directional scanning and novel voxel temporal scanning\u0000mechanism. By encoding and mining the spatio-temporal information of event\u0000streams, our EVMamba has achieved favorable results across multiple datasets.\u0000Both the dataset and source code will be released on\u0000url{https://github.com/Event-AHU/CeleX-HAR}","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Active Inference framework models perception and action as a unified process, where agents use probabilistic models to predict and actively minimize sensory discrepancies. In complement and contrast, traditional population-based metaheuristics rely on reactive environmental interactions without anticipatory adaptation. This paper proposes integrating Active Inference into these metaheuristics to enhance performance through anticipatory environmental adaptation. We demonstrate this approach with Ant Colony Optimization (ACO) on the Travelling Salesman Problem (TSP). Experimental results indicate that Active Inference can yield some improved solutions with only a marginal increase in computational cost, with interesting performance patterns that relate to the number and topology of nodes in the graph. Further work will characterize where and when different types of Active Inference augmentation of population metaheuristics may be efficacious.
{"title":"Enhancing Population-based Search with Active Inference","authors":"Nassim Dehouche, Daniel Friedman","doi":"arxiv-2408.09548","DOIUrl":"https://doi.org/arxiv-2408.09548","url":null,"abstract":"The Active Inference framework models perception and action as a unified\u0000process, where agents use probabilistic models to predict and actively minimize\u0000sensory discrepancies. In complement and contrast, traditional population-based\u0000metaheuristics rely on reactive environmental interactions without anticipatory\u0000adaptation. This paper proposes the integration of Active Inference into these\u0000metaheuristics to enhance performance through anticipatory environmental\u0000adaptation. We demonstrate this approach specifically with Ant Colony\u0000Optimization (ACO) on the Travelling Salesman Problem (TSP). Experimental\u0000results indicate that Active Inference can yield some improved solutions with\u0000only a marginal increase in computational cost, with interesting patterns of\u0000performance that relate to number and topology of nodes in the graph. Further\u0000work will characterize where and when different types of Active Inference\u0000augmentation of population metaheuristics may be efficacious.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"44 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Erik B. Terres-Escudero, Javier Del Ser, Pablo Garcia-Bringas
Forward-only learning algorithms have recently gained attention as alternatives to gradient backpropagation, replacing its backward step with an additional contrastive forward pass. Among these approaches, the so-called Forward-Forward Algorithm (FFA) has been shown to achieve competitive levels of performance in terms of generalization and complexity. Networks trained using FFA learn to contrastively maximize a layer-wise defined goodness score when presented with real data (denoted as positive samples) and to minimize it when processing synthetic data (negative samples). However, this algorithm still faces weaknesses that negatively affect model accuracy and training stability, primarily due to a gradient imbalance between positive and negative samples. To overcome this issue, we propose a novel implementation of the FFA algorithm, denoted Polar-FFA, which extends the original formulation by introducing a neural division (polarization) between positive and negative instances. Neurons in each group aim to maximize their goodness when presented with their respective data type, thereby creating symmetric gradient behavior. To empirically gauge the improved learning capabilities of Polar-FFA, we perform systematic experiments using different activation and goodness functions on image classification datasets. Our results demonstrate that Polar-FFA outperforms FFA in terms of accuracy and convergence speed. Furthermore, its lower reliance on hyperparameters reduces the need for hyperparameter tuning to guarantee optimal generalization, thereby allowing for a broader range of neural network configurations.
{"title":"On the Improvement of Generalization and Stability of Forward-Only Learning via Neural Polarization","authors":"Erik B. Terres-Escudero, Javier Del Ser, Pablo Garcia-Bringas","doi":"arxiv-2408.09210","DOIUrl":"https://doi.org/arxiv-2408.09210","url":null,"abstract":"Forward-only learning algorithms have recently gained attention as\u0000alternatives to gradient backpropagation, replacing the backward step of this\u0000latter solver with an additional contrastive forward pass. Among these\u0000approaches, the so-called Forward-Forward Algorithm (FFA) has been shown to\u0000achieve competitive levels of performance in terms of generalization and\u0000complexity. Networks trained using FFA learn to contrastively maximize a\u0000layer-wise defined goodness score when presented with real data (denoted as\u0000positive samples) and to minimize it when processing synthetic data (corr.\u0000negative samples). However, this algorithm still faces weaknesses that\u0000negatively affect the model accuracy and training stability, primarily due to a\u0000gradient imbalance between positive and negative samples. To overcome this\u0000issue, in this work we propose a novel implementation of the FFA algorithm,\u0000denoted as Polar-FFA, which extends the original formulation by introducing a\u0000neural division (emph{polarization}) between positive and negative instances.\u0000Neurons in each of these groups aim to maximize their goodness when presented\u0000with their respective data type, thereby creating a symmetric gradient\u0000behavior. To empirically gauge the improved learning capabilities of our\u0000proposed Polar-FFA, we perform several systematic experiments using different\u0000activation and goodness functions over image classification datasets. Our\u0000results demonstrate that Polar-FFA outperforms FFA in terms of accuracy and\u0000convergence speed. Furthermore, its lower reliance on hyperparameters reduces\u0000the need for hyperparameter tuning to guarantee optimal generalization\u0000capabilities, thereby allowing for a broader range of neural network\u0000configurations.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nicholas W. Landry, Beckett R. Hyde, Jake C. Perez, Sean E. Shaheen, Juan G. Restrepo
Efficient and accurate prediction of physical systems is important even when the rules of those systems cannot be easily learned. Reservoir computing, a type of recurrent neural network with fixed nonlinear units, is one such prediction method and is valued for its ease of training. Organic electrochemical transistors (OECTs) are physical devices with nonlinear transient properties that can be used as the nonlinear units of a reservoir computer. We present a theoretical framework for simulating reservoir computers that use OECTs as the nonlinear units, as a test bed for designing physical reservoir computers. As a proof of concept, we demonstrate that such an implementation can accurately predict the Lorenz attractor with performance comparable to standard reservoir computer implementations. We explore the effect of operating parameters and find that prediction performance strongly depends on the pinch-off voltage of the OECTs.
{"title":"A theoretical framework for reservoir computing on networks of organic electrochemical transistors","authors":"Nicholas W. Landry, Beckett R. Hyde, Jake C. Perez, Sean E. Shaheen, Juan G. Restrepo","doi":"arxiv-2408.09223","DOIUrl":"https://doi.org/arxiv-2408.09223","url":null,"abstract":"Efficient and accurate prediction of physical systems is important even when\u0000the rules of those systems cannot be easily learned. Reservoir computing, a\u0000type of recurrent neural network with fixed nonlinear units, is one such\u0000prediction method and is valued for its ease of training. Organic\u0000electrochemical transistors (OECTs) are physical devices with nonlinear\u0000transient properties that can be used as the nonlinear units of a reservoir\u0000computer. We present a theoretical framework for simulating reservoir computers\u0000using OECTs as the non-linear units as a test bed for designing physical\u0000reservoir computers. We present a proof of concept demonstrating that such an\u0000implementation can accurately predict the Lorenz attractor with comparable\u0000performance to standard reservoir computer implementations. We explore the\u0000effect of operating parameters and find that the prediction performance\u0000strongly depends on the pinch-off voltage of the OECTs.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spiking neural networks (SNNs) transmit information via low-power binary spikes and have received widespread attention in areas such as computer vision and reinforcement learning. However, there have been very few explorations of SNNs in more practical industrial scenarios. In this paper, we focus on the application of SNNs to bearing fault diagnosis to facilitate the integration of high-performance AI algorithms with real-world industry. In particular, we identify two key limitations of existing SNN fault diagnosis methods: inadequate encoding capacity, which necessitates cumbersome data preprocessing, and non-spike-oriented architectures, which constrain the performance of SNNs. To alleviate these problems, we propose a Multi-scale Residual Attention SNN (MRA-SNN) that simultaneously improves the efficiency, performance, and robustness of SNN methods. By incorporating a lightweight attention mechanism, we design a multi-scale attention encoding module that extracts multi-scale fault features from vibration signals and encodes them as spatio-temporal spikes, eliminating the need for complicated preprocessing. A spike residual attention block then extracts high-dimensional fault features and enhances the expressiveness of sparse spikes with the attention mechanism for end-to-end diagnosis. In addition, the performance and robustness of MRA-SNN are further enhanced by introducing the lightweight attention mechanism within the spiking neurons to simulate the biological dendritic filtering effect. Extensive experiments on the MFPT and JNU benchmark datasets demonstrate that MRA-SNN significantly outperforms existing methods in terms of accuracy, energy consumption, and noise robustness, and is more feasible for deployment in real-world industrial scenarios.
{"title":"Toward End-to-End Bearing Fault Diagnosis for Industrial Scenarios with Spiking Neural Networks","authors":"Yongqi Ding, Lin Zuo, Mengmeng Jing, Kunshan Yang, Biao Chen, Yunqian Yu","doi":"arxiv-2408.11067","DOIUrl":"https://doi.org/arxiv-2408.11067","url":null,"abstract":"Spiking neural networks (SNNs) transmit information via low-power binary\u0000spikes and have received widespread attention in areas such as computer vision\u0000and reinforcement learning. However, there have been very few explorations of\u0000SNNs in more practical industrial scenarios. In this paper, we focus on the\u0000application of SNNs in bearing fault diagnosis to facilitate the integration of\u0000high-performance AI algorithms and real-world industries. In particular, we\u0000identify two key limitations of existing SNN fault diagnosis methods:\u0000inadequate encoding capacity that necessitates cumbersome data preprocessing,\u0000and non-spike-oriented architectures that constrain the performance of SNNs. To\u0000alleviate these problems, we propose a Multi-scale Residual Attention SNN\u0000(MRA-SNN) to simultaneously improve the efficiency, performance, and robustness\u0000of SNN methods. By incorporating a lightweight attention mechanism, we have\u0000designed a multi-scale attention encoding module to extract multiscale fault\u0000features from vibration signals and encode them as spatio-temporal spikes,\u0000eliminating the need for complicated preprocessing. Then, the spike residual\u0000attention block extracts high-dimensional fault features and enhances the\u0000expressiveness of sparse spikes with the attention mechanism for end-to-end\u0000diagnosis. In addition, the performance and robustness of MRA-SNN is further\u0000enhanced by introducing the lightweight attention mechanism within the spiking\u0000neurons to simulate the biological dendritic filtering effect. Extensive\u0000experiments on MFPT and JNU benchmark datasets demonstrate that MRA-SNN\u0000significantly outperforms existing methods in terms of accuracy, energy\u0000consumption and noise robustness, and is more feasible for deployment in\u0000real-world industrial scenarios.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nicholas Soures, Peter Helfer, Anurag Daram, Tej Pandit, Dhireesha Kudithipudi
Catastrophic interference, the loss of previously learned information when learning new information, remains a major challenge in machine learning. Since living organisms do not seem to suffer from this problem, researchers have taken inspiration from biology to improve memory retention in artificial intelligence systems. However, previous attempts to use bio-inspired mechanisms have typically resulted in systems that rely on task boundary information during training and/or explicit task identification during inference, information that is not available in real-world scenarios. Here, we show that neuro-inspired mechanisms such as synaptic consolidation and metaplasticity can mitigate catastrophic interference in a spiking neural network, using only synapse-local information, with no need for task awareness, and with a fixed memory size that does not need to be increased when training on new tasks. Our model, TACOS, combines neuromodulation with complex synaptic dynamics to enable new learning while protecting previous information. We evaluate TACOS on sequential image recognition tasks and demonstrate its effectiveness in reducing catastrophic interference. Our results show that TACOS outperforms existing regularization techniques in domain-incremental learning scenarios. We also report the results of an ablation study to elucidate the contribution of each neuro-inspired mechanism separately.
{"title":"TACOS: Task Agnostic Continual Learning in Spiking Neural Networks","authors":"Nicholas Soures, Peter Helfer, Anurag Daram, Tej Pandit, Dhireesha Kudithipudi","doi":"arxiv-2409.00021","DOIUrl":"https://doi.org/arxiv-2409.00021","url":null,"abstract":"Catastrophic interference, the loss of previously learned information when\u0000learning new information, remains a major challenge in machine learning. Since\u0000living organisms do not seem to suffer from this problem, researchers have\u0000taken inspiration from biology to improve memory retention in artificial\u0000intelligence systems. However, previous attempts to use bio-inspired mechanisms\u0000have typically resulted in systems that rely on task boundary information\u0000during training and/or explicit task identification during inference,\u0000information that is not available in real-world scenarios. Here, we show that\u0000neuro-inspired mechanisms such as synaptic consolidation and metaplasticity can\u0000mitigate catastrophic interference in a spiking neural network, using only\u0000synapse-local information, with no need for task awareness, and with a fixed\u0000memory size that does not need to be increased when training on new tasks. Our\u0000model, TACOS, combines neuromodulation with complex synaptic dynamics to enable\u0000new learning while protecting previous information. We evaluate TACOS on\u0000sequential image recognition tasks and demonstrate its effectiveness in\u0000reducing catastrophic interference. Our results show that TACOS outperforms\u0000existing regularization techniques in domain-incremental learning scenarios. We\u0000also report the results of an ablation study to elucidate the contribution of\u0000each neuro-inspired mechanism separately.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"38 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bernhard J. Berger (University of Rostock, Software Engineering Chair, Rostock, Germany; Hamburg University of Technology, Institute of Embedded Systems, Germany), Christina Plump (DFKI - Cyber-Physical Systems, Bremen, Germany), Rolf Drechsler (University of Bremen, Departments of Mathematics and Computer Science; DFKI - Cyber-Physical Systems, Bremen, Germany)
As AI solutions enter safety-critical products, the explainability and interpretability of the solutions generated by AI products become increasingly important. In the long term, such explanations are the key to gaining users' acceptance of AI-based systems' decisions. We report on applying model-driven optimisation to search for an interpretable and explainable policy that solves the game 2048. This paper describes a solution to the GECCO'24 Interpretable Control Competition using the open-source software EvoAl. We aimed to develop an approach for creating interpretable policies that are easy to adapt to new ideas.
{"title":"$EvoAl^{2048}$","authors":"Bernhard J. BergerUniversity of Rostock, Software Engineering Chair Rostock, GermanyHamburg University of Technology, Institute of Embedded Systems, Germany, Christina PlumpDFKI - Cyber-Physical Systems Bremen, Germany, Rolf DrechslerUniversity of Bremen, Departments of Mathematics and Computer ScienceDFKI - Cyber-Physical Systems Bremen, Germany","doi":"arxiv-2408.16780","DOIUrl":"https://doi.org/arxiv-2408.16780","url":null,"abstract":"As AI solutions enter safety-critical products, the explainability and\u0000interpretability of solutions generated by AI products become increasingly\u0000important. In the long term, such explanations are the key to gaining users'\u0000acceptance of AI-based systems' decisions. We report on applying a\u0000model-driven-based optimisation to search for an interpretable and explainable\u0000policy that solves the game 2048. This paper describes a solution to the\u0000GECCO'24 Interpretable Control Competition using the open-source software\u0000EvoAl. We aimed to develop an approach for creating interpretable policies that\u0000are easy to adapt to new ideas.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"38 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}