
arXiv - CS - Neural and Evolutionary Computing: Latest Papers

A More Accurate Approximation of Activation Function with Few Spikes Neurons
Pub Date : 2024-08-19 DOI: arxiv-2409.00044
Dayena Jeong, Jaewoo Park, Jeonghee Jo, Jongkil Park, Jaewook Kim, Hyun Jae Jang, Suyoun Lee, Seongsik Park
Recent deep neural networks (DNNs), such as diffusion models [1], have faced high computational demands. Thus, spiking neural networks (SNNs) have attracted considerable attention as energy-efficient neural networks. However, conventional spiking neurons, such as leaky integrate-and-fire neurons, cannot accurately represent complex non-linear activation functions, such as Swish [2]. To approximate activation functions with spiking neurons, few spikes (FS) neurons were proposed [3], but the approximation performance was limited due to the lack of training methods tailored to these neurons. Thus, we propose tendency-based parameter initialization (TBPI) to enhance the approximation of activation functions with FS neurons, exploiting temporal dependencies when initializing the training parameters.
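As a rough illustration of approximating a function with only a few spikes, the sketch below simulates an FS-style neuron over K time steps. The update rule (a threshold T(t), a reset amount h(t), and an output weight d(t) per step) is an assumption based on how FS neurons are usually described in prior work; the abstract does not spell it out. The parameters are hand-set to a binary code, which only yields a quantized identity on [0, 1). Approximating Swish would need trained parameters, and TBPI itself is not shown.

```python
import math


def swish(x: float) -> float:
    """Swish activation, x * sigmoid(x): the target function named in the abstract."""
    return x / (1.0 + math.exp(-x))


def fs_neuron(x, thresholds, resets, out_weights):
    """Hypothetical FS-style neuron that emits at most one spike per time step.

    The membrane value v starts at the input x. At step t the neuron spikes
    if v >= T(t); a spike subtracts h(t) from v and adds d(t) to the output.
    """
    v, out = x, 0.0
    for T, h, d in zip(thresholds, resets, out_weights):
        spike = 1.0 if v >= T else 0.0
        v -= h * spike
        out += d * spike
    return out


K = 8
# Hand-set binary-code parameters (an assumption, not trained values).
binary = [2.0 ** -(t + 1) for t in range(K)]

x = 0.8125
print(x, fs_neuron(x, binary, binary, binary))  # input and its K-bit quantization
print(swish(1.0))                               # value a trained neuron would aim for
```

With these parameters the output differs from the input by less than 2^-K for any input in [0, 1), which shows why a small number of time steps can already give a tight approximation when the parameters suit the target function.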
Citations: 0
TBA: Faster Large Language Model Training Using SSD-Based Activation Offloading
Pub Date : 2024-08-19 DOI: arxiv-2408.10013
Kun Wu, Jeongmin Brian Park, Xiaofan Zhang, Mert Hidayetoğlu, Vikram Sharma Mailthody, Sitao Huang, Steven Sam Lumetta, Wen-mei Hwu
The growth rate of GPU memory capacity has not been able to keep up with that of the size of large language models (LLMs), hindering the model training process. In particular, activations -- the intermediate tensors produced during forward propagation and reused in backward propagation -- dominate GPU memory use. To address this challenge, we propose TBA to efficiently offload activations to high-capacity NVMe SSDs. This approach reduces GPU memory usage without impacting performance by adaptively overlapping data transfers with computation. TBA is compatible with popular deep learning frameworks like PyTorch, Megatron, and DeepSpeed, and it employs techniques such as tensor deduplication, forwarding, and adaptive offloading to further enhance efficiency. We conduct extensive experiments on GPT, BERT, and T5. Results demonstrate that TBA reduces activation peak memory usage by 47%. At the same time, TBA perfectly overlaps the I/O with the computation and incurs negligible performance overhead. We introduce the recompute-offload-keep (ROK) curve to compare TBA offloading with two other tensor placement strategies: keeping activations in memory and layerwise full recomputation. We find that TBA achieves better memory savings than layerwise full recomputation while retaining the performance of keeping the activations in memory.
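The idea of overlapping data transfers with computation can be pictured with a toy timing model. The sketch below is not TBA's scheduler: it assumes one compute stream and one I/O queue, and the per-layer timings are made-up numbers chosen only to show when a transfer is hidden and when it is exposed.

```python
def forward_timeline(compute_ms, transfer_ms):
    """Return (compute_done, all_done) for one forward pass.

    Assumption: the activation of layer i starts transferring as soon as
    layer i has finished computing and the single I/O queue is free, while
    the next layer keeps computing in parallel.
    """
    compute_done = 0.0
    io_done = 0.0
    for c, x in zip(compute_ms, transfer_ms):
        compute_done += c                         # layer i finishes
        io_done = max(io_done, compute_done) + x  # its offload finishes
    return compute_done, max(compute_done, io_done)


# Hypothetical per-layer timings in milliseconds.
fast_storage = forward_timeline([4, 4, 4], [2, 2, 2])
slow_storage = forward_timeline([4, 4, 4], [6, 6, 6])
print(fast_storage)  # transfers mostly hidden behind compute
print(slow_storage)  # transfers pile up and extend the pass
```

When each transfer is shorter than the following layer's compute time, only the final transfer extends past the end of computation. When transfers are slower than compute, the I/O queue becomes the bottleneck, which is the situation an adaptive offloading policy would need to avoid.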
Citations: 0
Mitigating the Stability-Plasticity Dilemma in Adaptive Train Scheduling with Curriculum-Driven Continual DQN Expansion
Pub Date : 2024-08-19 DOI: arxiv-2408.09838
Achref Jaziri, Etienne Künzel, Visvanathan Ramesh
A continual learning agent builds on previous experiences to develop increasingly complex behaviors by adapting to non-stationary and dynamic environments while preserving previously acquired knowledge. However, scaling these systems presents significant challenges, particularly in balancing the preservation of previous policies with the adaptation of new ones to current environments. This balance, known as the stability-plasticity dilemma, is especially pronounced in complex multi-agent domains such as the train scheduling problem, where environmental and agent behaviors are constantly changing, and the search space is vast. In this work, we propose addressing these challenges in the train scheduling problem using curriculum learning. We design a curriculum with adjacent skills that build on each other to improve generalization performance. Introducing a curriculum with distinct tasks introduces non-stationarity, which we address by proposing a new algorithm: Continual Deep Q-Network (DQN) Expansion (CDE). Our approach dynamically generates and adjusts Q-function subspaces to handle environmental changes and task requirements. CDE mitigates catastrophic forgetting through elastic weight consolidation (EWC) while ensuring high plasticity using adaptive rational activation functions. Experimental results demonstrate significant improvements in learning efficiency and adaptability compared to RL baselines and other adapted methods for continual learning, highlighting the potential of our method in managing the stability-plasticity dilemma in the adaptive train scheduling setting.
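The EWC term mentioned in the abstract is commonly written as a quadratic penalty that anchors each parameter to its value after the previous task, weighted by an importance estimate (typically a diagonal Fisher information value). The sketch below shows that standard penalty on plain Python lists. How CDE combines it with Q-function subspaces is not described in the abstract, so the function names and the `lam` weight here are illustrative assumptions.

```python
def ewc_penalty(params, anchor_params, fisher, lam):
    """Standard EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2.

    params:        current parameter values
    anchor_params: values stored after training on the previous task
    fisher:        per-parameter importance (diagonal Fisher estimate)
    """
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )


def total_loss(new_task_loss, params, anchor_params, fisher, lam=1.0):
    """New-task loss plus the EWC regularizer (hypothetical helper)."""
    return new_task_loss + ewc_penalty(params, anchor_params, fisher, lam)


# Moving an important parameter (F = 2.0) costs more than an unimportant one.
print(ewc_penalty([1.0, 2.0], [0.0, 2.0], [2.0, 5.0], lam=4.0))
print(total_loss(1.5, [1.0, 2.0], [0.0, 2.0], [2.0, 5.0], lam=4.0))
```

A large importance value makes a parameter expensive to move (stability), while parameters with small importance remain free to adapt to the new task (plasticity).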
Citations: 0
Event Stream based Human Action Recognition: A High-Definition Benchmark Dataset and Algorithms
Pub Date : 2024-08-19 DOI: arxiv-2408.09764
Xiao Wang, Shiao Wang, Pengpeng Shao, Bo Jiang, Lin Zhu, Yonghong Tian
Human Action Recognition (HAR) stands as a pivotal research domain in both computer vision and artificial intelligence, with RGB cameras dominating as the preferred tool for investigation and innovation in this field. However, in real-world applications, RGB cameras encounter numerous challenges, including light conditions, fast motion, and privacy concerns. Consequently, bio-inspired event cameras have garnered increasing attention due to their advantages of low energy consumption, high dynamic range, etc. Nevertheless, most existing event-based HAR datasets are low resolution ($346 \times 260$). In this paper, we propose a large-scale, high-definition ($1280 \times 800$) human action recognition dataset based on the CeleX-V event camera, termed CeleX-HAR. It encompasses 150 commonly occurring action categories, comprising a total of 124,625 video sequences. Various factors such as multi-view, illumination, action speed, and occlusion are considered when recording these data. To build a more comprehensive benchmark dataset, we report results for over 20 mainstream HAR models for future work to compare against. In addition, we also propose a novel Mamba vision backbone network for event stream based HAR, termed EVMamba, which is equipped with spatial-plane multi-directional scanning and a novel voxel temporal scanning mechanism. By encoding and mining the spatio-temporal information of event streams, our EVMamba has achieved favorable results across multiple datasets. Both the dataset and source code will be released at https://github.com/Event-AHU/CeleX-HAR
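To make the term "voxel" concrete: an event stream is a list of (x, y, timestamp, polarity) tuples, and one common way to feed it to a network is to bin the events into a small number of temporal slices. The sketch below shows that generic binning. It is not the EVMamba scanning mechanism, and the function name and tuple layout are assumptions made for illustration.

```python
def events_to_voxel_grid(events, num_bins, height, width):
    """Accumulate signed event polarities into a [num_bins][height][width] grid.

    events: list of (x, y, t, polarity) with polarity in {-1, +1}.
    Each event is assigned to a temporal bin by its normalized timestamp.
    """
    grid = [[[0 for _ in range(width)] for _ in range(height)]
            for _ in range(num_bins)]
    if not events:
        return grid
    t_min = min(e[2] for e in events)
    t_max = max(e[2] for e in events)
    span = (t_max - t_min) or 1  # avoid division by zero for a single timestamp
    for x, y, t, p in events:
        b = min(int((t - t_min) / span * num_bins), num_bins - 1)
        grid[b][y][x] += p
    return grid


# Four made-up events on a 2x2 sensor, split into two temporal bins.
events = [(0, 0, 0, 1), (1, 0, 5, -1), (1, 1, 10, 1), (0, 0, 1, 1)]
print(events_to_voxel_grid(events, num_bins=2, height=2, width=2))
```

Each temporal slice can then be scanned spatially, and the sequence of slices scanned over time, which is the kind of spatio-temporal ordering a sequence model needs.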
Citations: 0
Enhancing Population-based Search with Active Inference
Pub Date : 2024-08-18 DOI: arxiv-2408.09548
Nassim Dehouche, Daniel Friedman
The Active Inference framework models perception and action as a unified process, where agents use probabilistic models to predict and actively minimize sensory discrepancies. In complement and contrast, traditional population-based metaheuristics rely on reactive environmental interactions without anticipatory adaptation. This paper proposes the integration of Active Inference into these metaheuristics to enhance performance through anticipatory environmental adaptation. We demonstrate this approach specifically with Ant Colony Optimization (ACO) on the Travelling Salesman Problem (TSP). Experimental results indicate that Active Inference can yield some improved solutions with only a marginal increase in computational cost, with interesting patterns of performance that relate to the number and topology of nodes in the graph. Further work will characterize where and when different types of Active Inference augmentation of population metaheuristics may be efficacious.
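For readers unfamiliar with the baseline being augmented, the sketch below shows the textbook ACO rule an ant uses to pick its next city: each candidate is weighted by pheromone strength and inverse distance, then the weights are normalized. This is the purely reactive step the abstract contrasts with anticipatory behavior. The Active Inference augmentation is not shown, and the parameter values are illustrative assumptions.

```python
def transition_probabilities(pheromone, distance, alpha=1.0, beta=2.0):
    """Probability of moving from the current city to each candidate city.

    weight_j = pheromone_j ** alpha * (1 / distance_j) ** beta,
    normalized over all candidates (classic ACO rule).
    """
    weights = [
        (tau ** alpha) * ((1.0 / d) ** beta)
        for tau, d in zip(pheromone, distance)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Two candidate cities with equal pheromone; the nearer one is favored.
probs = transition_probabilities([1.0, 1.0], [1.0, 2.0], alpha=1.0, beta=1.0)
print(probs)
```

Raising `alpha` makes ants follow existing trails more strongly, while raising `beta` makes them greedier about short edges.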
Citations: 0
On the Improvement of Generalization and Stability of Forward-Only Learning via Neural Polarization
Pub Date : 2024-08-17 DOI: arxiv-2408.09210
Erik B. Terres-Escudero, Javier Del Ser, Pablo Garcia-Bringas
Forward-only learning algorithms have recently gained attention as alternatives to gradient backpropagation, replacing the backward step of this latter solver with an additional contrastive forward pass. Among these approaches, the so-called Forward-Forward Algorithm (FFA) has been shown to achieve competitive levels of performance in terms of generalization and complexity. Networks trained using FFA learn to contrastively maximize a layer-wise defined goodness score when presented with real data (denoted as positive samples) and to minimize it when processing synthetic data (correspondingly, negative samples). However, this algorithm still faces weaknesses that negatively affect the model accuracy and training stability, primarily due to a gradient imbalance between positive and negative samples. To overcome this issue, in this work we propose a novel implementation of the FFA, denoted as Polar-FFA, which extends the original formulation by introducing a neural division (polarization) between positive and negative instances. Neurons in each of these groups aim to maximize their goodness when presented with their respective data type, thereby creating a symmetric gradient behavior. To empirically gauge the improved learning capabilities of our proposed Polar-FFA, we perform several systematic experiments using different activation and goodness functions over image classification datasets. Our results demonstrate that Polar-FFA outperforms FFA in terms of accuracy and convergence speed. Furthermore, its lower reliance on hyperparameters reduces the need for hyperparameter tuning to guarantee optimal generalization capabilities, thereby allowing for a broader range of neural network configurations.
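The goodness score can be made concrete with a small sketch. In the original Forward-Forward formulation, a layer's goodness is the sum of its squared activations, and the probability that an input is positive is a sigmoid of goodness minus a threshold. The polarized variant below is only an assumption for illustration: it splits a layer's neurons into a positive half and a negative half by index and compares the two goodness values directly. The abstract does not give the exact probability or loss used by Polar-FFA.

```python
import math


def goodness(activations):
    """Sum of squared activations, the goodness used in the original FFA."""
    return sum(a * a for a in activations)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ffa_positive_probability(activations, theta):
    """Original FFA: P(positive) = sigmoid(goodness - theta)."""
    return sigmoid(goodness(activations) - theta)


def polar_positive_probability(activations):
    """Hypothetical polarized variant (assumption, not the paper's formula).

    First half of the layer = positive neurons, second half = negative
    neurons; the input is judged positive when the positive group is
    more active than the negative group.
    """
    half = len(activations) // 2
    g_pos = goodness(activations[:half])
    g_neg = goodness(activations[half:])
    return sigmoid(g_pos - g_neg)


print(ffa_positive_probability([1.0, 2.0], theta=5.0))
print(polar_positive_probability([3.0, 0.0, 0.0, 1.0]))
```

In the polarized sketch, both sample types push some group of neurons toward higher goodness, and no separate threshold needs tuning, which is one way to read the symmetric gradient behavior and the reduced hyperparameter reliance described above.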
Citations: 0
A theoretical framework for reservoir computing on networks of organic electrochemical transistors
Pub Date : 2024-08-17 DOI: arxiv-2408.09223
Nicholas W. Landry, Beckett R. Hyde, Jake C. Perez, Sean E. Shaheen, Juan G. Restrepo
Efficient and accurate prediction of physical systems is important even when the rules of those systems cannot be easily learned. Reservoir computing, a type of recurrent neural network with fixed nonlinear units, is one such prediction method and is valued for its ease of training. Organic electrochemical transistors (OECTs) are physical devices with nonlinear transient properties that can be used as the nonlinear units of a reservoir computer. We present a theoretical framework for simulating reservoir computers using OECTs as the non-linear units as a test bed for designing physical reservoir computers. We present a proof of concept demonstrating that such an implementation can accurately predict the Lorenz attractor with comparable performance to standard reservoir computer implementations. We explore the effect of operating parameters and find that the prediction performance strongly depends on the pinch-off voltage of the OECTs.
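A minimal sketch of the two ingredients named above may help: the Lorenz system used as the prediction target, and one update of a reservoir whose units are fixed nonlinearities. The `tanh` unit here is a stand-in assumption typical of standard reservoir computers; the paper's framework replaces it with an OECT device model, which is not reproduced here. Only a linear readout on the reservoir state would be trained.

```python
import math


def lorenz_rhs(x, y, z, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations with the classic parameters."""
    return sigma * (y - x), x * (rho - z) - y, x * y - beta * z


def reservoir_step(state, u, adjacency, w_in):
    """One reservoir update: r_i <- tanh(sum_j A_ij * r_j + w_in_i * u).

    adjacency and w_in stay fixed (untrained); tanh is a stand-in for the
    physical nonlinearity of each unit.
    """
    return [
        math.tanh(sum(a * r for a, r in zip(row, state)) + w * u)
        for row, w in zip(adjacency, w_in)
    ]


# Two-unit toy reservoir driven by a scalar input (made-up weights).
A = [[0.0, 0.5], [0.5, 0.0]]
w_in = [1.0, -1.0]
state = reservoir_step([0.0, 0.0], 1.0, A, w_in)
print(state)
print(lorenz_rhs(1.0, 1.0, 1.0))
```

Because the recurrent weights never change, training reduces to fitting the readout, which is the ease-of-training property the abstract refers to.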
Citations: 0
Toward End-to-End Bearing Fault Diagnosis for Industrial Scenarios with Spiking Neural Networks
Pub Date : 2024-08-17 DOI: arxiv-2408.11067
Yongqi Ding, Lin Zuo, Mengmeng Jing, Kunshan Yang, Biao Chen, Yunqian Yu
Spiking neural networks (SNNs) transmit information via low-power binary spikes and have received widespread attention in areas such as computer vision and reinforcement learning. However, there have been very few explorations of SNNs in more practical industrial scenarios. In this paper, we focus on the application of SNNs in bearing fault diagnosis to facilitate the integration of high-performance AI algorithms and real-world industries. In particular, we identify two key limitations of existing SNN fault diagnosis methods: inadequate encoding capacity that necessitates cumbersome data preprocessing, and non-spike-oriented architectures that constrain the performance of SNNs. To alleviate these problems, we propose a Multi-scale Residual Attention SNN (MRA-SNN) to simultaneously improve the efficiency, performance, and robustness of SNN methods. By incorporating a lightweight attention mechanism, we have designed a multi-scale attention encoding module to extract multiscale fault features from vibration signals and encode them as spatio-temporal spikes, eliminating the need for complicated preprocessing. Then, the spike residual attention block extracts high-dimensional fault features and enhances the expressiveness of sparse spikes with the attention mechanism for end-to-end diagnosis. In addition, the performance and robustness of MRA-SNN are further enhanced by introducing the lightweight attention mechanism within the spiking neurons to simulate the biological dendritic filtering effect. Extensive experiments on the MFPT and JNU benchmark datasets demonstrate that MRA-SNN significantly outperforms existing methods in terms of accuracy, energy consumption, and noise robustness, and is more feasible for deployment in real-world industrial scenarios.
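The phrase "binary spikes" can be illustrated with a generic discrete-time leaky integrate-and-fire neuron, the basic unit most SNNs build on. The sketch below is not the MRA-SNN neuron: the attention-based dendritic filtering is omitted, and the decay factor, threshold, and hard reset are illustrative assumptions.

```python
def lif_run(inputs, beta=0.5, threshold=1.0):
    """Discrete-time leaky integrate-and-fire neuron with hard reset to zero.

    At each step the membrane potential decays by beta, integrates the
    input current, and emits a binary spike when it reaches the threshold.
    """
    v, spikes = 0.0, []
    for current in inputs:
        v = beta * v + current
        if v >= threshold:
            spikes.append(1)
            v = 0.0  # hard reset after a spike
        else:
            spikes.append(0)
    return spikes


# A constant sub-threshold input needs several steps to trigger a spike.
print(lif_run([0.6, 0.6, 0.6]))
```

Downstream layers only receive the 0/1 spike train, so most multiplications reduce to conditional additions, which is where the low-power claim for SNNs comes from.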
Citations: 0
TACOS: Task Agnostic Continual Learning in Spiking Neural Networks
Pub Date : 2024-08-16 DOI: arxiv-2409.00021
Nicholas Soures, Peter Helfer, Anurag Daram, Tej Pandit, Dhireesha Kudithipudi
Catastrophic interference, the loss of previously learned information when learning new information, remains a major challenge in machine learning. Since living organisms do not seem to suffer from this problem, researchers have taken inspiration from biology to improve memory retention in artificial intelligence systems. However, previous attempts to use bio-inspired mechanisms have typically resulted in systems that rely on task boundary information during training and/or explicit task identification during inference, information that is not available in real-world scenarios. Here, we show that neuro-inspired mechanisms such as synaptic consolidation and metaplasticity can mitigate catastrophic interference in a spiking neural network, using only synapse-local information, with no need for task awareness, and with a fixed memory size that does not need to be increased when training on new tasks. Our model, TACOS, combines neuromodulation with complex synaptic dynamics to enable new learning while protecting previous information. We evaluate TACOS on sequential image recognition tasks and demonstrate its effectiveness in reducing catastrophic interference. Our results show that TACOS outperforms existing regularization techniques in domain-incremental learning scenarios. We also report the results of an ablation study to elucidate the contribution of each neuro-inspired mechanism separately.
Citations: 0
$EvoAl^{2048}$
Pub Date : 2024-08-15 DOI: arxiv-2408.16780
Bernhard J. Berger (University of Rostock, Software Engineering Chair, Rostock, Germany; Hamburg University of Technology, Institute of Embedded Systems, Germany), Christina Plump (DFKI - Cyber-Physical Systems, Bremen, Germany), Rolf Drechsler (University of Bremen, Departments of Mathematics and Computer Science; DFKI - Cyber-Physical Systems, Bremen, Germany)
As AI solutions enter safety-critical products, the explainability and interpretability of solutions generated by AI products become increasingly important. In the long term, such explanations are the key to gaining users' acceptance of AI-based systems' decisions. We report on applying a model-driven-based optimisation to search for an interpretable and explainable policy that solves the game 2048. This paper describes a solution to the GECCO'24 Interpretable Control Competition using the open-source software EvoAl. We aimed to develop an approach for creating interpretable policies that are easy to adapt to new ideas.
Citations: 0