
arXiv - CS - Neural and Evolutionary Computing: Latest Papers

PReLU: Yet Another Single-Layer Solution to the XOR Problem
Pub Date : 2024-09-17 DOI: arxiv-2409.10821
Rafael C. Pinto, Anderson R. Tavares
This paper demonstrates that a single-layer neural network using Parametric Rectified Linear Unit (PReLU) activation can solve the XOR problem, a simple fact that has been overlooked so far. We compare this solution to the multi-layer perceptron (MLP) and the Growing Cosine Unit (GCU) activation function and explain why PReLU enables this capability. Our results show that the single-layer PReLU network can achieve a 100% success rate over a wider range of learning rates while using only three learnable parameters.
Citations: 0
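The claim in the PReLU entry above can be illustrated with a minimal sketch. It assumes a single neuron with two input weights, no bias, and a learnable negative-side slope `a`, which is one way to arrive at three parameters; the values `w1 = 1`, `w2 = -1`, `a = -1` are hand-picked for illustration and are not taken from the paper. With `a = -1`, PReLU reduces to the absolute value, so the neuron computes |x1 - x2|, which matches XOR on binary inputs.

```python
def prelu(z, a):
    """PReLU: identity for z >= 0, slope `a` for z < 0."""
    return z if z >= 0 else a * z


def xor_neuron(x1, x2, w1=1.0, w2=-1.0, a=-1.0):
    # Three parameters in total: w1, w2 and the PReLU slope a.
    # No bias term is used (an assumption of this sketch).
    return prelu(w1 * x1 + w2 * x2, a)


# Evaluate the neuron on the four XOR input pairs.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [xor_neuron(x1, x2) for x1, x2 in inputs]
print(outputs)  # [0.0, 1.0, 1.0, 0.0]
```

A plain ReLU neuron cannot do this because its negative side is flat; the learnable slope is what lets the output rise on both sides of zero.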
Inferno: An Extensible Framework for Spiking Neural Networks
Pub Date : 2024-09-17 DOI: arxiv-2409.11567
Marissa Dominijanni
This paper introduces Inferno, a software library built on top of PyTorch that is designed to meet distinctive challenges of using spiking neural networks (SNNs) for machine learning tasks. We describe the architecture of Inferno and key differentiators that make it uniquely well-suited to these tasks. We show how Inferno supports trainable heterogeneous delays on both CPUs and GPUs, and how Inferno enables a "write once, apply everywhere" development methodology for novel models and techniques. We compare Inferno's performance to BindsNET, a library aimed at machine learning with SNNs, and Brian2/Brian2CUDA, which is popular in neuroscience. Among several examples, we show how the design decisions made by Inferno facilitate easily implementing the new methods of Nadafian and Ganjtabesh in delay learning with spike-timing-dependent plasticity.
Citations: 0
Bio-Inspired Mamba: Temporal Locality and Bioplausible Learning in Selective State Space Models
Pub Date : 2024-09-17 DOI: arxiv-2409.11263
Jiahao Qin
This paper introduces Bio-Inspired Mamba (BIM), a novel online learning framework for selective state space models that integrates biological learning principles with the Mamba architecture. BIM combines Real-Time Recurrent Learning (RTRL) with Spike-Timing-Dependent Plasticity (STDP)-like local learning rules, addressing the challenges of temporal locality and biological plausibility in training spiking neural networks. Our approach leverages the inherent connection between backpropagation through time and STDP, offering a computationally efficient alternative that maintains the ability to capture long-range dependencies. We evaluate BIM on language modeling, speech recognition, and biomedical signal analysis tasks, demonstrating competitive performance against traditional methods while adhering to biological learning principles. Results show improved energy efficiency and potential for neuromorphic hardware implementation. BIM not only advances the field of biologically plausible machine learning but also provides insights into the mechanisms of temporal information processing in biological neural networks.
Citations: 0
Self-Contrastive Forward-Forward Algorithm
Pub Date : 2024-09-17 DOI: arxiv-2409.11593
Xing Chen, Dongshu Liu, Jeremie Laydevant, Julie Grollier
The Forward-Forward (FF) algorithm is a recent, purely forward-mode learning method that updates weights locally and layer-wise and supports supervised as well as unsupervised learning. These features make it ideal for applications such as brain-inspired learning, low-power hardware neural networks, and distributed learning in large models. However, while FF has shown promise on written digit recognition tasks, its performance on natural images and time-series remains a challenge. A key limitation is the need to generate high-quality negative examples for contrastive learning, especially in unsupervised tasks, where versatile solutions are currently lacking. To address this, we introduce the Self-Contrastive Forward-Forward (SCFF) method, inspired by self-supervised contrastive learning. SCFF generates positive and negative examples applicable across different datasets, surpassing existing local forward algorithms for unsupervised classification accuracy on MNIST (MLP: 98.7%), CIFAR-10 (CNN: 80.75%), and STL-10 (CNN: 77.3%). Additionally, SCFF is the first to enable FF training of recurrent neural networks, opening the door to more complex tasks and continuous-time video and text processing.
Citations: 0
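The layer-wise, contrastive character of FF described in the entry above can be sketched as a per-layer objective. This is a minimal illustration, assuming the goodness of a layer is the sum of its squared activations and that a threshold `theta` separates positive from negative examples; the threshold value and the example activations are hypothetical. How SCFF constructs its positive and negative examples is not shown here.

```python
import math


def goodness(activations):
    # Goodness of one layer: sum of squared activations (assumed definition).
    return sum(a * a for a in activations)


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def ff_layer_loss(pos_acts, neg_acts, theta=1.0):
    """Layer-local loss: low when positive goodness is above theta
    and negative goodness is below theta."""
    pos_term = softplus(theta - goodness(pos_acts))
    neg_term = softplus(goodness(neg_acts) - theta)
    return pos_term + neg_term


# Hypothetical activations: strong response to the positive example,
# weak response to the negative one.
well_separated = ff_layer_loss([2.0, 1.0], [0.1, 0.1])
badly_separated = ff_layer_loss([0.1, 0.1], [2.0, 1.0])
print(well_separated < badly_separated)
```

Because the loss depends only on one layer's activations, each layer can be updated without propagating gradients through the layers above it.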
Evaluating the Efficacy of Instance Incremental vs. Batch Learning in Delayed Label Environments: An Empirical Study on Tabular Data Streaming for Fraud Detection
Pub Date : 2024-09-16 DOI: arxiv-2409.10111
Kodjo Mawuena Amekoe, Mustapha Lebbah, Gregoire Jaffre, Hanene Azzag, Zaineb Chelly Dagdia
Real-world tabular learning production scenarios typically involve evolving data streams, where data arrives continuously and its distribution may change over time. In such a setting, most studies in the literature regarding supervised learning favor the use of instance incremental algorithms due to their ability to adapt to changes in the data distribution. Another significant reason for choosing these algorithms is to avoid storing observations in memory, as is commonly done in batch incremental settings. However, the design of instance incremental algorithms often assumes immediate availability of labels, which is an optimistic assumption. In many real-world scenarios, such as fraud detection or credit scoring, labels may be delayed. Consequently, batch incremental algorithms are widely used in many real-world tasks. This raises an important question: "In delayed settings, is instance incremental learning the best option regarding predictive performance and computational efficiency?" Unfortunately, this question has not been studied in depth, probably due to the scarcity of real datasets containing delayed information. In this study, we conduct a comprehensive empirical evaluation and analysis of this question using a real-world fraud detection problem and commonly used generated datasets. Our findings indicate that instance incremental learning is not the superior option, considering on one side state-of-the-art models such as Adaptive Random Forest (ARF) and on the other side batch learning models such as XGBoost. Additionally, when considering the interpretability of the learning systems, batch incremental solutions tend to be favored. Code: https://github.com/anselmeamekoe/DelayedLabelStream
Citations: 0
Kolmogorov-Arnold Transformer
Pub Date : 2024-09-16 DOI: arxiv-2409.10594
Xingyi Yang, Xinchao Wang
Transformers stand as the cornerstone of modern deep learning. Traditionally, these models rely on multi-layer perceptron (MLP) layers to mix the information between channels. In this paper, we introduce the Kolmogorov-Arnold Transformer (KAT), a novel architecture that replaces MLP layers with Kolmogorov-Arnold Network (KAN) layers to enhance the expressiveness and performance of the model. Integrating KANs into transformers, however, is no easy feat, especially when scaled up. Specifically, we identify three key challenges: (C1) Base function. The standard B-spline function used in KANs is not optimized for parallel computing on modern hardware, resulting in slower inference speeds. (C2) Parameter and Computation Inefficiency. KAN requires a unique function for each input-output pair, making the computation extremely large. (C3) Weight initialization. The initialization of weights in KANs is particularly challenging due to their learnable activation functions, which are critical for achieving convergence in deep neural networks. To overcome the aforementioned challenges, we propose three key solutions: (S1) Rational basis. We replace B-spline functions with rational functions to improve compatibility with modern GPUs. By implementing this in CUDA, we achieve faster computations. (S2) Group KAN. We share the activation weights through a group of neurons, to reduce the computational load without sacrificing performance. (S3) Variance-preserving initialization. We carefully initialize the activation weights to make sure that the activation variance is maintained across layers. With these designs, KAT scales effectively and readily outperforms traditional MLP-based transformers.
Citations: 0
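Solutions (S1) and (S2) in the KAT entry above can be pictured with a small sketch: a learnable rational function P(x)/Q(x) whose coefficients are shared by every channel in a group. The denominator form 1 + |Q(x)|, the contiguous grouping, and all coefficient values below are assumptions made for illustration; the paper's CUDA implementation and its exact parameterization are not reproduced here.

```python
def rational(x, num, den):
    """Evaluate P(x) / (1 + |Q(x)|).

    num = [a0, a1, ...] are the coefficients of P, lowest degree first.
    den = [b1, b2, ...] are the coefficients of Q, which has no constant term.
    The 1 + |.| form keeps the denominator at or above 1 (an assumed choice).
    """
    p = sum(a * x ** i for i, a in enumerate(num))
    q = sum(b * x ** (i + 1) for i, b in enumerate(den))
    return p / (1.0 + abs(q))


def group_rational(channels, groups):
    """Apply one shared rational function per group of channels.

    channels: one value per channel.
    groups: list of (num, den) coefficient pairs, one per group.
    Channels are split into contiguous groups of equal size, so
    len(channels) must be divisible by len(groups).
    """
    size = len(channels) // len(groups)
    return [rational(x, *groups[i // size]) for i, x in enumerate(channels)]


# Two hypothetical groups: the identity function and x^2 / (1 + |x|).
groups = [([0.0, 1.0], []), ([0.0, 0.0, 1.0], [1.0])]
result = group_rational([2.0, -3.0, 1.0, -1.0], groups)
print(result)  # [2.0, -3.0, 0.5, 0.5]
```

Sharing coefficients per group means the number of activation parameters grows with the number of groups rather than with the number of input-output pairs, which is the saving that (S2) describes.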
Steinmetz Neural Networks for Complex-Valued Data
Pub Date : 2024-09-16 DOI: arxiv-2409.10075
Shyam Venkatasubramanian, Ali Pezeshki, Vahid Tarokh
In this work, we introduce a new approach to processing complex-valued data using DNNs consisting of parallel real-valued subnetworks with coupled outputs. Our proposed class of architectures, referred to as Steinmetz Neural Networks, leverages multi-view learning to construct more interpretable representations within the latent space. Subsequently, we present the Analytic Neural Network, which implements a consistency penalty that encourages analytic signal representations in the Steinmetz neural network's latent space. This penalty enforces a deterministic and orthogonal relationship between the real and imaginary components. Utilizing an information-theoretic construction, we demonstrate that the upper bound on the generalization error posited by the analytic neural network is lower than that of the general class of Steinmetz neural networks. Our numerical experiments demonstrate the improved performance and robustness to additive noise afforded by our proposed networks on benchmark datasets and synthetic examples.
Citations: 0
SHIRE: Enhancing Sample Efficiency using Human Intuition in REinforcement Learning
Pub Date : 2024-09-16 DOI: arxiv-2409.09990
Amogh Joshi, Adarsh Kumar Kosta, Kaushik Roy
The ability of neural networks to perform robotic perception and control tasks such as depth and optical flow estimation, simultaneous localization and mapping (SLAM), and automatic control has led to their widespread adoption in recent years. Deep Reinforcement Learning has been used extensively in these settings, as it does not have the unsustainable training costs associated with supervised learning. However, DeepRL suffers from poor sample efficiency, i.e., it requires a large number of environmental interactions to converge to an acceptable solution. Modern RL algorithms such as Deep Q Learning and Soft Actor-Critic attempt to remedy this shortcoming but cannot provide the explainability required in applications such as autonomous robotics. Humans intuitively understand the long-time-horizon sequential tasks common in robotics. Properly using such intuition can make RL policies more explainable while enhancing their sample efficiency. In this work, we propose SHIRE, a novel framework for encoding human intuition using Probabilistic Graphical Models (PGMs) and using it in the Deep RL training pipeline to enhance sample efficiency. Our framework achieves 25-78% sample efficiency gains across the environments we evaluate at negligible overhead cost. Additionally, by teaching RL agents the encoded elementary behavior, SHIRE enhances policy explainability. A real-world demonstration further highlights the efficacy of policies trained using our framework.
Citations: 0
COSCO: A Sharpness-Aware Training Framework for Few-shot Multivariate Time Series Classification
Pub Date : 2024-09-15 DOI: arxiv-2409.09645
Jesus Barreda, Ashley Gomez, Ruben Puga, Kaixiong Zhou, Li Zhang
Multivariate time series classification is an important task with widespread domains of applications. Recently, deep neural networks (DNN) have achieved state-of-the-art performance in time series classification. However, they often require large expert-labeled training datasets which can be infeasible in practice. In few-shot settings, i.e. only a limited number of samples per class are available in training data, DNNs show a significant drop in testing accuracy and poor generalization ability. In this paper, we propose to address these problems from an optimization and a loss function perspective. Specifically, we propose a new learning framework named COSCO consisting of a sharpness-aware minimization (SAM) optimization and a Prototypical loss function to improve the generalization ability of DNN for multivariate time series classification problems under few-shot setting. Our experiments demonstrate our proposed method outperforms the existing baseline methods. Our source code is available at: https://github.com/JRB9/COSCO.
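The sharpness-aware minimization (SAM) component named in the abstract above can be sketched as a two-gradient update: first move to a nearby point where the loss is higher, then apply the gradient found there to the original weights. This is a minimal sketch on a toy quadratic loss; the loss, the learning rate `lr`, and the neighborhood radius `rho` are hypothetical stand-ins, and the Prototypical loss and the COSCO training pipeline are not shown.

```python
import math


def loss(w):
    # Toy quadratic loss standing in for a network's training loss.
    return sum((wi - 1.0) ** 2 for wi in w)


def grad(w):
    # Analytic gradient of the toy loss above.
    return [2.0 * (wi - 1.0) for wi in w]


def sam_step(w, lr=0.1, rho=0.05):
    """One SAM update: ascend to a nearby higher-loss point, then descend."""
    g = grad(w)
    # Gradient norm, with a guard against division by zero.
    norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0
    # Perturb the weights by rho along the normalized gradient direction.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Gradient evaluated at the perturbed point.
    g_adv = grad(w_adv)
    # Apply that gradient to the original (unperturbed) weights.
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


w = [3.0, -2.0]
initial_loss = loss(w)
for _ in range(50):
    w = sam_step(w)
final_loss = loss(w)
print(initial_loss, final_loss)
```

Each step costs two gradient evaluations instead of one, which is the usual price of SAM; the motivation for paying it in a few-shot setting is that flatter minima tend to generalize better from limited data.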
Citations: 0
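The COSCO abstract names a Prototypical loss as one of its two components. As a rough illustration of what such a loss usually computes, here is a minimal pure-Python sketch: each class prototype is the mean of that class's support embeddings, and a query is scored by its negative squared Euclidean distance to every prototype. The function names and the toy 2-D embeddings are assumptions made for illustration; this is not the authors' implementation (see their repository for that), and the SAM optimizer is not shown.

```python
import math
from collections import defaultdict


def class_prototypes(support_embeddings, support_labels):
    """Average the support embeddings of each class into one prototype."""
    grouped = defaultdict(list)
    for vec, label in zip(support_embeddings, support_labels):
        grouped[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototypical_loss(prototypes, query_embeddings, query_labels):
    """Mean negative log-probability of the true class, where class
    scores are negative squared distances to each prototype."""
    total = 0.0
    for vec, label in zip(query_embeddings, query_labels):
        logits = {c: -squared_distance(vec, p) for c, p in prototypes.items()}
        max_logit = max(logits.values())  # subtract max for numerical stability
        log_norm = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values())
        )
        total += log_norm - logits[label]
    return total / len(query_embeddings)


# Toy embeddings (hypothetical); in practice a DNN encoder would produce them.
support = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [4.0, 6.0]]
labels = [0, 0, 1, 1]
protos = class_prototypes(support, labels)
loss = prototypical_loss(protos, [[0.0, 1.0]], [0])
```

A query that sits on its own class prototype yields a loss close to zero, while a query equidistant from two prototypes yields log 2.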
TX-Gen: Multi-Objective Optimization for Sparse Counterfactual Explanations for Time-Series Classification
Pub Date : 2024-09-14 DOI: arxiv-2409.09461
Qi Huang, Sofoklis Kitharidis, Thomas Bäck, Niki van Stein
In time-series classification, understanding model decisions is crucial for their application in high-stakes domains such as healthcare and finance. Counterfactual explanations, which provide insights by presenting alternative inputs that change model predictions, offer a promising solution. However, existing methods for generating counterfactual explanations for time-series data often struggle with balancing key objectives like proximity, sparsity, and validity. In this paper, we introduce TX-Gen, a novel algorithm for generating counterfactual explanations based on the Non-dominated Sorting Genetic Algorithm II (NSGA-II). TX-Gen leverages evolutionary multi-objective optimization to find a diverse set of counterfactuals that are both sparse and valid, while maintaining minimal dissimilarity to the original time series. By incorporating a flexible reference-guided mechanism, our method improves the plausibility and interpretability of the counterfactuals without relying on predefined assumptions. Extensive experiments on benchmark datasets demonstrate that TX-Gen outperforms existing methods in generating high-quality counterfactuals, making time-series models more transparent and interpretable.
Citations: 0
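The TX-Gen abstract builds on NSGA-II, whose central step is sorting candidates into Pareto fronts. The sketch below shows that step alone, assuming every objective is minimized. The two objectives used here (distance to the original series, number of changed points) and the candidate scores are hypothetical stand-ins for the proximity and sparsity goals the abstract mentions. It is not TX-Gen itself, and it omits the rest of NSGA-II (crowding distance, selection, crossover, mutation).

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly
    better on at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(
        x < y for x, y in zip(a, b)
    )


def non_dominated_sort(objectives):
    """Split candidate indices into successive Pareto fronts.

    Front 0 holds candidates that no other candidate dominates; each
    later front is non-dominated once the earlier fronts are removed.
    """
    remaining = set(range(len(objectives)))
    fronts = []
    while remaining:
        front = sorted(
            i
            for i in remaining
            if not any(
                dominates(objectives[j], objectives[i])
                for j in remaining
                if j != i
            )
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Hypothetical scores: (distance to original series, number of changed points)
candidates = [(0.2, 5), (0.5, 2), (0.6, 6), (0.3, 5), (0.9, 1)]
fronts = non_dominated_sort(candidates)
print(fronts)  # -> [[0, 1, 4], [3], [2]]
```

The first front contains the trade-offs that cannot be improved on one objective without getting worse on the other, which is the set a multi-objective search of this kind would normally carry forward.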