This paper introduces an analog spiking neuron that utilizes time-domain information, i.e., the time interval between two signal transitions and a pulse width, to construct a spiking neural network (SNN) for hardware-friendly physical reservoir computing (RC) on a complementary metal-oxide-semiconductor (CMOS) platform. A leaky integrate-and-fire neuron is realized by employing two voltage-controlled oscillators (VCOs) with opposite sensitivities to the internal control voltage, and each neuron connects only to its 4 neighbors on the 2-dimensional plane, so that a regular network topology can feasibly be constructed. Such a system enables us to compose an SNN with a counter-based readout circuit, which simplifies the hardware implementation. Moreover, the bottom-up integration offers another technical advantage: every neuron state in the network can be captured dynamically, which can significantly help in finding guidelines for enhancing performance on various computational tasks in temporal information processing. Despite the simple network structure, the diverse nonlinear physical dynamics needed for RC can be realized through collective behavior arising from dynamic interaction between neurons, as in coupled oscillators. With behavioral system-level simulations, we demonstrate physical RC through short-term memory and exclusive-OR tasks, as well as a spoken digit recognition task with an accuracy of 97.7%. Our system is highly feasible for practical applications and can also serve as a useful platform for studying the mechanisms of physical RC.
Paper: "Hardware-Friendly Implementation of Physical Reservoir Computing with CMOS-based Time-domain Analog Spiking Neurons" by Nanako Kimura, Ckristian Duran, Zolboo Byambadorj, Ryosho Nakane, Tetsuya Iizuka. arXiv:2409.11612, 2024-09-18, arXiv - CS - Neural and Evolutionary Computing.
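The two-VCO leaky integrate-and-fire idea described above can be sketched behaviorally in a few lines. This is our own illustration, not the paper's circuit: the oscillator center frequency, voltage-to-frequency gain, leak rate, and firing threshold below are assumed values, and the phase difference between the two oppositely-sensitive oscillators plays the role of the membrane state.

```python
import math

def lif_two_vco(inputs, dt=1e-3, f0=1000.0, k=200.0, leak=5.0, theta=1.0):
    """Behavioral LIF neuron from two VCOs with opposite sensitivity to the
    internal control voltage v; their accumulated phase difference acts as
    the membrane state, and a threshold crossing emits a spike."""
    v = 0.0           # control voltage driven by the input
    phase_diff = 0.0  # phase of VCO+ minus phase of VCO-
    spikes = []
    for t, x in enumerate(inputs):
        v += x                          # input charges the control voltage
        v -= leak * v * dt              # leak on the control voltage
        f_plus = f0 + k * v             # VCO with positive sensitivity
        f_minus = f0 - k * v            # VCO with negative sensitivity
        phase_diff += 2 * math.pi * (f_plus - f_minus) * dt
        if phase_diff >= theta:         # fire when the phase lead crosses theta
            spikes.append(t)
            phase_diff = 0.0            # reset after firing
            v = 0.0
    return spikes

# A constant drive makes the neuron fire periodically; no drive, no spikes.
print(lif_two_vco([0.01] * 50))
print(lif_two_vco([0.0] * 50))
```

With a constant input the phase difference grows and the neuron spikes at a regular interval; with zero input it stays silent, which is the qualitative LIF behavior the abstract describes.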
This paper demonstrates that a single-layer neural network using Parametric Rectified Linear Unit (PReLU) activation can solve the XOR problem, a simple fact that has been overlooked until now. We compare this solution to the multi-layer perceptron (MLP) and to the Growing Cosine Unit (GCU) activation function, and explain why PReLU enables this capability. Our results show that the single-layer PReLU network can achieve a 100% success rate over a wider range of learning rates while using only three learnable parameters.
Paper: "PReLU: Yet Another Single-Layer Solution to the XOR Problem" by Rafael C. Pinto, Anderson R. Tavares. arXiv:2409.10821, 2024-09-17.
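The claim is easy to verify concretely. One analytic solution (our illustration; the paper's exact parametrization may differ) uses weights (1, -1), no bias, and negative-side slope a = -1, which makes the single unit compute |x1 - x2|, i.e., XOR on binary inputs:

```python
def prelu(z, a):
    """Parametric ReLU: identity for z > 0, slope a otherwise."""
    return z if z > 0 else a * z

def xor_net(x1, x2, w1=1.0, w2=-1.0, a=-1.0):
    """Single-layer, single-unit network with three parameters (w1, w2, a).
    With a = -1 the unit computes |w1*x1 + w2*x2|, which is XOR on {0,1}^2."""
    return prelu(w1 * x1 + w2 * x2, a)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", xor_net(x1, x2))
```

The four input pairs map to 0, 1, 1, 0, so a single PReLU unit separates XOR without any hidden layer.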
This paper introduces Inferno, a software library built on top of PyTorch that is designed to meet the distinctive challenges of using spiking neural networks (SNNs) for machine learning tasks. We describe the architecture of Inferno and the key differentiators that make it uniquely well-suited to these tasks. We show how Inferno supports trainable heterogeneous delays on both CPUs and GPUs, and how Inferno enables a "write once, apply everywhere" development methodology for novel models and techniques. We compare Inferno's performance to BindsNET, a library aimed at machine learning with SNNs, and to Brian2/Brian2CUDA, which are popular in neuroscience. Among several examples, we show how Inferno's design decisions make it easy to implement the new methods of Nadafian and Ganjtabesh for delay learning with spike-timing-dependent plasticity.
Paper: "Inferno: An Extensible Framework for Spiking Neural Networks" by Marissa Dominijanni. arXiv:2409.11567, 2024-09-17.
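To make the delay-learning setting concrete, here is a generic pair-based STDP rule with an axonal delay, written without any library (this is textbook STDP, not Inferno's API; the amplitudes and time constant are assumed values). The delay shifts when the presynaptic spike is seen at the synapse, which changes the sign and size of the weight update:

```python
import math

def stdp_dw(t_pre, t_post, delay, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based STDP with an axonal delay: the presynaptic spike arrives
    at the synapse at t_pre + delay (times in ms). Positive dt (delayed pre
    before post) gives potentiation, negative dt gives depression."""
    dt = t_post - (t_pre + delay)
    if dt >= 0:
        return a_plus * math.exp(-dt / tau)    # potentiation (LTP)
    return -a_minus * math.exp(dt / tau)       # depression (LTD)

# Pre spike at 10 ms with a 2 ms delay, post spike at 15 ms: potentiation.
print(stdp_dw(10.0, 15.0, 2.0))
# Post fires before the delayed pre spike arrives: depression.
print(stdp_dw(10.0, 9.0, 2.0))
```

Delay learning rules such as the one the abstract mentions adjust `delay` itself from the same timing information; the sketch shows only the classical weight update that those rules build on.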
Artificial Neural Networks (ANNs) have significantly advanced various fields by effectively recognizing patterns and solving complex problems. Despite these advancements, their interpretability remains a critical challenge, especially in applications where transparency and accountability are essential. To address this, explainable AI (XAI) has made progress in demystifying ANNs, yet interpretability alone is often insufficient. In certain applications, model predictions must align with expert-imposed requirements, sometimes exemplified by partial monotonicity constraints. While monotonic approaches are found in the literature for traditional Multi-layer Perceptrons (MLPs), they still face difficulties in achieving both interpretability and certified partial monotonicity. Recently, the Kolmogorov-Arnold Network (KAN) architecture, based on learnable activation functions parametrized as splines, has been proposed as a more interpretable alternative to MLPs. Building on this, we introduce a novel ANN architecture called MonoKAN, which is based on the KAN architecture and achieves certified partial monotonicity while enhancing interpretability. To achieve this, we employ cubic Hermite splines, which guarantee monotonicity through a set of straightforward conditions. Additionally, by using positive weights in the linear combinations of these splines, we ensure that the network preserves the monotonic relationships between input and output. Our experiments demonstrate that MonoKAN not only enhances interpretability but also improves predictive performance across the majority of benchmarks, outperforming state-of-the-art monotonic MLP approaches.
Paper: "MonoKAN: Certified Monotonic Kolmogorov-Arnold Network" by Alejandro Polo-Molina, David Alfaya, Jose Portela. arXiv:2409.11078, 2024-09-17.
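The "straightforward conditions" for monotone cubic Hermite segments are classical (Fritsch-Carlson): on a non-decreasing segment it suffices that both endpoint tangents are non-negative and no larger than three times the secant slope. A small checker and evaluator (our own illustration, not the paper's code):

```python
def hermite_segment_monotone(y0, y1, m0, m1, h=1.0):
    """Sufficient (Fritsch-Carlson) condition for a cubic Hermite segment
    with values y0, y1 and tangents m0, m1 over width h to be non-decreasing."""
    delta = (y1 - y0) / h                  # secant slope
    if delta < 0:
        return False
    return 0 <= m0 <= 3 * delta and 0 <= m1 <= 3 * delta

def hermite_eval(y0, y1, m0, m1, t):
    """Evaluate the cubic Hermite segment at t in [0, 1] (unit width)."""
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * y0 + h10 * m0 + h01 * y1 + h11 * m1

# Tangents within the 3*delta bound give a certified monotone segment...
print(hermite_segment_monotone(0.0, 1.0, 0.5, 2.0))   # True
# ...while an oversized tangent can overshoot and break monotonicity.
print(hermite_segment_monotone(0.0, 1.0, 0.5, 4.0))   # False
```

Restricting every spline segment to this box, and combining segments with positive mixing weights as the abstract describes, preserves monotonicity through the whole network.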
This paper introduces Bio-Inspired Mamba (BIM), a novel online learning framework for selective state space models that integrates biological learning principles with the Mamba architecture. BIM combines Real-Time Recurrent Learning (RTRL) with Spike-Timing-Dependent Plasticity (STDP)-like local learning rules, addressing the challenges of temporal locality and biological plausibility in training spiking neural networks. Our approach leverages the inherent connection between backpropagation through time and STDP, offering a computationally efficient alternative that maintains the ability to capture long-range dependencies. We evaluate BIM on language modeling, speech recognition, and biomedical signal analysis tasks, demonstrating competitive performance against traditional methods while adhering to biological learning principles. Results show improved energy efficiency and potential for neuromorphic hardware implementation. BIM not only advances the field of biologically plausible machine learning but also provides insights into the mechanisms of temporal information processing in biological neural networks.
Paper: "Bio-Inspired Mamba: Temporal Locality and Bioplausible Learning in Selective State Space Models" by Jiahao Qin. arXiv:2409.11263, 2024-09-17.
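The temporal locality that RTRL provides can be seen in the smallest possible case. For a scalar leaky unit s_t = alpha*s_{t-1} + w*x_t, the sensitivity p_t = ds_t/dw obeys the same leaky recursion, so the gradient of a per-step loss is available online, with no backpropagation through time (a minimal sketch of plain RTRL, our own; BIM combines this with STDP-like local rules that are not reproduced here):

```python
def rtrl_scalar(xs, ys, alpha=0.9, w=0.0, lr=0.01):
    """Online RTRL for a scalar leaky unit s_t = alpha*s_{t-1} + w*x_t,
    trained on squared error against targets ys. The sensitivity
    p_t = ds_t/dw is carried forward, so each update is temporally local."""
    s, p = 0.0, 0.0
    for x, y in zip(xs, ys):
        s = alpha * s + w * x
        p = alpha * p + x          # forward sensitivity recursion (RTRL)
        w -= lr * (s - y) * p      # local, immediate gradient step
    return w

def leaky_teacher(xs, alpha=0.9, w=0.5):
    """Targets generated by the same unit with a fixed teacher weight."""
    s, out = 0.0, []
    for x in xs:
        s = alpha * s + w * x
        out.append(s)
    return out

xs = [1.0, 0.5, -1.0, 0.8] * 100
w_hat = rtrl_scalar(xs, leaky_teacher(xs))
print(w_hat)   # converges toward the teacher weight 0.5
```

Because the model is linear in w, the online update contracts toward the teacher weight; the same forward-sensitivity idea scales (at higher cost) to full recurrent state spaces.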
The Forward-Forward (FF) algorithm is a recent, purely forward-mode learning method that updates weights locally and layer-wise and supports both supervised and unsupervised learning. These features make it ideal for applications such as brain-inspired learning, low-power hardware neural networks, and distributed learning in large models. However, while FF has shown promise on handwritten digit recognition tasks, its performance on natural images and time series remains a challenge. A key limitation is the need to generate high-quality negative examples for contrastive learning, especially in unsupervised tasks, where versatile solutions are currently lacking. To address this, we introduce the Self-Contrastive Forward-Forward (SCFF) method, inspired by self-supervised contrastive learning. SCFF generates positive and negative examples applicable across different datasets, surpassing existing local forward algorithms in unsupervised classification accuracy on MNIST (MLP: 98.7%), CIFAR-10 (CNN: 80.75%), and STL-10 (CNN: 77.3%). Additionally, SCFF is the first to enable FF training of recurrent neural networks, opening the door to more complex tasks and continuous-time video and text processing.
Paper: "Self-Contrastive Forward-Forward Algorithm" by Xing Chen, Dongshu Liu, Jeremie Laydevant, Julie Grollier. arXiv:2409.11593, 2024-09-17.
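In the FF framework each layer is trained in isolation: its "goodness" (typically the mean squared activation) should be high for positive examples and low for negative ones. A minimal single-layer sketch in NumPy (our own illustration; the threshold, learning rate, and toy data are assumptions, and SCFF's specific positive/negative construction is not reproduced here):

```python
import numpy as np

def goodness(h):
    """Layer 'goodness': mean squared activation per example."""
    return (h ** 2).mean(axis=1)

def ff_layer_step(W, x_pos, x_neg, theta=2.0, lr=0.05):
    """One local Forward-Forward update on a ReLU layer: raise the goodness
    of positive examples above theta and push negatives below it, via a
    logistic loss on (goodness - theta). No cross-layer backpropagation."""
    for x, sign in ((x_pos, 1.0), (x_neg, -1.0)):
        h = np.maximum(x @ W, 0.0)
        g = goodness(h)
        dg = -sign / (1.0 + np.exp(sign * (g - theta)))   # dLoss/dGoodness
        dh = dg[:, None] * 2.0 * h / h.shape[1]           # dGoodness/dh = 2h/n
        W -= lr * x.T @ dh / x.shape[0]
    return W

rng = np.random.default_rng(0)
x_pos = rng.normal(loc=(2.0, 0.0), scale=0.3, size=(64, 2))  # positives
x_neg = rng.normal(loc=(0.0, 2.0), scale=0.3, size=(64, 2))  # negatives
W = rng.normal(scale=0.1, size=(2, 8))
for _ in range(500):
    W = ff_layer_step(W, x_pos, x_neg)

g_pos = goodness(np.maximum(x_pos @ W, 0.0)).mean()
g_neg = goodness(np.maximum(x_neg @ W, 0.0)).mean()
print(g_pos, g_neg)   # goodness of positives ends above that of negatives
```

After training, the layer assigns visibly higher goodness to positive data than to negative data; SCFF's contribution is a dataset-agnostic way of constructing those positive/negative pairs.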
Real-world tabular learning production scenarios typically involve evolving data streams, where data arrives continuously and its distribution may change over time. In such a setting, most studies in the supervised learning literature favor instance incremental algorithms because of their ability to adapt to changes in the data distribution. Another significant reason for choosing these algorithms is to avoid storing observations in memory, as is commonly done in batch incremental settings. However, the design of instance incremental algorithms often assumes immediate availability of labels, which is an optimistic assumption. In many real-world scenarios, such as fraud detection or credit scoring, labels may be delayed. Consequently, batch incremental algorithms are widely used in many real-world tasks. This raises an important question: "In delayed settings, is instance incremental learning the best option regarding predictive performance and computational efficiency?" Unfortunately, this question has not been studied in depth, probably due to the scarcity of real datasets containing delayed information. In this study, we conduct a comprehensive empirical evaluation and analysis of this question using a real-world fraud detection problem and commonly used generated datasets. Our findings indicate that instance incremental learning is not the superior option when state-of-the-art instance incremental models such as Adaptive Random Forest (ARF) are compared against batch learning models such as XGBoost. Additionally, when considering the interpretability of the learning systems, batch incremental solutions tend to be favored. Code: https://github.com/anselmeamekoe/DelayedLabelStream
Paper: "Evaluating the Efficacy of Instance Incremental vs. Batch Learning in Delayed Label Environments: An Empirical Study on Tabular Data Streaming for Fraud Detection" by Kodjo Mawuena Amekoe, Mustapha Lebbah, Gregoire Jaffre, Hanene Azzag, Zaineb Chelly Dagdia. arXiv:2409.10111, 2024-09-16.
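The delayed-label setting evaluated above can be expressed as a simple prequential (test-then-train) loop: predict on arrival, but reveal each label only `delay` steps later, at which point the model may learn from it. A schematic with a trivial running-mean "model" standing in for ARF or XGBoost (our own sketch; the data and delay are made up for illustration):

```python
from collections import deque

class RunningMeanClassifier:
    """Placeholder model: predicts 1 if x exceeds the running mean of the
    positive examples seen so far. Stands in for ARF / XGBoost in the loop."""
    def __init__(self):
        self.sum, self.n = 0.0, 0

    def predict(self, x):
        return 1 if self.n and x > self.sum / self.n else 0

    def learn(self, x, y):
        if y == 1:
            self.sum += x
            self.n += 1

def prequential_delayed(stream, delay):
    """Prequential evaluation with label delay: the label of the instance
    seen at time t only becomes available for training at time t + delay."""
    model = RunningMeanClassifier()
    pending = deque()            # (x, y_true) awaiting label arrival
    correct = total = 0
    for x, y in stream:
        total += 1
        correct += model.predict(x) == y   # test before the label is known
        pending.append((x, y))
        if len(pending) > delay:           # the delayed label arrives now
            xd, yd = pending.popleft()
            model.learn(xd, yd)
    return correct / total

stream = [(float(i % 10), 1 if i % 10 > 5 else 0) for i in range(1000)]
print(prequential_delayed(stream, delay=50))
```

An instance incremental learner updates inside this loop as each label arrives; a batch incremental learner would instead buffer the released labels and retrain periodically, which is the trade-off the study measures.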
We present the first parameterized analysis of a standard (1+1) Evolutionary Algorithm on a distribution of vertex cover problems. We show that if the planted cover is at most logarithmic, restarting the (1+1) EA every $O(n \log n)$ steps finds a cover at least as small as the planted cover in polynomial time for sufficiently dense random graphs ($p > 0.71$). For superlogarithmic planted covers, we prove that the (1+1) EA finds a solution in fixed-parameter tractable time in expectation. We complement these theoretical investigations with a number of computational experiments that highlight the interplay between planted cover size, graph density, and runtime.
Paper: "Fixed-Parameter Tractability of the (1+1) Evolutionary Algorithm on Random Planted Vertex Covers" by Jack Kearney, Frank Neumann, Andrew M. Sutton. arXiv:2409.10144, 2024-09-16.
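The restart scheme analyzed above is easy to state in code: standard bit-flip mutation with probability 1/n per bit, a fitness that first penalizes uncovered edges and then cover size, and a restart every ~n log n steps. This is a sketch under assumed constants (the paper's analysis fixes the restart length and graph model precisely), shown on a small hand-picked graph rather than a random planted instance:

```python
import math
import random

def fitness(bits, edges):
    """Lexicographic: minimize uncovered edges first, then cover size."""
    uncovered = sum(1 for u, v in edges if not (bits[u] or bits[v]))
    return (uncovered, sum(bits))

def one_plus_one_ea(n, edges, budget=4, seed=0):
    """(1+1) EA with restarts every ~2*n*log(n) iterations: flip each bit
    independently with probability 1/n, keep the child if it is not worse,
    and return the best individual over all restarts."""
    rng = random.Random(seed)
    restart_len = max(1, int(2 * n * math.log(n)))
    best = None
    for _ in range(budget):                       # number of restarts
        bits = [rng.random() < 0.5 for _ in range(n)]
        for _ in range(restart_len):
            child = [b ^ (rng.random() < 1 / n) for b in bits]
            if fitness(child, edges) <= fitness(bits, edges):
                bits = child
        if best is None or fitness(bits, edges) < fitness(best, edges):
            best = bits
    return best

# A 6-cycle: the optimal vertex cover has size 3.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
cover = one_plus_one_ea(6, edges, budget=10)
print(fitness(cover, edges))   # (0, small cover size)
```

Accepting equal-fitness children lets the search drift across plateaus, which is the standard convention in runtime analyses of the (1+1) EA.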
Transformers stand as the cornerstone of modern deep learning. Traditionally, these models rely on multi-layer perceptron (MLP) layers to mix the information between channels. In this paper, we introduce the Kolmogorov-Arnold Transformer (KAT), a novel architecture that replaces MLP layers with Kolmogorov-Arnold Network (KAN) layers to enhance the expressiveness and performance of the model. Integrating KANs into transformers, however, is no easy feat, especially when scaled up. Specifically, we identify three key challenges: (C1) Base function. The standard B-spline function used in KANs is not optimized for parallel computing on modern hardware, resulting in slower inference speeds. (C2) Parameter and computation inefficiency. KAN requires a unique function for each input-output pair, making the computational cost extremely large. (C3) Weight initialization. The initialization of weights in KANs is particularly challenging due to their learnable activation functions, which are critical for achieving convergence in deep neural networks. To overcome the aforementioned challenges, we propose three key solutions: (S1) Rational basis. We replace B-spline functions with rational functions to improve compatibility with modern GPUs. By implementing this in CUDA, we achieve faster computations. (S2) Group KAN. We share the activation weights across a group of neurons to reduce the computational load without sacrificing performance. (S3) Variance-preserving initialization. We carefully initialize the activation weights to ensure that the activation variance is maintained across layers. With these designs, KAT scales effectively and readily outperforms traditional MLP-based transformers.
Paper: "Kolmogorov-Arnold Transformer" by Xingyi Yang, Xinchao Wang. arXiv:2409.10594, 2024-09-16.
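The rational basis (S1) and group sharing (S2) can be sketched in a few lines of NumPy: a rational activation y = P(x) / (1 + |Q(x)|), where the absolute value keeps the denominator positive (pole-free), and one coefficient set shared per group of channels. This is our own illustration; KAT's CUDA kernel, degrees, and exact parametrization differ.

```python
import numpy as np

def rational(x, p, q):
    """Safe rational activation: P(x) / (1 + |q1*x + q2*x^2 + ...|).
    The absolute value keeps the denominator positive, avoiding poles."""
    num = sum(c * x**i for i, c in enumerate(p))              # numerator P(x)
    den = 1.0 + abs(sum(c * x**(i + 1) for i, c in enumerate(q)))
    return num / den

def group_rational(x, P, Q, groups):
    """Group-KAN-style sharing: channels are split into `groups` groups and
    each group shares one set of rational coefficients."""
    n_ch = x.shape[-1]
    out = np.empty_like(x)
    size = n_ch // groups
    for g in range(groups):
        sl = slice(g * size, (g + 1) * size)
        out[..., sl] = rational(x[..., sl], P[g], Q[g])
    return out

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 8))     # batch of 4 examples, 8 channels
P = rng.normal(size=(2, 4))     # 2 groups, cubic numerator coefficients
Q = rng.normal(size=(2, 3))     # 2 groups, cubic denominator coefficients
y = group_rational(x, P, Q, groups=2)
print(y.shape)
```

Sharing one coefficient set per group rather than one per input-output pair is what shrinks the parameter and compute cost that challenge (C2) describes.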
Shyam Venkatasubramanian, Ali Pezeshki, Vahid Tarokh
In this work, we introduce a new approach to processing complex-valued data using DNNs consisting of parallel real-valued subnetworks with coupled outputs. Our proposed class of architectures, referred to as Steinmetz Neural Networks, leverages multi-view learning to construct more interpretable representations within the latent space. Subsequently, we present the Analytic Neural Network, which implements a consistency penalty that encourages analytic signal representations in the Steinmetz neural network's latent space. This penalty enforces a deterministic and orthogonal relationship between the real and imaginary components. Using an information-theoretic construction, we demonstrate that the upper bound on the generalization error for the analytic neural network is lower than that of the general class of Steinmetz neural networks. Our numerical experiments demonstrate the improved performance and robustness to additive noise afforded by our proposed networks on benchmark datasets and synthetic examples.
Paper: "Steinmetz Neural Networks for Complex-Valued Data" by Shyam Venkatasubramanian, Ali Pezeshki, Vahid Tarokh. arXiv:2409.10075, 2024-09-16.
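An analytic-signal consistency penalty can be illustrated directly: for an analytic signal, the imaginary part is the Hilbert transform of the real part, so a penalty of the form ||imag - H(real)||^2 measures how far the two real-valued subnetwork outputs are from an analytic pair. This is our own sketch of the idea (the analytic signal is built via FFT, the same construction scipy uses; the paper's exact penalty may differ):

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via FFT: zero out negative frequencies, double the
    positive ones, keep DC and Nyquist; ifft gives x + i*H(x)."""
    n = len(x)
    spec = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    return np.fft.ifft(spec * h)

def analytic_penalty(real_out, imag_out):
    """Consistency penalty: for an analytic pair, the imaginary component
    equals the Hilbert transform of the real component."""
    target_imag = np.imag(analytic_signal(real_out))
    return float(np.mean((imag_out - target_imag) ** 2))

t = np.linspace(0.0, 1.0, 256, endpoint=False)
real = np.cos(2 * np.pi * 8 * t)
imag_good = np.sin(2 * np.pi * 8 * t)   # (cos, sin) is an analytic pair
imag_bad = real                          # (cos, cos) is not

print(analytic_penalty(real, imag_good))  # ~0
print(analytic_penalty(real, imag_bad))   # ~1
```

Because the Hilbert transform of cos is sin, the (cos, sin) pair incurs essentially zero penalty while (cos, cos) does not, which is exactly the deterministic, orthogonal real/imaginary relationship the penalty enforces.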