Maelstrom Networks
Matthew Evanusa, Cornelia Fermüller, Yiannis Aloimonos
arXiv:2408.16632 (arXiv - CS - Neural and Evolutionary Computing, 2024-08-29)
Abstract
Research on artificial neural networks has struggled to devise a way to incorporate working memory. While long-term memory can be identified with the learned weights, working memory most likely consists of dynamical activity, which is absent from feed-forward models. Current state-of-the-art models such as transformers tend to "solve" this by ignoring working memory entirely and processing the sequence as a single piece of data; however, this prevents the network from processing the sequence in an online fashion and leads to an immense explosion in memory requirements. Here, inspired by a combination of control theory, reservoir computing, deep learning, and recurrent neural networks, we offer an alternative paradigm that combines the strength of recurrent networks with the pattern-matching capability of feed-forward neural networks, which we call the Maelstrom Networks paradigm. This paradigm leaves the recurrent component, the Maelstrom, unlearned and offloads the learning to a powerful feed-forward network. This allows the network to leverage the strength of feed-forward training without unrolling the network, and allows the memory to be implemented in new neuromorphic hardware. It endows a neural network with a sequential memory that exploits the inductive bias that data is organized causally in the temporal domain, and imbues the network with a state that represents the agent's "self" moving through the environment. This could also lead the way to continual learning, with the network modularized and "protected" from the overwrites that come with new data. In addition to helping solve the performance problems that plague current non-temporal deep networks, this could finally lead towards endowing artificial networks with a sense of "self".
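
The abstract describes the paradigm only at a high level. The sketch below is one plausible reading, assuming an echo-state-style setup: a fixed (unlearned) random recurrent state plays the role of the Maelstrom, a trainable feed-forward readout does the pattern matching, and the state is detached at every step so training never unrolls the recurrence. The class name, leak rate, and dimensions are illustrative choices, not taken from the paper.

```python
# Illustrative sketch (not the authors' reference implementation):
# fixed recurrent "maelstrom" state + trainable feed-forward readout,
# trained online without unrolling the recurrence in time.
import torch
import torch.nn as nn

class MaelstromSketch(nn.Module):
    def __init__(self, in_dim, res_dim, hidden_dim, out_dim, leak=0.3):
        super().__init__()
        # Unlearned recurrent component: random, frozen weights (buffers, no gradients).
        self.register_buffer("W_in", torch.randn(res_dim, in_dim) * 0.1)
        self.register_buffer("W_rec", torch.randn(res_dim, res_dim) / res_dim ** 0.5)
        self.leak = leak
        # Learned feed-forward readout that does the pattern matching.
        self.readout = nn.Sequential(
            nn.Linear(res_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def step(self, x_t, h):
        # Online leaky-integrator state update; detach() keeps gradients
        # from flowing back through time (no unrolling).
        pre = x_t @ self.W_in.T + h @ self.W_rec.T
        h_new = (1 - self.leak) * h + self.leak * torch.tanh(pre)
        return h_new.detach()

    def forward(self, x_t, h):
        h = self.step(x_t, h)          # update fixed recurrent memory
        return self.readout(h), h      # learned feed-forward mapping of the state


# Usage sketch: process a stream online, learning only the readout.
# `stream` is a hypothetical iterator of (input, target) pairs.
model = MaelstromSketch(in_dim=8, res_dim=200, hidden_dim=64, out_dim=3)
opt = torch.optim.Adam(model.readout.parameters(), lr=1e-3)
h = torch.zeros(1, 200)
for x_t, y_t in stream:
    y_hat, h = model(x_t, h)
    loss = nn.functional.cross_entropy(y_hat, y_t)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the state is detached before the readout sees it, each gradient step touches only the feed-forward weights, which is one way to realize "offloading the learning to a feed-forward network" while keeping the recurrent dynamics fixed and updateable online.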