Maelstrom Networks
Matthew Evanusa, Cornelia Fermüller, Yiannis Aloimonos
arXiv:2408.16632 (arXiv - CS - Neural and Evolutionary Computing, 2024-08-29)
Abstract
Research on artificial neural networks has struggled to devise a way to incorporate working memory. While long-term memory can be identified with the learned weights, working memory most likely consists of dynamical activity, which is absent from feed-forward models. Current state-of-the-art models such as transformers tend to "solve" this by ignoring working memory entirely and processing the sequence as a single piece of data; however, this prevents the network from processing the sequence in an online fashion and leads to an immense explosion in memory requirements. Here, inspired by a combination of control theory, reservoir computing, deep learning, and recurrent neural networks, we offer an alternative paradigm that combines the strength of recurrent networks with the pattern-matching capability of feed-forward neural networks, which we call the Maelstrom Networks paradigm. This paradigm leaves the recurrent component, the Maelstrom, unlearned and offloads the learning to a powerful feed-forward network. This allows the network to leverage the strength of feed-forward training without unrolling the network, and allows the memory to be implemented in new neuromorphic hardware. It endows a neural network with a sequential memory that exploits the inductive bias that data is organized causally in the temporal domain, and imbues the network with a state that represents the agent's "self" moving through the environment. This could also lead the way to continual learning, with the network modularized and "protected" from the overwrites that come with new data. In addition to helping solve the performance problems that plague current non-temporal deep networks, this could finally lead towards endowing artificial networks with a sense of "self".
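
The abstract describes the paradigm only at a high level. The sketch below is one plausible reading, assuming an echo-state-style setup: a fixed (unlearned) random recurrent state plays the role of the Maelstrom, a trainable feed-forward readout does the pattern matching, and the state is detached at every step so training never unrolls the recurrence. The class name, leak rate, and dimensions are illustrative choices, not taken from the paper.

```python
# Illustrative sketch (not the authors' reference implementation):
# fixed recurrent "maelstrom" state + trainable feed-forward readout,
# trained online without unrolling the recurrence in time.
import torch
import torch.nn as nn

class MaelstromSketch(nn.Module):
    def __init__(self, in_dim, res_dim, hidden_dim, out_dim, leak=0.3):
        super().__init__()
        # Unlearned recurrent component: random, frozen weights (buffers, no gradients).
        self.register_buffer("W_in", torch.randn(res_dim, in_dim) * 0.1)
        self.register_buffer("W_rec", torch.randn(res_dim, res_dim) / res_dim ** 0.5)
        self.leak = leak
        # Learned feed-forward readout that does the pattern matching.
        self.readout = nn.Sequential(
            nn.Linear(res_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def step(self, x_t, h):
        # Online leaky-integrator state update; detach() keeps gradients
        # from flowing back through time (no unrolling).
        pre = x_t @ self.W_in.T + h @ self.W_rec.T
        h_new = (1 - self.leak) * h + self.leak * torch.tanh(pre)
        return h_new.detach()

    def forward(self, x_t, h):
        h = self.step(x_t, h)          # update fixed recurrent memory
        return self.readout(h), h      # learned feed-forward mapping of the state


# Usage sketch: process a stream online, learning only the readout.
# `stream` is a hypothetical iterator of (input, target) pairs.
model = MaelstromSketch(in_dim=8, res_dim=200, hidden_dim=64, out_dim=3)
opt = torch.optim.Adam(model.readout.parameters(), lr=1e-3)
h = torch.zeros(1, 200)
for x_t, y_t in stream:
    y_hat, h = model(x_t, h)
    loss = nn.functional.cross_entropy(y_hat, y_t)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the state is detached before the readout sees it, each gradient step touches only the feed-forward weights, which is one way to realize "offloading the learning to a feed-forward network" while keeping the recurrent dynamics fixed and updateable online.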