Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations

Róbert Csordás, Christopher Potts, Christopher D. Manning, Atticus Geiger
{"title":"递归神经网络利用非线性表征学习存储和生成序列","authors":"Róbert Csordás, Christopher Potts, Christopher D. Manning, Atticus Geiger","doi":"arxiv-2408.10920","DOIUrl":null,"url":null,"abstract":"The Linear Representation Hypothesis (LRH) states that neural networks learn\nto encode concepts as directions in activation space, and a strong version of\nthe LRH states that models learn only such encodings. In this paper, we present\na counterexample to this strong LRH: when trained to repeat an input token\nsequence, gated recurrent neural networks (RNNs) learn to represent the token\nat each position with a particular order of magnitude, rather than a direction.\nThese representations have layered features that are impossible to locate in\ndistinct linear subspaces. To show this, we train interventions to predict and\nmanipulate tokens by learning the scaling factor corresponding to each sequence\nposition. These interventions indicate that the smallest RNNs find only this\nmagnitude-based solution, while larger RNNs have linear representations. These\nfindings strongly indicate that interpretability research should not be\nconfined by the LRH.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations\",\"authors\":\"Róbert Csordás, Christopher Potts, Christopher D. Manning, Atticus Geiger\",\"doi\":\"arxiv-2408.10920\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The Linear Representation Hypothesis (LRH) states that neural networks learn\\nto encode concepts as directions in activation space, and a strong version of\\nthe LRH states that models learn only such encodings. In this paper, we present\\na counterexample to this strong LRH: when trained to repeat an input token\\nsequence, gated recurrent neural networks (RNNs) learn to represent the token\\nat each position with a particular order of magnitude, rather than a direction.\\nThese representations have layered features that are impossible to locate in\\ndistinct linear subspaces. To show this, we train interventions to predict and\\nmanipulate tokens by learning the scaling factor corresponding to each sequence\\nposition. These interventions indicate that the smallest RNNs find only this\\nmagnitude-based solution, while larger RNNs have linear representations. 
These\\nfindings strongly indicate that interpretability research should not be\\nconfined by the LRH.\",\"PeriodicalId\":501347,\"journal\":{\"name\":\"arXiv - CS - Neural and Evolutionary Computing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Neural and Evolutionary Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.10920\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Neural and Evolutionary Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.10920","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

The Linear Representation Hypothesis (LRH) states that neural networks learn to encode concepts as directions in activation space, and a strong version of the LRH states that models learn only such encodings. In this paper, we present a counterexample to this strong LRH: when trained to repeat an input token sequence, gated recurrent neural networks (RNNs) learn to represent the token at each position with a particular order of magnitude, rather than a direction. These representations have layered features that are impossible to locate in distinct linear subspaces. To show this, we train interventions to predict and manipulate tokens by learning the scaling factor corresponding to each sequence position. These interventions indicate that the smallest RNNs find only this magnitude-based solution, while larger RNNs have linear representations. These findings strongly indicate that interpretability research should not be confined by the LRH.
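As a concrete illustration of what a magnitude-based (rather than direction-based) encoding can look like, the following is a minimal, hypothetical sketch, not the paper's construction: a sequence is packed into a single state vector by writing the token at position k at magnitude scale**k, and read back by peeling off one magnitude layer at a time. The random embeddings, the scale of 0.1, and the nearest-neighbor readout are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, dim, seq_len = 16, 64, 5
scale = 0.1  # assumed magnitude ratio between consecutive positions

# Illustrative random unit-norm token embeddings (not taken from the paper).
emb = rng.standard_normal((vocab_size, dim))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

def encode(tokens):
    # Token at position k is written at magnitude scale**k, so the state is a
    # layered, "onion-like" sum rather than a set of distinct linear subspaces.
    return sum(emb[t] * scale ** k for k, t in enumerate(tokens))

def decode(state, length):
    # Peel layers from largest to smallest magnitude: rescale, match by dot
    # product against the embedding table, then subtract the recovered layer.
    out = []
    for k in range(length):
        layer = state / scale ** k
        tok = int(np.argmax(emb @ layer))
        out.append(tok)
        state = state - emb[tok] * scale ** k
    return out

tokens = [3, 7, 7, 0, 12]
print(tokens, decode(encode(tokens), len(tokens)))  # the two lists should match
```

In this toy setup, reading or overwriting the token at position k amounts to choosing the right scaling factor scale**k, which mirrors the kind of position-specific scaling the abstract says the trained interventions learn.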