Linear Recency Bias During Training Improves Transformers' Fit to Reading Times

arXiv - CS - Computation and Language. Pub Date: 2024-09-17. DOI: arxiv-2409.11250

Christian Clark, Byung-Doh Oh, William Schuler


Abstract

Recent psycholinguistic research has compared human reading times to surprisal estimates from language models to study the factors shaping human sentence processing difficulty. Previous studies have shown a strong fit between surprisal values from Transformers and reading times. However, standard Transformers work with a lossless representation of the entire previous linguistic context, unlike models of human language processing that include memory decay. To bridge this gap, this paper evaluates a modification of the Transformer model that uses ALiBi (Press et al., 2022), a recency bias added to attention scores. Surprisal estimates with ALiBi show an improved fit to human reading times compared to a standard Transformer baseline. A subsequent analysis of attention heads suggests that ALiBi's mixture of slopes -- which determine the rate of memory decay in each attention head -- may play a role in the improvement by helping models with ALiBi to track different kinds of linguistic dependencies.
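The recency bias described in the abstract can be illustrated with a minimal sketch. It assumes the formulation in Press et al. (2022): each attention head subtracts a penalty proportional to the query-key distance from its raw attention scores, with per-head slopes forming a geometric sequence when the head count is a power of two. The function names (alibi_slopes, alibi_bias, biased_attention, surprisal) are hypothetical and chosen for illustration; they are not taken from the authors' code, and the models evaluated in the paper may be configured differently.

```python
import math


def alibi_slopes(num_heads):
    """Per-head slopes 2^(-8/n), 2^(-16/n), ..., assuming num_heads is a power of two."""
    start = 2.0 ** (-8.0 / num_heads)
    return [start ** (h + 1) for h in range(num_heads)]


def alibi_bias(slope, seq_len):
    """Causal bias matrix: -slope * (i - j) for keys j <= i, -inf for future keys."""
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


def softmax(scores):
    """Numerically stable softmax over one row of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(raw_scores, slope):
    """Add the linear recency bias to raw scores, then normalize each row."""
    n = len(raw_scores)
    bias = alibi_bias(slope, n)
    return [
        softmax([raw_scores[i][j] + bias[i][j] for j in range(n)])
        for i in range(n)
    ]


def surprisal(prob):
    """Surprisal in bits of a word with conditional probability prob."""
    return -math.log2(prob)


if __name__ == "__main__":
    # Hypothetical toy input: uniform raw scores over four tokens, so any
    # difference in attention weight comes from the recency bias alone.
    uniform = [[0.0] * 4 for _ in range(4)]
    for slope in alibi_slopes(8)[:3]:
        last_row = biased_attention(uniform, slope)[-1]
        print(slope, [round(w, 3) for w in last_row])
```

With uniform raw scores, a head with a steep slope concentrates its attention on the most recent tokens, while a head with a shallow slope spreads attention over a longer span. This is the sense in which the slopes set a different rate of memory decay in each head.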


