Evaluating the language abilities of Large Language Models vs. humans: Three caveats

Evelina Leivada, Vittoria Dentella, Fritz Günther
Biolinguistics, vol. 17, published 2024-04-19. DOI: 10.5964/bioling.14391

Abstract

We identify and analyze three caveats that may arise when analyzing the linguistic abilities of Large Language Models. The problem of unlicensed generalizations refers to the danger of interpreting performance on one task as predictive of a model’s overall capabilities, based on the assumption that because performance on that task is indicative of certain underlying capabilities in humans, the same association holds for models. The human-like paradox refers to the problem of lacking human comparisons while at the same time attributing human-like abilities to the models. Last, the problem of double standards refers to the use of tasks and methodologies that either cannot be applied to humans or are evaluated differently in models vs. humans. While we recognize the impressive linguistic abilities of LLMs, we conclude that specific claims about the models’ human-likeness in the grammatical domain are premature.