Evaluating the language abilities of Large Language Models vs. humans: Three caveats

Evelina Leivada, Vittoria Dentella, Fritz Günther
Biolinguistics, vol. 17, published 2024-04-19. DOI: 10.5964/bioling.14391

Abstract

We identify and analyze three caveats that may arise when analyzing the linguistic abilities of Large Language Models. The problem of unlicensed generalizations refers to the danger of interpreting performance on one task as predictive of a model’s overall capabilities, based on the assumption that because performance on that task is indicative of certain underlying capabilities in humans, the same association holds for models. The human-like paradox refers to the problem of lacking human comparisons while at the same time attributing human-like abilities to the models. Last, the problem of double standards refers to the use of tasks and methodologies that either cannot be applied to humans or are evaluated differently in models vs. humans. While we recognize the impressive linguistic abilities of LLMs, we conclude that specific claims about the models’ human-likeness in the grammatical domain are premature.