超越基于计数的词频:人工智能生成的单词和短语熟悉度估计是语言知识的一个有趣的附加指标。

IF 4.6 2区 心理学 Q1 PSYCHOLOGY, EXPERIMENTAL Behavior Research Methods Pub Date : 2024-12-28 DOI:10.3758/s13428-024-02561-7
Marc Brysbaert, Gonzalo Martínez, Pedro Reviriego
{"title":"超越基于计数的词频:人工智能生成的单词和短语熟悉度估计是语言知识的一个有趣的附加指标。","authors":"Marc Brysbaert, Gonzalo Martínez, Pedro Reviriego","doi":"10.3758/s13428-024-02561-7","DOIUrl":null,"url":null,"abstract":"<p><p>This study investigates the potential of large language models (LLMs) to estimate the familiarity of words and multi-word expressions (MWEs). We validated LLM estimates for isolated words using existing human familiarity ratings and found strong correlations. LLM familiarity estimates performed even better in predicting lexical decision and naming performance in megastudies than the best available word frequency measures. We then applied LLM estimates to MWEs, also finding their effectiveness in measuring familiarity for these expressions. We have created a list of more than 400,000 English words and MWEs with LLM-generated familiarity estimates, which we hope will be a valuable resource for researchers. There is also a cleaned-up list of nearly 150,000 entries, excluding lesser-known stimuli, to streamline stimulus selection. Our findings highlight the advantages of LLM-based familiarity estimates, including their better performance than traditional word frequency measures (particularly for predicting word recognition accuracy), their ability to generalize to MWEs, availability for large lists of words, and ease of obtaining new estimates for all types of stimuli.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"57 1","pages":"28"},"PeriodicalIF":4.6000,"publicationDate":"2024-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Moving beyond word frequency based on tally counting: AI-generated familiarity estimates of words and phrases are an interesting additional index of language knowledge.\",\"authors\":\"Marc Brysbaert, Gonzalo Martínez, Pedro Reviriego\",\"doi\":\"10.3758/s13428-024-02561-7\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>This study investigates the potential of large language models (LLMs) to estimate the familiarity of words and multi-word expressions (MWEs). We validated LLM estimates for isolated words using existing human familiarity ratings and found strong correlations. LLM familiarity estimates performed even better in predicting lexical decision and naming performance in megastudies than the best available word frequency measures. We then applied LLM estimates to MWEs, also finding their effectiveness in measuring familiarity for these expressions. We have created a list of more than 400,000 English words and MWEs with LLM-generated familiarity estimates, which we hope will be a valuable resource for researchers. There is also a cleaned-up list of nearly 150,000 entries, excluding lesser-known stimuli, to streamline stimulus selection. Our findings highlight the advantages of LLM-based familiarity estimates, including their better performance than traditional word frequency measures (particularly for predicting word recognition accuracy), their ability to generalize to MWEs, availability for large lists of words, and ease of obtaining new estimates for all types of stimuli.</p>\",\"PeriodicalId\":8717,\"journal\":{\"name\":\"Behavior Research Methods\",\"volume\":\"57 1\",\"pages\":\"28\"},\"PeriodicalIF\":4.6000,\"publicationDate\":\"2024-12-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Behavior Research Methods\",\"FirstCategoryId\":\"102\",\"ListUrlMain\":\"https://doi.org/10.3758/s13428-024-02561-7\",\"RegionNum\":2,\"RegionCategory\":\"心理学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"PSYCHOLOGY, EXPERIMENTAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Behavior Research Methods","FirstCategoryId":"102","ListUrlMain":"https://doi.org/10.3758/s13428-024-02561-7","RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"PSYCHOLOGY, EXPERIMENTAL","Score":null,"Total":0}
引用次数: 0

摘要

本研究探讨了大型语言模型(llm)在估计单词和多词表达(MWEs)熟悉度方面的潜力。我们使用现有的人类熟悉度评级验证了LLM对孤立单词的估计,并发现了很强的相关性。在大型研究中,LLM熟悉度估计在预测词汇决策和命名性能方面比最好的可用词频测量方法表现得更好。然后,我们将LLM估计应用于MWEs,也发现它们在测量这些表达的熟悉度方面的有效性。我们已经创建了一个包含超过40万个英语单词和MWEs的列表,并使用llm生成的熟悉度估算,我们希望这将成为研究人员的宝贵资源。此外,为了简化刺激因素的选择,还清理了近15万个条目,不包括不太知名的刺激因素。我们的研究结果突出了基于llm的熟悉度估计的优势,包括它们比传统的词频测量(特别是预测单词识别准确性)更好的性能,它们推广到MWEs的能力,大单词列表的可用性,以及易于获得所有类型刺激的新估计。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Moving beyond word frequency based on tally counting: AI-generated familiarity estimates of words and phrases are an interesting additional index of language knowledge.

This study investigates the potential of large language models (LLMs) to estimate the familiarity of words and multi-word expressions (MWEs). We validated LLM estimates for isolated words using existing human familiarity ratings and found strong correlations. LLM familiarity estimates performed even better in predicting lexical decision and naming performance in megastudies than the best available word frequency measures. We then applied LLM estimates to MWEs, also finding their effectiveness in measuring familiarity for these expressions. We have created a list of more than 400,000 English words and MWEs with LLM-generated familiarity estimates, which we hope will be a valuable resource for researchers. There is also a cleaned-up list of nearly 150,000 entries, excluding lesser-known stimuli, to streamline stimulus selection. Our findings highlight the advantages of LLM-based familiarity estimates, including their better performance than traditional word frequency measures (particularly for predicting word recognition accuracy), their ability to generalize to MWEs, availability for large lists of words, and ease of obtaining new estimates for all types of stimuli.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
10.30
自引率
9.30%
发文量
266
期刊介绍: Behavior Research Methods publishes articles concerned with the methods, techniques, and instrumentation of research in experimental psychology. The journal focuses particularly on the use of computer technology in psychological research. An annual special issue is devoted to this field.
期刊最新文献
Testing for group differences in multilevel vector autoregressive models. Distribution-free Bayesian analyses with the DFBA statistical package. Jiwar: A database and calculator for word neighborhood measures in 40 languages. Open-access network science: Investigating phonological similarity networks based on the SUBTLEX-US lexicon. Survey measures of metacognitive monitoring are often false.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1