Meaning beyond lexicality: Capturing Pseudoword Definitions with Language Models

IF 9.3 | Zone 2, Computer Science | Computational Linguistics | Pub Date: 2024-07-30 | DOI: 10.1162/coli_a_00527
Andrea Gregor de Varda, Daniele Gatti, Marco Marelli, Fritz Günther
Citations: 0

Abstract

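Pseudowords such as “knackets” or “spechy” (letter strings that are consistent with the orthotactic rules of a language but do not appear in its lexicon) are traditionally considered to be meaningless, and are employed as such in empirical studies. However, recent studies showing specific semantic patterns associated with these words, as well as semantic effects on human pseudoword processing, have cast doubt on this view. While these studies suggest that pseudowords have meanings, they provide only extremely limited insight into whether humans are able to ascribe explicit and declarative semantic content to unfamiliar word forms. In the present study, we employed an exploratory-confirmatory study design to examine this question. In a first, exploratory study, we started from a pre-existing dataset of words and pseudowords alongside human-generated definitions for these items. Employing 18 different language models, we showed that the definitions actually produced for (pseudo)words were closer to their respective (pseudo)words than the definitions for the other items. Based on these initial results, we conducted a second, pre-registered, high-powered confirmatory study collecting a new, controlled set of (pseudo)word interpretations. This second study confirmed the results of the first one. Taken together, these findings support the idea that meaning construction is supported by a flexible form-to-meaning mapping system, based on statistical regularities in the language environment, that can accommodate novel lexical entries as soon as they are encountered.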
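The core comparison in the exploratory study (is a (pseudo)word closer to the definition actually produced for it than to the definitions produced for the other items?) can be illustrated with a small sketch. This is not the authors' pipeline: it assumes each item and each definition has already been mapped to a vector by some language model, uses cosine similarity as the closeness measure, and replaces real model embeddings with hypothetical toy vectors. The function names and the "advantage" score are illustrative inventions.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def matched_vs_mismatched(
    item_vecs: Dict[str, List[float]],
    definition_vecs: Dict[str, List[float]],
) -> Dict[str, float]:
    """Hypothetical 'advantage' score per item: similarity to its own
    definition minus the mean similarity to all other items' definitions.
    A positive value means the item's own definition is the closer one."""
    if len(definition_vecs) < 2:
        raise ValueError("need at least two items to form mismatched pairs")
    advantages = {}
    for item, vec in item_vecs.items():
        own = cosine(vec, definition_vecs[item])
        others = [
            cosine(vec, d) for other, d in definition_vecs.items() if other != item
        ]
        advantages[item] = own - sum(others) / len(others)
    return advantages


# Toy vectors standing in for language-model embeddings (hypothetical values).
item_vecs = {"knackets": [1.0, 0.2, 0.0], "spechy": [0.0, 1.0, 0.3]}
definition_vecs = {"knackets": [0.9, 0.1, 0.1], "spechy": [0.1, 0.8, 0.4]}

for item, adv in matched_vs_mismatched(item_vecs, definition_vecs).items():
    print(f"{item}: {adv:+.3f}")
```

The abstract reports this matched-over-mismatched pattern across 18 language models; in a real analysis, model embeddings and the paper's own statistical tests would take the place of the toy vectors and the simple averaging used here.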
Source journal: Computational Linguistics (Computer Science: Artificial Intelligence)
Self-citation rate: 0.00%
Articles published: 45
About the journal: Computational Linguistics is the longest-running publication devoted exclusively to the computational and mathematical properties of language and the design and analysis of natural language processing systems. This highly regarded quarterly offers university and industry linguists, computational linguists, artificial intelligence and machine learning investigators, cognitive scientists, speech specialists, and philosophers the latest information about the computational aspects of all the facets of research on language.