A large-scale database of Mandarin Chinese word associations from the Small World of Words Project

Behavior Research Methods, 57(1), 34. Pub Date: 2024-12-30. DOI: 10.3758/s13428-024-02513-1
IF 4.6; Zone 2 (Psychology); JCR Q1 (Psychology, Experimental)
Bing Li, Ziyi Ding, Simon De Deyne, Qing Cai
Citations: 0

Abstract

Word associations are among the most direct ways to measure word meaning in human minds, capturing various relationships, even those formed by non-linguistic experiences. Although large-scale word associations exist for Dutch, English, and Spanish, there is a lack of data for Mandarin Chinese, the most widely spoken language from a distinct language family. Here we present the Small World of Words-Zhongwen (Chinese) (SWOW-ZH), a word association dataset of Mandarin Chinese derived from a three-response word association task. This dataset covers responses for over 10,000 cue words from more than 40,000 participants. We constructed a semantic network based on this dataset and evaluated concurrent validity of association-based measures by predicting human processing latencies and comparing them with text-based measures and word embeddings. Our results show that word centrality significantly predicts lexical decision and word naming speed. Furthermore, SWOW-ZH notably outperforms text-based embeddings and transformer-based large language models in predicting human-rated word relationships across varying sample sizes. We also highlight the unique characteristics of Chinese word associations, particularly focusing on word formation. Combined, our findings underscore the critical importance of large-scale human experimental data and its unique contribution to understanding the complexity and richness of language.
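
As a rough illustration of the kind of semantic network the abstract describes, the sketch below builds a directed cue-to-response graph from association pairs and computes a simple in-degree centrality. Everything in it is an assumption made for illustration: the toy pairs, the names `TOY_RESPONSES`, `build_network`, and `in_degree`, and the choice of in-degree as the centrality measure are hypothetical and are not taken from SWOW-ZH or from the authors' pipeline. It also flattens the three-response task into plain (cue, response) pairs.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (cue, response) pairs. NOT taken from SWOW-ZH.
TOY_RESPONSES = [
    ("猫", "狗"), ("猫", "鱼"), ("猫", "狗"),
    ("狗", "猫"), ("狗", "骨头"),
    ("鱼", "水"), ("鱼", "猫"),
    ("水", "鱼"),
]


def build_network(pairs):
    """Return {cue: {response: strength}}.

    Strength is the share of a cue's responses that equal `response`
    (an assumed, commonly used definition of associative strength).
    """
    counts = defaultdict(Counter)
    for cue, response in pairs:
        counts[cue][response] += 1

    network = {}
    for cue, counter in counts.items():
        total = sum(counter.values())
        network[cue] = {resp: n / total for resp, n in counter.items()}
    return network


def in_degree(network):
    """Count how many distinct cues elicit each word (a simple centrality)."""
    degree = Counter()
    for responses in network.values():
        for resp in responses:
            degree[resp] += 1
    return degree


network = build_network(TOY_RESPONSES)
centrality = in_degree(network)

if __name__ == "__main__":
    for word, degree in centrality.most_common():
        print(word, degree)
```

In a study like the one described, a centrality score of this kind would then be entered as a predictor of lexical decision or naming latencies; that regression step is omitted here because the abstract gives no details about it.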

Source journal: Behavior Research Methods
CiteScore: 10.30
Self-citation rate: 9.30%
Articles published: 266
Journal description: Behavior Research Methods publishes articles concerned with the methods, techniques, and instrumentation of research in experimental psychology. The journal focuses particularly on the use of computer technology in psychological research. An annual special issue is devoted to this field.
Latest articles from the journal:
Testing for group differences in multilevel vector autoregressive models.
Distribution-free Bayesian analyses with the DFBA statistical package.
Jiwar: A database and calculator for word neighborhood measures in 40 languages.
Open-access network science: Investigating phonological similarity networks based on the SUBTLEX-US lexicon.
Survey measures of metacognitive monitoring are often false.