Semantic alignment: A measure to quantify the degree of semantic equivalence for English-Chinese translation equivalents based on distributional semantics.

IF 4.6 | Zone 2 (Psychology) | Q1 PSYCHOLOGY, EXPERIMENTAL | Behavior Research Methods | Pub Date: 2025-01-08 | DOI: 10.3758/s13428-024-02527-9
Yufeng Liu, Shifa Chen, Yi Yang
Behavior Research Methods, 57(1), 51.
Citations: 0

Abstract

The degree of semantic equivalence of translation pairs is typically measured by asking bilinguals to rate their semantic similarity or by comparing the number and meaning of dictionary entries. Such measures are subjective, labor-intensive, and unable to capture fine-grained variation in the degree of semantic equivalence. Thompson et al. (Nature Human Behaviour, 4(10), 1029-1038, 2020) propose a computational method that quantifies the extent to which translation equivalents are semantically aligned by measuring their contextual use across languages. Here, we refine this method to quantify the semantic alignment of English-Chinese translation equivalents using word2vec, based on the proposal that the degree of similarity between the contexts associated with a word and those of its multiple translations varies continuously. We validate our measure using semantic alignment derived from GloVe and fastText, and data from two behavioral datasets. The consistency of semantic alignment across different models confirms the robustness of our method. We demonstrate that semantic alignment not only reflects human semantic similarity judgments of translation equivalents but also captures how frequently bilinguals use each translation. We also show that our method is more cognitively plausible than Thompson et al.'s method. Furthermore, the correlations between semantic alignment and key psycholinguistic factors mirror those between human-rated semantic similarity and these variables, indicating that computed semantic alignment reflects the degree of semantic overlap of translation equivalents in the bilingual mental lexicon. We further provide the largest English-Chinese translation equivalent dataset to date, encompassing 50,088 translation pairs for 15,734 English words, their dominant Chinese translation equivalents, and their semantic alignment Rc values.
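
The general idea of measuring alignment through contextual use can be illustrated with a minimal sketch. This is not the authors' implementation, and the Rc measure in the paper may be computed differently. The sketch assumes that each word has an embedding in its own language, that a list of anchor translation pairs is available, and that alignment is taken as the Pearson correlation between a word's similarities to the anchors in English and its translation's similarities to the same anchors in Chinese. The function names (`cosine`, `alignment_score`), the toy vocabulary, and the random vectors standing in for word2vec embeddings are all hypothetical.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def alignment_score(word_en, word_zh, emb_en, emb_zh, anchors):
    """Correlate the similarity profiles of a translation pair.

    anchors is a list of (english, chinese) translation pairs. Each word is
    compared with the anchors inside its own embedding space, so the two
    spaces never need to be mapped onto each other.
    """
    sims_en = [cosine(emb_en[word_en], emb_en[a_en]) for a_en, _ in anchors]
    sims_zh = [cosine(emb_zh[word_zh], emb_zh[a_zh]) for _, a_zh in anchors]
    return float(np.corrcoef(sims_en, sims_zh)[0, 1])

# Toy data (assumption): random vectors replace trained word2vec embeddings.
rng = np.random.default_rng(0)
pairs = [("dog", "狗"), ("cat", "猫"), ("river", "河"), ("bank", "银行"),
         ("money", "钱"), ("water", "水"), ("tree", "树"), ("book", "书"),
         ("road", "路"), ("house", "房子")]
dim = 8
emb_en = {en: rng.normal(size=dim) for en, _ in pairs}

# The Chinese space is a rotated copy of the English space, so every true
# translation pair has an identical similarity profile in both languages.
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
emb_zh = {zh: rotation @ emb_en[en] for en, zh in pairs}

anchors = pairs[2:]  # keep the words being scored out of the anchor set
aligned = alignment_score("dog", "狗", emb_en, emb_zh, anchors)
mismatched = alignment_score("dog", "猫", emb_en, emb_zh, anchors)
print(f"dog/狗: {aligned:.3f}  dog/猫: {mismatched:.3f}")
```

Because cosine similarity is unchanged by rotation, the true pair in this toy setup scores essentially 1, while the mismatched pair scores lower. With real embeddings trained separately on English and Chinese corpora, scores would instead vary continuously between pairs.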

Source journal: Behavior Research Methods
CiteScore: 10.30; self-citation rate: 9.30%; articles published: 266
About the journal: Behavior Research Methods publishes articles concerned with the methods, techniques, and instrumentation of research in experimental psychology. The journal focuses particularly on the use of computer technology in psychological research. An annual special issue is devoted to this field.