基于词向量相似性的英汉机器翻译转换研究

IF 0.8 Q4 ROBOTICS Artificial Life and Robotics Pub Date : 2024-09-16 DOI:10.1007/s10015-024-00964-5
Qingqing Ma
{"title":"基于词向量相似性的英汉机器翻译转换研究","authors":"Qingqing Ma","doi":"10.1007/s10015-024-00964-5","DOIUrl":null,"url":null,"abstract":"<div><p>In English–Chinese machine translation shift, the processing of out-of-vocabulary (OOV) words has a great impact on translation quality. Aiming at OOV, this paper proposed a method based on word vector similarity, calculated the word vector similarity based on the Skip-gram model, used the most similar words to replace OOV in the source sentences, and used the replaced corpus to train the Transformer model. It was found that when the original corpus was used for training, the bilingual evaluation understudy-4 (BLEU-4) of the Transformer model on NIST2006 and NIST2008 was 37.29 and 30.73, respectively. However, when the word vector similarity was used for processing and low-frequency OOV words were retained, the BLEU-4 of the Transformer model on NIST2006 and NIST2008 was improved to 37.36 and 30.78 respectively, showing an increase. Moreover, the translation quality obtained by retaining low-frequency OOV words was better than that obtained by removing low-frequency OOV words. The experimental results prove that the English–Chinese machine translation shift method based on word vector similarity is reliable and can be applied in practice.</p></div>","PeriodicalId":46050,"journal":{"name":"Artificial Life and Robotics","volume":null,"pages":null},"PeriodicalIF":0.8000,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Research on English–Chinese machine translation shift based on word vector similarity\",\"authors\":\"Qingqing Ma\",\"doi\":\"10.1007/s10015-024-00964-5\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>In English–Chinese machine translation shift, the processing of out-of-vocabulary (OOV) words has a great impact on translation quality. Aiming at OOV, this paper proposed a method based on word vector similarity, calculated the word vector similarity based on the Skip-gram model, used the most similar words to replace OOV in the source sentences, and used the replaced corpus to train the Transformer model. It was found that when the original corpus was used for training, the bilingual evaluation understudy-4 (BLEU-4) of the Transformer model on NIST2006 and NIST2008 was 37.29 and 30.73, respectively. However, when the word vector similarity was used for processing and low-frequency OOV words were retained, the BLEU-4 of the Transformer model on NIST2006 and NIST2008 was improved to 37.36 and 30.78 respectively, showing an increase. Moreover, the translation quality obtained by retaining low-frequency OOV words was better than that obtained by removing low-frequency OOV words. The experimental results prove that the English–Chinese machine translation shift method based on word vector similarity is reliable and can be applied in practice.</p></div>\",\"PeriodicalId\":46050,\"journal\":{\"name\":\"Artificial Life and Robotics\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.8000,\"publicationDate\":\"2024-09-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Artificial Life and Robotics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s10015-024-00964-5\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"ROBOTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Life and Robotics","FirstCategoryId":"1085","ListUrlMain":"https://link.springer.com/article/10.1007/s10015-024-00964-5","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"ROBOTICS","Score":null,"Total":0}
引用次数: 0

摘要

在英汉机器翻译转换中,词汇外(OOV)词的处理对翻译质量有很大影响。针对 OOV,本文提出了一种基于词向量相似性的方法,基于 Skip-gram 模型计算词向量相似性,用最相似的词替换源句中的 OOV,并用替换后的语料训练 Transformer 模型。结果发现,当使用原始语料进行训练时,Transformer 模型在 NIST2006 和 NIST2008 上的双语评估劣度-4(BLEU-4)分别为 37.29 和 30.73。然而,当使用词向量相似性进行处理并保留低频 OOV 词时,Transformer 模型在 NIST2006 和 NIST2008 上的 BLEU-4 分别提高到 37.36 和 30.78,显示出提高。此外,保留低频 OOV 词所获得的翻译质量优于去除低频 OOV 词所获得的翻译质量。实验结果证明,基于词向量相似性的英汉机器翻译转换方法是可靠的,可以在实践中应用。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Research on English–Chinese machine translation shift based on word vector similarity

In English–Chinese machine translation shift, the processing of out-of-vocabulary (OOV) words has a great impact on translation quality. Aiming at OOV, this paper proposed a method based on word vector similarity, calculated the word vector similarity based on the Skip-gram model, used the most similar words to replace OOV in the source sentences, and used the replaced corpus to train the Transformer model. It was found that when the original corpus was used for training, the bilingual evaluation understudy-4 (BLEU-4) of the Transformer model on NIST2006 and NIST2008 was 37.29 and 30.73, respectively. However, when the word vector similarity was used for processing and low-frequency OOV words were retained, the BLEU-4 of the Transformer model on NIST2006 and NIST2008 was improved to 37.36 and 30.78 respectively, showing an increase. Moreover, the translation quality obtained by retaining low-frequency OOV words was better than that obtained by removing low-frequency OOV words. The experimental results prove that the English–Chinese machine translation shift method based on word vector similarity is reliable and can be applied in practice.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
2.00
自引率
22.20%
发文量
101
期刊介绍: Artificial Life and Robotics is an international journal publishing original technical papers and authoritative state-of-the-art reviews on the development of new technologies concerning artificial life and robotics, especially computer-based simulation and hardware for the twenty-first century. This journal covers a broad multidisciplinary field, including areas such as artificial brain research, artificial intelligence, artificial life, artificial living, artificial mind research, brain science, chaos, cognitive science, complexity, computer graphics, evolutionary computations, fuzzy control, genetic algorithms, innovative computations, intelligent control and modelling, micromachines, micro-robot world cup soccer tournament, mobile vehicles, neural networks, neurocomputers, neurocomputing technologies and applications, robotics, robus virtual engineering, and virtual reality. Hardware-oriented submissions are particularly welcome. Publishing body: International Symposium on Artificial Life and RoboticsEditor-in-Chiei: Hiroshi Tanaka Hatanaka R Apartment 101, Hatanaka 8-7A, Ooaza-Hatanaka, Oita city, Oita, Japan 870-0856 ©International Symposium on Artificial Life and Robotics
期刊最新文献
AI robots pioneer the Smarter Inclusive Society Research on coordinated control strategy of distributed static synchronous series compensator based on multi-objective optimization immune algorithm Probabilistic model for high-level intention estimation and trajectory prediction in urban environments Preservation of emotional context in tweet embeddings on social networking sites Spiking neural networks-based generation of caterpillar-like soft robot crawling motions
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1