NMT Enhancement based on Knowledge Graph Mining with Pre-trained Language Model

2020 22nd International Conference on Advanced Communication Technology (ICACT) | Pub Date: 2020-02-01 | DOI: 10.23919/ICACT48636.2020.9061292

Hao Yang, Ying Qin, Yao Deng, Minghan Wang


Citations: 1

Abstract



Pre-trained language models such as BERT, RoBERTa, and GPT have achieved state-of-the-art (SOTA) results on multiple NLP tasks (e.g., sentiment classification, information extraction, and event extraction). We propose a simple knowledge-graph-based method to improve the quality of neural machine translation (NMT). First, we propose a multi-task learning model that learns subjects, objects, and predicates at the same time. Second, we treat different predicates as different domains and use classification labels to improve the NMT model's ability to recognize each domain. Finally, beam search combined with left-to-right (L2R) and right-to-left (R2L) rearrangement reorders the results using entities. On the CWMT2018 experimental data, using the predicate's domain classification identifier raises the BLEU score from 33.58% to 37.63%, and L2R/R2L rearrangement raises it further to 39.25%, an overall improvement of more than 5 percentage points.
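The abstract does not give implementation details, so the following is only a minimal sketch of how the second and third steps might look. The tag format (`<domain>`), the predicate-to-domain mapping, the additive scoring formula, and every function name here are illustrative assumptions, not the authors' method.

```python
from typing import Dict, List, Tuple


def add_domain_tag(source: str, predicate: str,
                   predicate_to_domain: Dict[str, str]) -> str:
    """Prepend a domain token chosen from the sentence's predicate.

    The "<domain> sentence" format is a hypothetical convention.
    """
    domain = predicate_to_domain.get(predicate, "general")
    return f"<{domain}> {source}"


def entity_coverage(hypothesis: str, entities: List[str]) -> float:
    """Fraction of the expected entities that appear in the hypothesis."""
    if not entities:
        return 0.0
    hits = sum(1 for entity in entities if entity in hypothesis)
    return hits / len(entities)


def rerank(candidates: List[Tuple[str, float, float]],
           entities: List[str],
           entity_weight: float = 1.0) -> List[str]:
    """Sort beam-search candidates, best first.

    Each candidate is (text, l2r_log_prob, r2l_log_prob). The score is the
    sum of both log-probabilities plus a weighted entity-coverage bonus;
    this combination is an assumption made for illustration.
    """
    def score(candidate: Tuple[str, float, float]) -> float:
        text, l2r, r2l = candidate
        return l2r + r2l + entity_weight * entity_coverage(text, entities)

    ranked = sorted(candidates, key=score, reverse=True)
    return [text for text, _, _ in ranked]


if __name__ == "__main__":
    tagged = add_domain_tag("He was born in Beijing", "born_in",
                            {"born_in": "person"})
    print(tagged)

    beam = [
        ("He was born in Peking", -1.0, -1.2),
        ("He was born in Beijing", -1.1, -1.3),
    ]
    print(rerank(beam, entities=["Beijing"]))
```

In this sketch the entity bonus lets a candidate that preserves the expected entity outrank one with a slightly higher combined model score. How the paper actually weights the L2R, R2L, and entity signals is not stated in the abstract.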
