{"title":"Research on the Application of BERT in Mongolian-Chinese Neural Machine Translation","authors":"Xiu Zhi, Siriguleng Wang","doi":"10.1145/3457682.3457744","DOIUrl":null,"url":null,"abstract":"In recent years, the research of neural networks has brought new solutions to machine translation. The application of sequence-tosequence model has made a qualitative leap in the performance of machine translation. The training of neural machine translation model depends on large-scale bilingual parallel corpus, the size of corpus directly affects the performance of neural machine translation. Under the guidance of BERT (Bidirectional Encoder) model to calculate the semantic similarity degree for the extension of training corpus in this paper. The scores of two sentences were calculated by using dot product and cosine similarity, and then the sentences with high scores were expanded to the training corpus with a scale of 540,000 sentence pairs. Finally, Transformer was used to train the Mongolian and Chinese neural machine translation system, which was 0.91 percentage points higher than the BLEU value in the baseline experiment.","PeriodicalId":142045,"journal":{"name":"2021 13th International Conference on Machine Learning and Computing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 13th International Conference on Machine Learning and Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3457682.3457744","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
In recent years, research on neural networks has brought new solutions to machine translation, and the application of sequence-to-sequence models has produced a qualitative leap in translation performance. Training a neural machine translation model depends on a large-scale bilingual parallel corpus, and the size of this corpus directly affects translation quality. In this paper, the BERT (Bidirectional Encoder Representations from Transformers) model is used to compute semantic similarity in order to extend the training corpus. Candidate sentence pairs are scored with both the dot product and the cosine similarity of their BERT representations, and high-scoring pairs are added to the training corpus, which reaches a scale of 540,000 sentence pairs. Finally, a Transformer is used to train the Mongolian-Chinese neural machine translation system, which achieves a BLEU score 0.91 percentage points higher than the baseline experiment.
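The paper gives no implementation details beyond the abstract; the following is a minimal Python sketch of the similarity-scoring step it describes, using Hugging Face transformers. The checkpoint name bert-base-chinese, the mean-pooling strategy, and the 0.9 cosine threshold are illustrative assumptions, not details taken from the paper.

```python
# Sketch of BERT-based similarity scoring for corpus expansion.
# Assumptions (not from the paper): bert-base-chinese checkpoint,
# mean pooling over token embeddings, and a 0.9 cosine threshold.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModel.from_pretrained("bert-base-chinese")
model.eval()

def embed(sentence: str) -> torch.Tensor:
    """Encode a sentence with BERT and mean-pool the token embeddings."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)            # (768,)

def score_pair(a_text: str, b_text: str) -> tuple[float, float]:
    """Return (dot product, cosine similarity) for two sentences."""
    a, b = embed(a_text), embed(b_text)
    dot = torch.dot(a, b).item()
    cos = torch.nn.functional.cosine_similarity(a, b, dim=0).item()
    return dot, cos

# Keep only high-scoring candidate pairs for the expanded training corpus.
candidates = [("候选句子一", "候选句子二")]  # placeholder candidate pairs
expanded = [p for p in candidates if score_pair(*p)[1] > 0.9]
```

The retained pairs would then be merged into the parallel corpus before training the Transformer translation model; how the paper applies BERT across the Mongolian and Chinese sides of the data is not specified in the abstract.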