Multi-word Expressions in English-Latvian Machine Translation
I. Skadina
Baltic Journal of Modern Computing (Balt. J. Mod. Comput.), journal article, published 2016-12-19
DOI: 10.22364/BJMC.2016.4.4.14
Citations: 4
Abstract
The paper presents a series of experiments that aim to find the best method for treating multi-word expressions (MWEs) in the machine translation task. The methods are investigated within the framework of statistical machine translation (SMT) for translation from English into Latvian. MWE candidates are extracted using pattern-based and statistical approaches, and different techniques for integrating MWEs into the SMT system are analysed. The best result (+0.59 BLEU points) is achieved by combining two phrase tables: a bilingual MWE dictionary and a phrase table created from the parallel corpus in which statistically extracted MWE candidates are treated as single tokens. When only the bilingual dictionary is used as an additional source of information, the best result (+0.36 BLEU points) is obtained by combining two phrase tables. With statistically obtained MWE lists, the best result (+0.51 BLEU points) is achieved with the largest list of MWE candidates.
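The abstract does not say which association measure drives the statistical extraction or how MWE candidates are marked as single tokens in the corpus. The sketch below is only an illustration under stated assumptions: it scores adjacent word pairs with pointwise mutual information (PMI), keeps pairs above a count and PMI threshold, and then joins each accepted pair with an underscore so it becomes one token. The function names `extract_bigram_mwes` and `merge_mwes`, the thresholds, the underscore joiner and the toy corpus are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def extract_bigram_mwes(sentences, min_count=2, min_pmi=1.0):
    """Return adjacent word pairs seen at least min_count times whose PMI
    is at least min_pmi. PMI is an assumed stand-in for the paper's
    (unspecified) statistical measure."""
    unigrams = Counter()
    bigrams = Counter()
    for sent in sentences:
        tokens = sent.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    candidates = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        pmi = math.log2(p_xy / (p_x * p_y))
        if pmi >= min_pmi:
            candidates.append((w1, w2))
    return candidates


def merge_mwes(sentence, mwes, joiner="_"):
    """Greedily rewrite a sentence left to right so that every two-word
    MWE candidate becomes a single token (hypothetical joiner '_')."""
    mwe_set = set(mwes)
    tokens = sentence.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in mwe_set:
            out.append(tokens[i] + joiner + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    # Toy English corpus, invented for illustration only.
    corpus = [
        "machine translation is hard",
        "statistical machine translation uses phrase tables",
        "the phrase tables are large",
        "neural machine translation became popular",
    ]
    mwes = extract_bigram_mwes(corpus)
    print(sorted(mwes))  # [('machine', 'translation'), ('phrase', 'tables')]
    print(merge_mwes(corpus[1], mwes))
    # statistical machine_translation uses phrase_tables
```

If a corpus is rewritten this way before phrase extraction, an SMT system sees `machine_translation` as one token rather than two, which is the general idea behind treating MWE candidates as single tokens. A realistic pipeline would also need longer MWEs, overlapping candidates and a matching step on the target side, none of which this sketch attempts.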