A study of certain morphological structures of Kazakh and their impact on the machine translation quality
Eldar Bekbulatov, Amandyk Kartbayev
2014 IEEE 8th International Conference on Application of Information and Communication Technologies (AICT), published 2014-10-01
DOI: 10.1109/ICAICT.2014.7036013 (https://doi.org/10.1109/ICAICT.2014.7036013)
Citations: 6
Abstract
This paper describes a morphological analysis of the Kazakh language for Kazakh-English statistical machine translation, in which Kazakh compound words are modified, and explores the effect of the modified input on translation quality when a large number of training sentences is used. The word alignment problem becomes more serious when translating from a morphologically rich language such as Kazakh into a morphologically simpler one such as English, because word forms occur in many different morphological variants and the data for each variant is sparse. We present our investigations of unsupervised Kazakh morphological segmentation on a newspaper corpus and compare unsupervised segmentation against rule-based language processing tools. In our experiments, the results show that the proposed method can improve word alignment and translation quality.