Odia transliteration engine using Moses
Authors: R. Balabantaray, Deepak Sahoo
Venue: 2014 2nd International Conference on Business and Information Management (ICBIM), pp. 27-29
Published: 2014-12-04
DOI: 10.1109/ICBIM.2014.6970927 (https://doi.org/10.1109/ICBIM.2014.6970927)

Transliteration, the automatic conversion of words in one language into phonetically equivalent words in another, is an important Natural Language Processing task. In this paper we apply popular phrase-based statistical machine translation (SMT) techniques to machine transliteration for the Odia-English and Odia-Hindi language pairs. We created two models for syllable-based splits (Odia-English, Odia-Hindi) on 50,900 parallel entries and two models for character-based splits (Odia-English, Odia-Hindi) on 110,000 parallel entries. SRILM is used to build the statistical language models, and GIZA++ is used to perform word alignment over the parallel corpora. We achieved an accuracy of 89% for Odia-English and 86% for Odia-Hindi with the syllable-based split, and 71% for Odia-English and 85% for Odia-Hindi with the character-based split.
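
The abstract does not spell out how words are split before training, so the following is only a minimal sketch of what character-based and syllable-based splitting might look like when preparing input for a phrase-based SMT toolkit, which expects space-separated tokens. The function names (`char_split`, `syllable_split`, `to_token_line`) are hypothetical, and the syllable rule (a base character plus its dependent marks, with a virama joining the next consonant into a conjunct) is an assumed approximation, not the authors' segmentation.

```python
import unicodedata

ODIA_VIRAMA = "\u0B4D"  # Odia sign virama (halanta), joins consonants into a conjunct


def char_split(word):
    """Character-based split: every code point becomes its own token."""
    return list(word)


def syllable_split(word):
    """Rough syllable-style split for Odia script (assumed rule, see lead-in).

    A combining mark (vowel sign, nukta, virama, ...) attaches to the
    preceding unit, and a character that follows a virama is merged
    into the same unit so that conjuncts stay together.
    """
    units = []
    for ch in word:
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")
        joins_conjunct = bool(units) and units[-1].endswith(ODIA_VIRAMA)
        if units and (is_mark or joins_conjunct):
            units[-1] += ch
        else:
            units.append(ch)
    return units


def to_token_line(tokens):
    """Join units with spaces so each unit is treated as one 'word'."""
    return " ".join(tokens)


if __name__ == "__main__":
    odia = "\u0B13\u0B21\u0B3C\u0B3F\u0B06"  # the word "Odia" in Odia script
    print(to_token_line(char_split(odia)))
    print(to_token_line(syllable_split(odia)))
    print(to_token_line(char_split("odia")))
```

Under this sketch, each side of a parallel entry would be written as one such token line, so that alignment and phrase extraction operate on characters or syllable units rather than on whole words.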