An analysis of the effect of training data variation in English-Persian Statistical Machine Translation
Mahsa Mohaghegh, A. Sarrafzadeh
2009 International Conference on Innovations in Information Technology (IIT), 2009-12-15
DOI: 10.1109/IIT.2009.5413782
Globalization has made machine translation an attractive area of research and development. As technology opens up e-commerce opportunities, companies must overcome language barriers to reach new potential customers and partners. Web 2.0 tools such as Google Translate make the web more accessible. Statistical Machine Translation (SMT) has been applied to many language pairs, which has contributed to its popularity in recent years. It has not, however, been applied to the English/Persian pair. This paper presents the first such attempt and describes the problems faced in creating a corpus and building a baseline system. We discuss our experience constructing a parallel corpus during this study and the problems encountered, especially in the alignment process. The prototype and its evaluation are described, and the results are analyzed. The final part of the paper draws conclusions and discusses planned future work.