{"title":"基于统计机器翻译和联合序列模型的音译技术比较分析","authors":"Nam Cao, Nhut M. Pham, Q. Vu","doi":"10.1145/1852611.1852624","DOIUrl":null,"url":null,"abstract":"The inability to deal with words in foreign languages imposes difficulties to both Vietnamese speech recognition and text-to-speech systems. A common solution is to look up a dictionary, but the number of available entries is finite and therefore not flexible because speech recognition and text-to-speech systems are expected to handle arbitrary words. Alternatively, data-driven approaches can be employed to transliterate a foreign word into its Vietnamese pronunciation by learning samples and predicting unseen words. This paper presents a comparative analysis between two data-driven approaches based on statistical machine translation and joint-sequence model. Two systems based on these approaches are developed and tested using the same experimental protocol and a dataset consisting of 8050 English words. Results show that joint-sequence model outperforms statistical machine translation in English-to-Vietnamese transliteration.","PeriodicalId":388053,"journal":{"name":"Proceedings of the 1st Symposium on Information and Communication Technology","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2010-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Comparative analysis of transliteration techniques based on statistical machine translation and joint-sequence model\",\"authors\":\"Nam Cao, Nhut M. Pham, Q. Vu\",\"doi\":\"10.1145/1852611.1852624\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The inability to deal with words in foreign languages imposes difficulties to both Vietnamese speech recognition and text-to-speech systems. A common solution is to look up a dictionary, but the number of available entries is finite and therefore not flexible because speech recognition and text-to-speech systems are expected to handle arbitrary words. Alternatively, data-driven approaches can be employed to transliterate a foreign word into its Vietnamese pronunciation by learning samples and predicting unseen words. This paper presents a comparative analysis between two data-driven approaches based on statistical machine translation and joint-sequence model. Two systems based on these approaches are developed and tested using the same experimental protocol and a dataset consisting of 8050 English words. Results show that joint-sequence model outperforms statistical machine translation in English-to-Vietnamese transliteration.\",\"PeriodicalId\":388053,\"journal\":{\"name\":\"Proceedings of the 1st Symposium on Information and Communication Technology\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-08-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 1st Symposium on Information and Communication Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1852611.1852624\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 1st Symposium on Information and Communication Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1852611.1852624","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Comparative analysis of transliteration techniques based on statistical machine translation and joint-sequence model
The inability to deal with words in foreign languages imposes difficulties to both Vietnamese speech recognition and text-to-speech systems. A common solution is to look up a dictionary, but the number of available entries is finite and therefore not flexible because speech recognition and text-to-speech systems are expected to handle arbitrary words. Alternatively, data-driven approaches can be employed to transliterate a foreign word into its Vietnamese pronunciation by learning samples and predicting unseen words. This paper presents a comparative analysis between two data-driven approaches based on statistical machine translation and joint-sequence model. Two systems based on these approaches are developed and tested using the same experimental protocol and a dataset consisting of 8050 English words. Results show that joint-sequence model outperforms statistical machine translation in English-to-Vietnamese transliteration.