M. T. Uliniansyah, Hammam Riza, Agung Santosa, Gunarso, Made Gunawan, Elvira Nurfadhilah
2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA), published 2017-11-01. DOI: 10.1109/ICSDA.2017.8384448 (https://doi.org/10.1109/ICSDA.2017.8384448)
Development of text and speech corpus for an Indonesian speech-to-speech translation system
This paper describes our natural language resources, especially the text and speech corpora, for developing an Indonesian speech-to-speech translation (S2ST) system. The corpora are used to create models for Automatic Speech Recognition (ASR), Statistical Machine Translation (SMT), and Text-to-Speech (TTS) systems. The corpora have been collected since 1987 from various sources and projects such as the Multilingual Machine Translation System (MMTS), PAN Localization, ASEAN MT, U-STAR, etc. Text corpora are created either by collecting from online resources or by translating manually from textual sources. Speech corpora are made from several recording projects. The availability of these corpora enables us to develop an Indonesian speech-to-speech translation system.