Development of under-resourced Bahasa Indonesia speech corpus
E. Cahyaningtyas, D. Arifianto
2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Published: 2017-12-01
DOI: 10.1109/APSIPA.2017.8282191 (https://doi.org/10.1109/APSIPA.2017.8282191)
Citation count: 7
Abstract
Although Bahasa Indonesia is used by about 263 million people worldwide, it is classified as an under-resourced language. In this paper we outline the development of a Bahasa Indonesia speech corpus of casual sentences, which contains a speech database and its transcription. First, we selected casual Bahasa Indonesia sentences from movie and drama transcripts and formed 1029 declarative sentences and 500 question sentences. To avoid local dialects, we hired six professional radio news readers to utter the sentences in a sound-proof booth. Segmentation and labeling were then performed to create a transcription that includes the time label of each individual phoneme. To ensure the quality of the database, we manually inspected the waveform and the frequency content of the individual sentences using spectrograms. The results suggest that the speech corpus may be used for speech processing projects such as speech recognition and speech synthesis. In ongoing research, we are developing high-quality speech synthesis, namely speaker adaptation and speaker averaging.
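
The phoneme-level time labels mentioned in the abstract could be represented and sanity-checked along the following lines. This is only a minimal sketch: the abstract does not describe the corpus file format, so the `PhonemeSegment` layout, the `find_label_problems` helper, the tolerance value, and the example times are all hypothetical. The check is an automated complement to the manual waveform and spectrogram inspection the authors describe, not a description of their procedure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhonemeSegment:
    """One labeled phoneme with start/end times in seconds (assumed layout)."""
    phoneme: str
    start: float
    end: float


def find_label_problems(segments: List[PhonemeSegment], tol: float = 1e-6) -> List[str]:
    """Return descriptions of timing problems in a sequence of phoneme labels.

    Flags segments with non-positive duration, and overlaps or gaps between
    consecutive segments (within the tolerance `tol`, an assumed value).
    """
    problems = []
    for i, seg in enumerate(segments):
        if seg.end <= seg.start:
            problems.append(f"segment {i} ({seg.phoneme}) has non-positive duration")
        if i > 0:
            prev = segments[i - 1]
            if seg.start < prev.end - tol:
                problems.append(f"segment {i} ({seg.phoneme}) overlaps segment {i - 1}")
            elif seg.start > prev.end + tol:
                problems.append(f"gap between segment {i - 1} and segment {i}")
    return problems


# Illustrative labels for the word "apa" (made-up times, not taken from the corpus).
example = [
    PhonemeSegment("a", 0.00, 0.12),
    PhonemeSegment("p", 0.12, 0.20),
    PhonemeSegment("a", 0.20, 0.35),
]

for problem in find_label_problems(example):
    print(problem)
```

A label file that passes such a check can still be misaligned with the audio, so it would not replace inspecting the waveform and spectrogram.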