Acoustic and language models adaptation for Indonesian spontaneous speech recognition
D. Lestari, Angela Irfani
2015 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA), 2015-08-01
DOI: 10.1109/ICAICTA.2015.7335375 (https://doi.org/10.1109/ICAICTA.2015.7335375)
Citations: 11
Abstract
The performance of Indonesian automatic speech recognition decreases significantly on spontaneous speech. Spontaneous speech has characteristics that differ from read speech in both acoustics and language rules. In spontaneous speech, pronunciation and expression vary depending on the speaker's fluency and the topic. Disfluencies disrupt otherwise fluent sentences and often violate the rules of the formal language. To improve an Indonesian automatic speech recognizer on spontaneous speech, several model enhancement methods were applied: adding spontaneous data and retraining both the acoustic model and the language model on those data; adapting the acoustic model using the maximum likelihood linear regression (MLLR) and maximum a posteriori (MAP) approaches; and adapting the language model using linear interpolation. Experimental results show that all methods are effective in improving the recognizer's ability to recognize spontaneous data. However, all methods decreased the accuracy of read speech recognition. On average, retraining both the acoustic and language models on a combination of read and spontaneous data is more effective than model adaptation. An absolute accuracy improvement of 28.34% is achieved after retraining both the language model and the acoustic model on a combination of read and spontaneous data.
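As an illustration of the language model linear interpolation mentioned in the abstract, the sketch below mixes two toy unigram distributions with a weight `lam`, following the general form P(w) = lam * P_spont(w) + (1 - lam) * P_read(w). The abstract does not give the model order, the vocabulary, or the interpolation weight used in the paper, so the function name `interpolate_unigram`, the toy probabilities, and `lam=0.3` are all assumptions chosen for illustration only.

```python
from typing import Dict


def interpolate_unigram(
    p_read: Dict[str, float],
    p_spont: Dict[str, float],
    lam: float,
) -> Dict[str, float]:
    """Linearly interpolate two unigram distributions.

    P(w) = lam * P_spont(w) + (1 - lam) * P_read(w)
    Words missing from one model contribute probability 0 from that model.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    vocab = set(p_read) | set(p_spont)
    return {
        w: lam * p_spont.get(w, 0.0) + (1.0 - lam) * p_read.get(w, 0.0)
        for w in vocab
    }


# Toy distributions (hypothetical values, not taken from the paper).
p_read = {"saya": 0.5, "makan": 0.3, "nasi": 0.2}
p_spont = {"saya": 0.4, "makan": 0.2, "eh": 0.4}

mixed = interpolate_unigram(p_read, p_spont, lam=0.3)
```

Because both inputs sum to 1 and the weights sum to 1, the mixture is itself a valid distribution. A filler word such as "eh", which appears only in the spontaneous model, receives non-zero probability in the mixture. In practice the weight would typically be tuned on held-out spontaneous data.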