Designing text corpus using phone-error distribution for acoustic modeling
H. Murakami, K. Shinoda, S. Furui
2011 IEEE Workshop on Automatic Speech Recognition & Understanding, published 2011-12-01
DOI: 10.1109/ASRU.2011.6163929 (https://doi.org/10.1109/ASRU.2011.6163929)
Preparing enough training data for acoustic modeling is expensive when developing large-vocabulary continuous speech recognition systems, and the problem is especially serious for resource-deficient languages. We propose an active learning method that reduces the amount of training data without any degradation in recognition performance. The method is used to design a text corpus for read speech collection. First, it estimates the phone-error distribution from a small amount of fully transcribed speech data. Second, it constructs a sentence set whose phone-occurrence distribution is close to the phone-error distribution and collects speech data for that set. It then extends this process to diphones and triphones and collects more speech data. We evaluated the method in simulation experiments on the Corpus of Spontaneous Japanese. It required only 76 h of speech data to reach a word accuracy of 74.7%, whereas the conventional training method required 152 h, twice as much, to reach the same accuracy.
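The sentence-selection step can be illustrated with a small sketch. The abstract does not say which closeness measure or search strategy the authors use, so the code below assumes a greedy search that minimizes the KL divergence between the phone-error distribution and the phone-occurrence distribution of the selected sentences. The function names, the smoothing constant, and the toy phone data are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def normalize(counts, phones, eps=1e-6):
    """Turn raw counts into a smoothed probability distribution over `phones`."""
    total = sum(counts.get(p, 0) for p in phones) + eps * len(phones)
    return {p: (counts.get(p, 0) + eps) / total for p in phones}


def kl_divergence(target, candidate):
    """KL(target || candidate) for two distributions over the same phone set."""
    return sum(t * math.log(t / candidate[p]) for p, t in target.items() if t > 0)


def select_sentences(pool, error_counts, budget):
    """Greedily pick up to `budget` sentences so that the pooled phone-occurrence
    distribution stays close to the phone-error distribution.

    Assumption: greedy KL minimization stands in for whatever selection
    criterion the original method actually uses.
    """
    phones = sorted(set(error_counts) | {p for sent in pool for p in sent})
    target = normalize(error_counts, phones)
    chosen, running = [], Counter()
    remaining = list(range(len(pool)))
    for _ in range(min(budget, len(pool))):
        # Score each remaining sentence by the divergence it would leave
        # if it were added to the current selection.
        best = min(
            remaining,
            key=lambda i: kl_divergence(
                target, normalize(running + Counter(pool[i]), phones)
            ),
        )
        chosen.append(best)
        running += Counter(pool[best])
        remaining.remove(best)
    return chosen


# Toy candidate sentences, each given as a phone sequence (hypothetical data).
pool = [
    ["a", "k", "a"],
    ["r", "a", "ts", "u"],
    ["k", "o", "k", "o"],
    ["ts", "u", "r", "i"],
]
# Hypothetical per-phone error counts from a small transcribed seed set.
error_counts = {"r": 40, "ts": 40, "a": 5, "u": 5, "k": 5, "o": 5}

print(select_sentences(pool, error_counts, budget=2))
```

Under the same assumptions, the extension to diphones and triphones would amount to replacing each phone sequence with its sequence of adjacent phone pairs or triples before counting.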