Unsupervised training of a speech recognizer using TV broadcasts
T. Kemp, A. Waibel
5th International Conference on Spoken Language Processing (ICSLP 1998)
DOI: 10.21437/ICSLP.1998-632
Published: 1998-11-30
Citations: 37
Abstract
Current speech recognition systems require large amounts of transcribed data for parameter estimation. Transcription, however, is tedious and expensive. In this work we describe experiments aimed at training a speech recognizer without transcriptions. The experiments were carried out with TV newscasts, which were recorded using a satellite receiver and simple MPEG coding hardware, and which were automatically segmented into segments of similar acoustic background condition. This material is inexpensive and can be made available in large quantities, but no transcriptions are available for it. We develop a training scheme in which a recognizer is bootstrapped on very little transcribed data and then improved using new, untranscribed speech. We show that it is necessary to use a confidence measure to judge the recognizer's initial transcriptions before using them. Higher improvements can be achieved if the number of parameters in the system is increased as more data becomes available. We show that the beneficial effect of unsupervised training is not compensated by MLLR adaptation on the hypothesis. In a final experiment, the effect of untranscribed data is compared with the effect of transcribed speech. Using the described methods, we found that the untranscribed data gives roughly one third of the improvement of the transcribed material.
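The training scheme the abstract outlines — bootstrap a recognizer on a small transcribed set, decode the untranscribed newscasts, keep only hypotheses whose confidence clears a threshold, and retrain on the enlarged corpus — can be sketched as follows. This is not the authors' code: `train`, `decode`, the confidence values, and the threshold of 0.8 are all hypothetical stand-ins chosen for illustration, and the real system would train acoustic-model parameters rather than collect utterances in a list.

```python
def train(corpus):
    """Stand-in for acoustic-model training: here it merely records
    which (audio, transcript) pairs the model was trained on."""
    return {"training_data": list(corpus)}

def decode(model, utterance):
    """Stand-in for recognition: returns a hypothesis transcript and
    a confidence score for it (both precomputed in this toy example)."""
    return utterance["hyp"], utterance["conf"]

def self_train(seed_corpus, untranscribed, threshold=0.8):
    """Confidence-filtered unsupervised training loop, as described
    in the abstract (threshold value is an assumption)."""
    model = train(seed_corpus)            # bootstrap on transcribed data
    accepted = []
    for utt in untranscribed:
        hyp, conf = decode(model, utt)
        if conf >= threshold:             # judge hypotheses by confidence
            accepted.append((utt["audio"], hyp))
    # retrain on the seed transcriptions plus the confident
    # automatic transcriptions of the untranscribed material
    return train(seed_corpus + accepted), accepted

# Toy data: one transcribed seed utterance, two untranscribed ones of
# which only the first is recognized with high confidence.
seed = [("utt1.wav", "good evening")]
untranscribed = [
    {"audio": "utt2.wav", "hyp": "the news today", "conf": 0.92},
    {"audio": "utt3.wav", "hyp": "mumbled segment", "conf": 0.31},
]
model, accepted = self_train(seed, untranscribed)
```

In this toy run only `utt2.wav` passes the filter, so the retrained model sees two utterances instead of one; the paper's point is that discarding the low-confidence hypothesis (`utt3.wav`) is what makes the added data helpful rather than harmful.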