Unsupervised training of a speech recognizer using TV broadcasts
T. Kemp, A. Waibel
5th International Conference on Spoken Language Processing (ICSLP 1998)
DOI: 10.21437/ICSLP.1998-632
Published: 1998-11-30
Citations: 37
Abstract
Current speech recognition systems require large amounts of transcribed data for parameter estimation. Transcription, however, is tedious and expensive. In this work we describe experiments aimed at training a speech recognizer without transcriptions. The experiments were carried out with TV newscasts, which were recorded using a satellite receiver and simple MPEG coding hardware, and which were automatically segmented into segments of similar acoustic background condition. This material is inexpensive and can be made available in large quantities, but no transcriptions are available for it. We develop a training scheme in which a recognizer is bootstrapped on very little transcribed data and then improved using new, untranscribed speech. We show that it is necessary to use a confidence measure to judge the recognizer's initial transcriptions before using them. Higher improvements can be achieved if the number of parameters in the system is increased as more data becomes available. We show that the beneficial effect of unsupervised training is not compensated by MLLR adaptation on the hypothesis. In a final experiment, the effect of untranscribed data is compared with the effect of transcribed speech. Using the described methods, we found that the untranscribed data gives roughly one third of the improvement of the transcribed material.
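The training scheme the abstract outlines — bootstrap a recognizer on a small transcribed set, decode the untranscribed newscasts, keep only hypotheses whose confidence clears a threshold, and retrain on the enlarged corpus — can be sketched as follows. This is not the authors' code: `train`, `decode`, the confidence values, and the threshold of 0.8 are all hypothetical stand-ins chosen for illustration, and the real system would train acoustic-model parameters rather than collect utterances in a list.

```python
def train(corpus):
    """Stand-in for acoustic-model training: here it merely records
    which (audio, transcript) pairs the model was trained on."""
    return {"training_data": list(corpus)}

def decode(model, utterance):
    """Stand-in for recognition: returns a hypothesis transcript and
    a confidence score for it (both precomputed in this toy example)."""
    return utterance["hyp"], utterance["conf"]

def self_train(seed_corpus, untranscribed, threshold=0.8):
    """Confidence-filtered unsupervised training loop, as described
    in the abstract (threshold value is an assumption)."""
    model = train(seed_corpus)            # bootstrap on transcribed data
    accepted = []
    for utt in untranscribed:
        hyp, conf = decode(model, utt)
        if conf >= threshold:             # judge hypotheses by confidence
            accepted.append((utt["audio"], hyp))
    # retrain on the seed transcriptions plus the confident
    # automatic transcriptions of the untranscribed material
    return train(seed_corpus + accepted), accepted

# Toy data: one transcribed seed utterance, two untranscribed ones of
# which only the first is recognized with high confidence.
seed = [("utt1.wav", "good evening")]
untranscribed = [
    {"audio": "utt2.wav", "hyp": "the news today", "conf": 0.92},
    {"audio": "utt3.wav", "hyp": "mumbled segment", "conf": 0.31},
]
model, accepted = self_train(seed, untranscribed)
```

In this toy run only `utt2.wav` passes the filter, so the retrained model sees two utterances instead of one; the paper's point is that discarding the low-confidence hypothesis (`utt3.wav`) is what makes the added data helpful rather than harmful.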