DNN adaptation for recognition of children speech through automatic utterance selection

2016 IEEE Spoken Language Technology Workshop (SLT). Pub Date: 2016-12-01. DOI: 10.1109/SLT.2016.7846331

M. Matassoni, D. Falavigna, D. Giuliani

Citations: 6

Abstract

This paper describes an approach for adapting a DNN trained on adult speech to children's voices. The method extends a previous one, based on the Kullback-Leibler divergence between the original (adult) DNN output distribution and the target one, by accounting for the quality of the supervision of the adaptation utterances. In addition, starting from the observation that gradually removing the sentences with higher WERs from the adaptation set yields significant performance improvements, we also investigate automatic selection of adaptation utterances. To determine transcription quality, we investigate the use of confidence estimates of the recognized hypotheses. We present experiments and results on an Italian data set of children's speech. We show that the proposed DNN adaptation approach significantly reduces the WER on a given test set from 14.2% (obtained with the non-adapted DNN, trained on adult speech) to 10.6%. Notably, the latter result was achieved without using any training data specific to children's speech.
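The abstract names three ingredients: confidence-based selection of adaptation utterances, a Kullback-Leibler-regularized adaptation target, and a WER comparison. The Python sketch below illustrates each under stated assumptions and is not the authors' implementation. The `Utterance` structure, the confidence threshold, the interpolation weight `rho`, and in particular the linear mapping in `rho_from_confidence` are hypothetical choices made for illustration; the abstract does not say how supervision quality enters the objective. The interpolated target follows the commonly used form of KL-divergence regularization, in which the label distribution is mixed with the posterior of the original (adult) DNN.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Utterance:
    """Adaptation utterance with an automatic transcription (hypothetical structure)."""
    utt_id: str
    confidence: float  # confidence estimate of the recognized hypothesis, in [0, 1]


def select_utterances(utts: Sequence[Utterance], min_confidence: float) -> List[Utterance]:
    """Keep only utterances whose confidence reaches the (assumed) threshold."""
    return [u for u in utts if u.confidence >= min_confidence]


def kld_regularized_target(one_hot: Sequence[float],
                           adult_posterior: Sequence[float],
                           rho: float) -> List[float]:
    """Mix the supervision label with the adult DNN posterior.

    rho = 0 trusts the label fully; rho = 1 keeps the adult DNN output unchanged.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    if len(one_hot) != len(adult_posterior):
        raise ValueError("label and posterior must have the same length")
    return [(1.0 - rho) * t + rho * p for t, p in zip(one_hot, adult_posterior)]


def rho_from_confidence(confidence: float,
                        rho_min: float = 0.0,
                        rho_max: float = 1.0) -> float:
    """Assumed mapping: lower confidence gives a larger rho (stay nearer the adult DNN)."""
    c = min(max(confidence, 0.0), 1.0)  # clip to [0, 1]
    return rho_max - c * (rho_max - rho_min)


def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative WER reduction as a fraction of the baseline WER."""
    return (baseline_wer - adapted_wer) / baseline_wer


if __name__ == "__main__":
    utts = [Utterance("u1", 0.95), Utterance("u2", 0.40), Utterance("u3", 0.80)]
    kept = select_utterances(utts, min_confidence=0.7)
    print([u.utt_id for u in kept])

    # Frame-level target for a 3-class toy example, weighted by utterance confidence.
    rho = rho_from_confidence(kept[0].confidence)
    print(kld_regularized_target([0.0, 1.0, 0.0], [0.2, 0.5, 0.3], rho))

    # WER figures reported in the abstract: 14.2% before, 10.6% after adaptation.
    print(f"{100 * relative_wer_reduction(14.2, 10.6):.1f}% relative reduction")
```

With the figures reported in the abstract, the drop from 14.2% to 10.6% WER corresponds to a relative reduction of about 25.4%. Tying `rho` to the confidence score is only one way to let supervision quality influence adaptation; a fixed `rho` combined with hard utterance selection would be an equally valid reading of the abstract.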
