Improving DNN-Based Automatic Recognition of Non-native Children Speech with Adult Speech

The ... Workshop on Child, Computer and Interaction. Pub Date: 2016-09-06. DOI: 10.21437/WOCCI.2016-7

Yao Qian, Xinhao Wang, Keelan Evanini, David Suendermann-Oeft

Pages: 40-44

Citations: 11

Improving DNN-Based Automatic Recognition of Non-native Children Speech with Adult Speech

Acoustic models for state-of-the-art DNN-based speech recognition systems are typically trained on at least several hundred hours of task-specific data, but that much training data is not available for every application. In this paper, we investigate how to use an adult speech corpus to improve DNN-based automatic speech recognition of non-native children's speech. Although the speech of adults and children differs in many acoustic and linguistic respects, adult speech can still boost the performance of a children's speech recognizer when used with acoustic modeling techniques based on the DNN framework. The experimental results show that the best recognition performance is achieved by combining the children's training data with adult training data of approximately the same size and initializing the DNN with weights obtained by pre-training on the full training set of the adult corpus. This system outperforms the baseline system trained only on children's speech, with an overall relative word error rate (WER) reduction of 11.9%. Among the three speaking tasks studied, the picture narration task shows the largest gain, with WER falling from 24.6% to 20.1%.
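The abstract reports absolute WER figures for the picture narration task and a relative reduction for the system overall. The sketch below shows how the two kinds of figure relate. It is a minimal illustration and not the authors' scoring tool: the function names and the example sentences are assumptions made up for this example, and WER is taken in its usual sense of word-level edit distance divided by the number of reference words. The overall 11.9% figure spans all three tasks and cannot be recomputed from the abstract alone.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction of the baseline error removed by the improved system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical transcripts, invented for illustration only.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"example WER: {wer:.3f}")

    # Picture narration figures quoted in the abstract: 24.6% -> 20.1%.
    rel = relative_reduction(24.6, 20.1)
    print(f"relative WER reduction: {rel * 100:.1f}%")
```

For the picture narration task, the absolute drop of 4.5 points from a 24.6% baseline works out to a relative reduction of about 18.3%, which is consistent with that task showing a larger gain than the 11.9% overall figure.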
