{"title":"A Study on Improving Acoustic Model for Robust and Far-Field Speech Recognition","authors":"Shaofei Xue, Zhijie Yan, Tao Yu, Zhang Liu","doi":"10.1109/ICDSP.2018.8631862","DOIUrl":null,"url":null,"abstract":"Far-field speech recognition is an essential technique for man-machine interactions. It aims to enable smart devices to recognize distant human speech. This technology is applied to many scenarios such as smart home appliances (smart loudspeaker, smart TV) and meeting transcription. Despite the significant advancement made in robust and far-field speech recognition after the introduction of deep neural network based acoustic models, the far-field speech recognition remains a challenging task due to various factors such as background noise, reverberation and even human voice interference. In this paper, we describe several technical advances for improving the performance of large-scale far-field speech recognition, including simulated data generation, improvements on front-end modules and neural network based acoustic models. Experimental results on several Mandarin Chinese speech recognition tasks have demonstrated that the combination of these technical advances can significantly outperform the conventional models.","PeriodicalId":218806,"journal":{"name":"2018 IEEE 23rd International Conference on Digital Signal Processing (DSP)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE 23rd International Conference on Digital Signal Processing (DSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDSP.2018.8631862","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Cited by: 2
Abstract
Far-field speech recognition is an essential technique for human-machine interaction. It aims to enable smart devices to recognize distant human speech. This technology is applied in many scenarios, such as smart home appliances (smart speakers, smart TVs) and meeting transcription. Despite the significant advances made in robust and far-field speech recognition since the introduction of deep neural network based acoustic models, far-field speech recognition remains a challenging task due to factors such as background noise, reverberation, and even interfering human voices. In this paper, we describe several technical advances for improving the performance of large-scale far-field speech recognition, including simulated data generation, improvements to front-end modules, and neural network based acoustic models. Experimental results on several Mandarin Chinese speech recognition tasks demonstrate that the combination of these technical advances significantly outperforms conventional models.
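The abstract mentions simulated data generation as one of the technical advances. As a rough illustration only, the sketch below shows the common recipe for simulating far-field training data: convolving clean speech with a room impulse response (RIR) and mixing in noise at a target SNR. The paper does not disclose its exact pipeline, so the function name, parameters, and toy signals here are assumptions for illustration, not the authors' method.

```python
# Minimal sketch of far-field data simulation (assumed recipe: RIR convolution + additive noise).
import numpy as np
from scipy.signal import fftconvolve

def simulate_far_field(clean, rir, noise, snr_db=10.0):
    """Convolve clean speech with an RIR and add noise scaled to snr_db (illustrative helper)."""
    # Reverberate the clean utterance with the room impulse response.
    reverbed = fftconvolve(clean, rir, mode="full")[: len(clean)]

    # Tile or trim the noise to match the utterance length.
    if len(noise) < len(reverbed):
        noise = np.tile(noise, int(np.ceil(len(reverbed) / len(noise))))
    noise = noise[: len(reverbed)]

    # Scale the noise so the mixture reaches the requested signal-to-noise ratio.
    speech_power = np.mean(reverbed ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10.0)))
    return reverbed + scale * noise

if __name__ == "__main__":
    # Toy 1-second signals at 16 kHz; real pipelines would use measured or simulated RIRs and recorded noise.
    sr = 16000
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(sr).astype(np.float32)
    rir = np.exp(-np.linspace(0.0, 8.0, 4000)).astype(np.float32)  # toy exponentially decaying RIR
    noise = rng.standard_normal(sr).astype(np.float32)
    far = simulate_far_field(clean, rir, noise, snr_db=10.0)
    print(far.shape)
```

In practice, such simulation is typically applied to large amounts of close-talk training data with many different RIRs and noise types so that the acoustic model sees a wide range of reverberation and noise conditions.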