Developing in-vehicular noise robust children ASR system using Tandem-NN-based acoustic modelling
Virender Kadyan, Shashi Bala, Puneet Bawa, Mohit Mittal
International Journal of Vehicle Autonomous Systems, published 2020-01-01
DOI: https://doi.org/10.1504/ijvas.2020.10039663
Citations: 1
Abstract
Processing children's speech is challenging because of data scarcity and inefficient modelling of input feature vectors, and the accuracy of the modelling phase depends on the extracted input features. In this paper, posterior probabilities over a phone set are estimated by a discriminatively trained neural-network pre-processor. This Neural Network (NN) classifier is first trained on the original speech, and context-independent phone posterior probabilities are then estimated with the Tandem-NN system. The output vectors are used as the default features for Deep Neural Network-Hidden Markov Model (DNN-HMM) acoustic models. The performance of the system trained on the original data is further improved through data augmentation. To assess the robustness of the augmented speech, various in-vehicle data are investigated, and the resulting system is found to be superior to the other systems. Finally, all augmented data are combined to overcome the data scarcity challenge and enhance system performance, giving a relative improvement of 23.77% over the baseline system.
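
The tandem idea described in the abstract, a neural network that produces per-frame phone posteriors which are then reused as features for a second acoustic model, can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the one-hidden-layer network with random (untrained) weights, the 39-dimensional input frames, the 40-phone set, and the log compression, PCA decorrelation and concatenation steps, which follow the commonly used tandem recipe rather than the authors' exact configuration.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def phone_posteriors(feats, W1, b1, W2, b2):
    """One-hidden-layer NN: acoustic frames -> per-frame phone posteriors."""
    hidden = np.tanh(feats @ W1 + b1)
    return softmax(hidden @ W2 + b2)

def tandem_features(feats, posteriors, n_components, eps=1e-10):
    """Log-compress posteriors, decorrelate them with PCA, and append
    the result to the original frames (assumed tandem recipe)."""
    log_post = np.log(posteriors + eps)
    centered = log_post - log_post.mean(axis=0, keepdims=True)
    # PCA via SVD: the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    return np.hstack([feats, projected])

# Hypothetical sizes and random weights stand in for real speech data
# and a trained classifier.
rng = np.random.default_rng(0)
n_frames, feat_dim, hidden_dim, n_phones = 200, 39, 64, 40
feats = rng.standard_normal((n_frames, feat_dim))
W1 = rng.standard_normal((feat_dim, hidden_dim)) * 0.1
b1 = np.zeros(hidden_dim)
W2 = rng.standard_normal((hidden_dim, n_phones)) * 0.1
b2 = np.zeros(n_phones)

post = phone_posteriors(feats, W1, b1, W2, b2)
tandem = tandem_features(feats, post, n_components=25)
print(tandem.shape)  # (200, 64): 39 original dims + 25 posterior-derived dims
```

In a real system the resulting `tandem` matrix would be passed to the DNN-HMM training stage in place of the raw frames. The number of retained PCA components (25 here) is a tunable choice, not a value reported in the abstract.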