The optimization of PLP feature extraction for LVCSR recognition of MP3 data
M. Borský, P. Pollák
2014 International Conference on Applied Electronics, 2014-09-01
DOI: 10.1109/AE.2014.7011667 (https://doi.org/10.1109/AE.2014.7011667)
Citations: 0
Abstract
This paper analyses how an optimized PLP feature extraction setup and feature normalization improve the performance of an automatic speech recognition system on data compressed by the MP3 algorithm. An experimental study on loop-digit recognition and large vocabulary continuous speech recognition (LVCSR) tasks showed that a proper setup can offset the effect of lower bit rates, which can then achieve results comparable with higher bit rates. The second finding is that normalization techniques contribute significantly to overall performance, especially for shorter windows/shifts and lower bit rates. Acoustic models trained on 160 kbit/s, 32 kbit/s and 16 kbit/s data achieved 34.17%, 41.88% and 36.4% WER, respectively, on the LVCSR task. In comparison, the acoustic models trained on non-compressed data achieved 28.56% WER.
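The results above are reported as word error rate (WER). For readers unfamiliar with the metric, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The abstract does not say which scoring tool the authors used, so the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization by whitespace is an assumption,
    not the scoring procedure used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far (minus the current one) and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("one two three four", "one two four")` returns 0.25, since one of the four reference words was deleted. Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.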