{"title":"鲁棒语音识别的子带调制频谱补偿","authors":"Wen-hsiang Tu, Sheng-Yuan Huang, J. Hung","doi":"10.1109/ASRU.2009.5373506","DOIUrl":null,"url":null,"abstract":"This paper proposes a novel scheme in performing feature statistics normalization techniques for robust speech recognition. In the proposed approach, the processed temporal-domain feature sequence is first converted into the modulation spectral domain. The magnitude part of the modulation spectrum is decomposed into non-uniform sub-band segments, and then each sub-band segment is individually processed by the well-known normalization methods, like mean normalization (MN), mean and variance normalization (MVN) and histogram equalization (HEQ). Finally, we reconstruct the feature stream with all the modified sub-band magnitude spectral segments and the original phase spectrum using the inverse DFT. With this process, the components that correspond to more important modulation spectral bands in the feature sequence can be processed separately. For the Aurora-2 clean-condition training task, the new proposed sub-band spectral MVN and HEQ provide relative error rate reductions of 18.66% and 23.58% over the conventional temporal MVN and HEQ, respectively.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Sub-band modulation spectrum compensation for robust speech recognition\",\"authors\":\"Wen-hsiang Tu, Sheng-Yuan Huang, J. Hung\",\"doi\":\"10.1109/ASRU.2009.5373506\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper proposes a novel scheme in performing feature statistics normalization techniques for robust speech recognition. In the proposed approach, the processed temporal-domain feature sequence is first converted into the modulation spectral domain. The magnitude part of the modulation spectrum is decomposed into non-uniform sub-band segments, and then each sub-band segment is individually processed by the well-known normalization methods, like mean normalization (MN), mean and variance normalization (MVN) and histogram equalization (HEQ). Finally, we reconstruct the feature stream with all the modified sub-band magnitude spectral segments and the original phase spectrum using the inverse DFT. With this process, the components that correspond to more important modulation spectral bands in the feature sequence can be processed separately. For the Aurora-2 clean-condition training task, the new proposed sub-band spectral MVN and HEQ provide relative error rate reductions of 18.66% and 23.58% over the conventional temporal MVN and HEQ, respectively.\",\"PeriodicalId\":292194,\"journal\":{\"name\":\"2009 IEEE Workshop on Automatic Speech Recognition & Understanding\",\"volume\":\"4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2009 IEEE Workshop on Automatic Speech Recognition & Understanding\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU.2009.5373506\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2009.5373506","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Sub-band modulation spectrum compensation for robust speech recognition
This paper proposes a novel scheme in performing feature statistics normalization techniques for robust speech recognition. In the proposed approach, the processed temporal-domain feature sequence is first converted into the modulation spectral domain. The magnitude part of the modulation spectrum is decomposed into non-uniform sub-band segments, and then each sub-band segment is individually processed by the well-known normalization methods, like mean normalization (MN), mean and variance normalization (MVN) and histogram equalization (HEQ). Finally, we reconstruct the feature stream with all the modified sub-band magnitude spectral segments and the original phase spectrum using the inverse DFT. With this process, the components that correspond to more important modulation spectral bands in the feature sequence can be processed separately. For the Aurora-2 clean-condition training task, the new proposed sub-band spectral MVN and HEQ provide relative error rate reductions of 18.66% and 23.58% over the conventional temporal MVN and HEQ, respectively.