Noise Model Transfer: Novel Approach to Robustness Against Nonstationary Noise
Authors: Takuya Yoshioka, T. Nakatani
Journal: IEEE Transactions on Audio Speech and Language Processing
Publication date: 2013-10-01
Publication type: Journal Article
DOI: 10.1109/TASL.2013.2272513 (https://doi.org/10.1109/TASL.2013.2272513)
Citation count: 11
This paper proposes an approach, called noise model transfer (NMT), for estimating the rapidly changing parameter values of a feature-domain noise model, which can be used to enhance feature vectors corrupted by highly nonstationary noise. Unlike conventional methods, the proposed approach can exploit both the observed feature vectors, which represent spectral envelopes, and other signal properties that are usually discarded during feature extraction but are useful for separating nonstationary noise from speech. Specifically, we assume the availability of a noise power spectrum estimator that can capture rapid changes in noise characteristics by leveraging such signal properties. NMT determines, in the maximum likelihood sense, the optimal transformation from the estimated noise power spectra into the feature-domain noise model parameter values. NMT is successfully applied to meeting speech recognition, where the main noise sources are competing talkers, and to reverberant speech recognition, where the late reverberation is regarded as highly nonstationary additive noise.
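The abstract does not give the form of the transformation or the likelihood that NMT maximizes, so the following is only a toy sketch of the general idea, not the authors' algorithm. It assumes (hypothetically) that the feature domain is log filterbank energies, that the transformation is a per-channel additive bias, and that noise-only feature frames are available with Gaussian residuals, in which case the maximum likelihood bias is simply the mean residual. The names `to_log_mel`, `fit_bias_ml`, and `transfer_noise_means` are illustrative.

```python
import math


def to_log_mel(power_spectrum, filterbank):
    """Project one frame's power spectrum onto the filterbank channels, then take the log."""
    return [
        math.log(sum(w * p for w, p in zip(row, power_spectrum)) + 1e-10)
        for row in filterbank
    ]


def fit_bias_ml(estimated_noise_psd, observed_log_mel, filterbank):
    """Fit a per-channel bias by maximum likelihood.

    Assumed model (an illustration, not taken from the paper):
        observed = to_log_mel(estimate) + bias + Gaussian error
    Under this model the ML bias is the average residual per channel.
    """
    n_frames = len(observed_log_mel)
    bias = [0.0] * len(filterbank)
    for psd, obs in zip(estimated_noise_psd, observed_log_mel):
        predicted = to_log_mel(psd, filterbank)
        for c, (o, p) in enumerate(zip(obs, predicted)):
            bias[c] += (o - p) / n_frames
    return bias


def transfer_noise_means(estimated_noise_psd, bias, filterbank):
    """Map each frame's noise power spectrum estimate to a feature-domain noise mean."""
    return [
        [m + b for m, b in zip(to_log_mel(psd, filterbank), bias)]
        for psd in estimated_noise_psd
    ]
```

Because the bias is refit while the noise power spectrum estimate changes every frame, the resulting feature-domain noise means follow the frame-by-frame changes of the estimator, which is the property the abstract emphasizes. The method described in the paper works from noise-corrupted speech features rather than noise-only frames, so a faithful implementation would need a speech model in the likelihood.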
About the journal:
The IEEE Transactions on Audio, Speech and Language Processing covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language. In particular, audio processing also covers auditory modeling, acoustic modeling and source separation. Speech processing also covers speech production and perception, adaptation, lexical modeling and speaker recognition. Language processing also covers spoken language understanding, translation, summarization, mining, general language modeling, as well as spoken dialog systems.