{"title":"使用STFT域语音失真模型直接在ASR特征域中计算MMSE估计和残差不确定性","authors":"Ramón Fernández Astudillo, R. Orglmeister","doi":"10.1109/TASL.2013.2244085","DOIUrl":null,"url":null,"abstract":"In this paper we demonstrate how uncertainty propagation allows the computation of minimum mean square error (MMSE) estimates in the feature domain for various feature extraction methods using short-time Fourier transform (STFT) domain distortion models. In addition to this, a measure of estimate reliability is also attained which allows either feature re-estimation or the dynamic compensation of automatic speech recognition (ASR) models. The proposed method transforms the posterior distribution associated to a Wiener filter through the feature extraction using the STFT Uncertainty Propagation formulas. It is also shown that non-linear estimators in the STFT domain like the Ephraim-Malah filters can be seen as special cases of a propagation of the Wiener posterior. The method is illustrated by developing two MMSE-Mel-frequency Cepstral Coefficient (MFCC) estimators and combining them with observation uncertainty techniques. We discuss similarities with other MMSE-MFCC estimators and show how the proposed approach outperforms conventional MMSE estimators in the STFT domain on the AURORA4 robust ASR task.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2013-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2244085","citationCount":"29","resultStr":"{\"title\":\"Computing MMSE Estimates and Residual Uncertainty Directly in the Feature Domain of ASR using STFT Domain Speech Distortion Models\",\"authors\":\"Ramón Fernández Astudillo, R. Orglmeister\",\"doi\":\"10.1109/TASL.2013.2244085\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper we demonstrate how uncertainty propagation allows the computation of minimum mean square error (MMSE) estimates in the feature domain for various feature extraction methods using short-time Fourier transform (STFT) domain distortion models. In addition to this, a measure of estimate reliability is also attained which allows either feature re-estimation or the dynamic compensation of automatic speech recognition (ASR) models. The proposed method transforms the posterior distribution associated to a Wiener filter through the feature extraction using the STFT Uncertainty Propagation formulas. It is also shown that non-linear estimators in the STFT domain like the Ephraim-Malah filters can be seen as special cases of a propagation of the Wiener posterior. The method is illustrated by developing two MMSE-Mel-frequency Cepstral Coefficient (MFCC) estimators and combining them with observation uncertainty techniques. We discuss similarities with other MMSE-MFCC estimators and show how the proposed approach outperforms conventional MMSE estimators in the STFT domain on the AURORA4 robust ASR task.\",\"PeriodicalId\":55014,\"journal\":{\"name\":\"IEEE Transactions on Audio Speech and Language Processing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1109/TASL.2013.2244085\",\"citationCount\":\"29\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Audio Speech and Language Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TASL.2013.2244085\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Audio Speech and Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TASL.2013.2244085","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Computing MMSE Estimates and Residual Uncertainty Directly in the Feature Domain of ASR using STFT Domain Speech Distortion Models
In this paper we demonstrate how uncertainty propagation allows the computation of minimum mean square error (MMSE) estimates in the feature domain for various feature extraction methods using short-time Fourier transform (STFT) domain distortion models. In addition to this, a measure of estimate reliability is also attained which allows either feature re-estimation or the dynamic compensation of automatic speech recognition (ASR) models. The proposed method transforms the posterior distribution associated to a Wiener filter through the feature extraction using the STFT Uncertainty Propagation formulas. It is also shown that non-linear estimators in the STFT domain like the Ephraim-Malah filters can be seen as special cases of a propagation of the Wiener posterior. The method is illustrated by developing two MMSE-Mel-frequency Cepstral Coefficient (MFCC) estimators and combining them with observation uncertainty techniques. We discuss similarities with other MMSE-MFCC estimators and show how the proposed approach outperforms conventional MMSE estimators in the STFT domain on the AURORA4 robust ASR task.
期刊介绍:
The IEEE Transactions on Audio, Speech and Language Processing covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language. In particular, audio processing also covers auditory modeling, acoustic modeling and source separation. Speech processing also covers speech production and perception, adaptation, lexical modeling and speaker recognition. Language processing also covers spoken language understanding, translation, summarization, mining, general language modeling, as well as spoken dialog systems.