Computing MMSE Estimates and Residual Uncertainty Directly in the Feature Domain of ASR using STFT Domain Speech Distortion Models

IEEE Transactions on Audio Speech and Language Processing Pub Date : 2013-05-01 DOI:10.1109/TASL.2013.2244085

Ramón Fernández Astudillo, R. Orglmeister

{"title":"Computing MMSE Estimates and Residual Uncertainty Directly in the Feature Domain of ASR using STFT Domain Speech Distortion Models","authors":"Ramón Fernández Astudillo, R. Orglmeister","doi":"10.1109/TASL.2013.2244085","DOIUrl":null,"url":null,"abstract":"In this paper we demonstrate how uncertainty propagation allows the computation of minimum mean square error (MMSE) estimates in the feature domain for various feature extraction methods using short-time Fourier transform (STFT) domain distortion models. In addition to this, a measure of estimate reliability is also attained which allows either feature re-estimation or the dynamic compensation of automatic speech recognition (ASR) models. The proposed method transforms the posterior distribution associated to a Wiener filter through the feature extraction using the STFT Uncertainty Propagation formulas. It is also shown that non-linear estimators in the STFT domain like the Ephraim-Malah filters can be seen as special cases of a propagation of the Wiener posterior. The method is illustrated by developing two MMSE-Mel-frequency Cepstral Coefficient (MFCC) estimators and combining them with observation uncertainty techniques. We discuss similarities with other MMSE-MFCC estimators and show how the proposed approach outperforms conventional MMSE estimators in the STFT domain on the AURORA4 robust ASR task.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":"21 1","pages":"1023-1034"},"PeriodicalIF":0.0000,"publicationDate":"2013-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2244085","citationCount":"29","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Audio Speech and Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TASL.2013.2244085","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 29

Abstract

In this paper we demonstrate how uncertainty propagation allows the computation of minimum mean square error (MMSE) estimates in the feature domain for various feature extraction methods using short-time Fourier transform (STFT) domain distortion models. In addition to this, a measure of estimate reliability is also attained which allows either feature re-estimation or the dynamic compensation of automatic speech recognition (ASR) models. The proposed method transforms the posterior distribution associated to a Wiener filter through the feature extraction using the STFT Uncertainty Propagation formulas. It is also shown that non-linear estimators in the STFT domain like the Ephraim-Malah filters can be seen as special cases of a propagation of the Wiener posterior. The method is illustrated by developing two MMSE-Mel-frequency Cepstral Coefficient (MFCC) estimators and combining them with observation uncertainty techniques. We discuss similarities with other MMSE-MFCC estimators and show how the proposed approach outperforms conventional MMSE estimators in the STFT domain on the AURORA4 robust ASR task.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

使用STFT域语音失真模型直接在ASR特征域中计算MMSE估计和残差不确定性

在本文中，我们展示了不确定性传播如何允许使用短时傅里叶变换(STFT)域失真模型的各种特征提取方法在特征域中计算最小均方误差(MMSE)估计。除此之外，还获得了一个估计可靠性的度量，该度量允许自动语音识别(ASR)模型的特征重新估计或动态补偿。该方法通过使用STFT不确定性传播公式进行特征提取，将后验分布转换为维纳滤波器。本文还证明了STFT域中的非线性估计器，如Ephraim-Malah滤波器，可以看作是Wiener后值传播的特殊情况。通过开发两个mmse - mel频率倒谱系数(MFCC)估计器并将其与观测不确定度技术相结合来说明该方法。我们讨论了与其他MMSE- mfcc估计器的相似之处，并展示了该方法如何在AURORA4鲁棒ASR任务上优于STFT域的传统MMSE估计器。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

IEEE Transactions on Audio Speech and Language Processing 工程技术-工程：电子与电气

自引率

0.00%

发文量

审稿时长

24.0 months

期刊介绍： The IEEE Transactions on Audio, Speech and Language Processing covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language. In particular, audio processing also covers auditory modeling, acoustic modeling and source separation. Speech processing also covers speech production and perception, adaptation, lexical modeling and speaker recognition. Language processing also covers spoken language understanding, translation, summarization, mining, general language modeling, as well as spoken dialog systems.