Normalization on Temporal Modulation Transfer Function for Robust Speech Recognition

2008 Second International Symposium on Universal Communication. Publication date: 2008-12-15. DOI: 10.1109/ISUC.2008.74

Xugang Lu, Shigeki Matsuda, Tohru Shimizu, Satoshi Nakamura

Citations: 1

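Abstract

In this paper, we propose a robust speech feature extraction algorithm for automatic speech recognition that reduces the effect of noise in the temporal modulation domain. The algorithm processes the time series of cepstral coefficients in two steps. The first step applies modulation contrast normalization, which brings the temporal modulation contrast of both clean and noisy speech into the same range. The second step applies edge-preserving smoothing, which attenuates the low modulation components while preserving the high modulation components (edges). We evaluated the algorithm in speech recognition experiments under both additive noise (the AURORA-2J corpus) and reverberation (clean AURORA-2J utterances convolved with a smart-room impulse response). For comparison, the ETSI advanced front-end (AFE) algorithm was also evaluated. The proposed algorithm achieved: (1) for additive noise, a relative word error reduction (RWER) of 57.26% with clean-condition training (59.37% for the AFE) and 33.52% with multi-condition training (35.77% for the AFE); and (2) for reverberation, an RWER of 51.28% (10.17% for the AFE).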
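The two-step front end described above is only summarized in the abstract, so what follows is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes (hypothetically) that "modulation contrast" means the peak-to-peak range of each mean-removed cepstral trajectory, and it substitutes a standard 1-D bilateral filter for the unspecified edge-preserving smoother. The function names and parameter values (`radius`, `sigma_t`, `sigma_r`) are illustrative only. The last function shows how an RWER figure is conventionally computed from a baseline word error rate and a new one.

```python
import math
from typing import List


def normalize_modulation_contrast(traj: List[float]) -> List[float]:
    """Hypothetical contrast normalization for ONE cepstral trajectory.

    Assumption (not from the paper): "modulation contrast" is the
    peak-to-peak range of the mean-removed trajectory. Every trajectory is
    rescaled so that this range equals 1, which puts clean and noisy speech
    in the same range.
    """
    mean = sum(traj) / len(traj)
    centered = [x - mean for x in traj]
    span = max(centered) - min(centered)
    if span == 0.0:  # flat trajectory: nothing to rescale
        return centered
    return [x / span for x in centered]


def bilateral_smooth(traj: List[float], radius: int = 3,
                     sigma_t: float = 2.0, sigma_r: float = 0.1) -> List[float]:
    """1-D bilateral filter along time, a stand-in for the paper's
    edge-preserving smoothing (the authors' filter may differ).

    A neighbour gets a large weight only if it is close in time AND close
    in value, so small fluctuations are averaged away while large jumps
    (edges) are kept.
    """
    n = len(traj)
    out = []
    for t in range(n):
        num = 0.0
        den = 0.0
        for k in range(max(0, t - radius), min(n, t + radius + 1)):
            w_time = math.exp(-((k - t) ** 2) / (2 * sigma_t ** 2))
            w_range = math.exp(-((traj[k] - traj[t]) ** 2) / (2 * sigma_r ** 2))
            w = w_time * w_range
            num += w * traj[k]
            den += w  # den >= 1 because k == t contributes weight 1
        out.append(num / den)
    return out


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative word error reduction (in percent) with respect to a baseline."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Toy trajectory (made-up values): a step edge plus a small ripple.
    traj = [0.0, 0.02, -0.02, 0.01, 1.0, 0.98, 1.02, 0.99]
    smoothed = bilateral_smooth(normalize_modulation_contrast(traj))
    print([round(v, 3) for v in smoothed])
```

Each function operates on a single coefficient's trajectory over frames; in a full front end it would be applied to every cepstral dimension of every utterance. In this sketch `sigma_r` decides what counts as an edge: value differences much larger than it are preserved, smaller ones are smoothed out.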
