A novel channel estimate for noise robust speech recognition

Computer Speech and Language. Impact factor: 3.4. Zone 3 (Computer Science). JCR Q2 (Computer Science, Artificial Intelligence). Pub Date: 2024-06-01. Epub Date: 2023-12-16. DOI: 10.1016/j.csl.2023.101598

Geoffroy Vanderreydt, Kris Demuynck

Computer Speech and Language, volume 86, Article 101598. Not open access. Full text: https://www.sciencedirect.com/science/article/pii/S0885230823001171

Citations: 0



A novel channel estimate for noise robust speech recognition

We propose a novel technique to estimate the channel characteristics for robust speech recognition. The method focuses on reliable time–frequency speech patches which are highly independent of the noise condition. Combined with a root-based approximation of the logarithm in the MFCC computation, this reduces the variance caused by the noise on the spectral features, and therefore also the constraint on the acoustic model in a multi-style training setup. We show that compared to the standard mean normalization, the proposed method estimates the channel equally well under clean conditions and better under noisy conditions. When integrated in the feature extraction pipeline, we show improvements in speech recognition accuracy on noisy speech and a status quo on clean speech. Our experiments reveal that this method helps the most for generative models that need to model the complex noise variability, and less so for discriminative models, which can learn to ignore noise instead of accurately modeling it. Our approach outperforms the state of the art on the noisy Aurora4 task.
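The two baseline ingredients named in the abstract, standard mean normalization and a root-based stand-in for the logarithm, can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify how reliable time–frequency patches are selected or which root function is used. The function names, the exponent `gamma`, and the synthetic filterbank energies below are all assumptions made for illustration.

```python
import numpy as np

def log_compress(energies, floor=1e-10):
    """Standard log compression of filterbank energies (floored to avoid log(0))."""
    return np.log(np.maximum(energies, floor))

def root_compress(energies, gamma=0.1):
    """One common root-based surrogate for the log (an assumption, not the paper's form).

    (x**gamma - 1) / gamma tends to log(x) as gamma -> 0, but stays bounded
    below by -1/gamma, so very small energies are compressed less steeply.
    """
    return (np.power(energies, gamma) - 1.0) / gamma

def mean_channel_estimate(log_feats):
    """Standard mean normalization: the per-band time average is the channel estimate."""
    return log_feats.mean(axis=0)

# Hypothetical data: 200 frames, 8 filterbank bands.
rng = np.random.default_rng(0)
clean = rng.uniform(0.5, 2.0, size=(200, 8))   # synthetic clean energies
channel = rng.uniform(0.2, 5.0, size=8)        # synthetic per-band channel gain
observed = clean * channel                     # channel acts multiplicatively per band

# In the log domain the channel is an additive offset, so subtracting the
# time average removes it exactly when there is no additive noise.
log_clean = log_compress(clean)
log_obs = log_compress(observed)
norm_clean = log_clean - mean_channel_estimate(log_clean)
norm_obs = log_obs - mean_channel_estimate(log_obs)
channel_removed = np.allclose(norm_clean, norm_obs)

# Additive noise on a low-energy bin shifts the log feature far more than
# the root feature, which is the variance effect the abstract refers to.
quiet, noise = np.array([1e-6]), 1e-3
log_shift = (log_compress(quiet + noise) - log_compress(quiet)).item()
root_shift = (root_compress(quiet + noise) - root_compress(quiet)).item()

# With a very small exponent the root function approaches the logarithm.
near_log = root_compress(np.array([2.0]), gamma=1e-6).item()
```

In this noise-free setup the time average recovers the channel offset exactly. Once additive noise is present, the average is taken over noise-dominated bins as well, which is the weakness of mean normalization that the abstract says the proposed estimate addresses.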


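Source journal

Computer Speech and Language (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 11.30

Self-citation rate: 4.70%

Articles published:

Review time: 22.9 weeks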

About the journal: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing have become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.