Compromised feature normalization method for deep neural network based speech recognition*

M. Kim, H. S. Kim
{"title":"Compromised feature normalization method for deep neural network\n based speech recognition*","authors":"M. Kim, H. S. Kim","doi":"10.13064/ksss.2020.12.3.065","DOIUrl":null,"url":null,"abstract":"Feature normalization is a method to reduce the effect of environmental mismatch between the training and test conditions through the normalization of statistical characteristics of acoustic feature parameters. It demonstrates excellent performance improvement in the traditional Gaussian mixture model-hidden Markov model (GMM-HMM)-based speech recognition system. However, in a deep neural network (DNN)-based speech recognition system, minimizing the effects of environmental mismatch does not necessarily lead to the best performance improvement. In this paper, we attribute the cause of this phenomenon to information loss due to excessive feature normalization. We investigate whether there is a feature normalization method that maximizes the speech recognition performance by properly reducing the impact of environmental mismatch, while preserving useful information for training acoustic models. To this end, we introduce the mean and exponentiated variance normalization (MEVN), which is a compromise between the mean normalization (MN) and the mean and variance normalization (MVN), and compare the performance of DNN-based speech recognition system in noisy and reverberant environments according to the degree of variance normalization. Experimental results reveal that a slight performance improvement is obtained with the MEVN over the MN and the MVN, depending on the degree of variance normalization.","PeriodicalId":255285,"journal":{"name":"Phonetics and Speech Sciences","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Phonetics and Speech Sciences","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.13064/ksss.2020.12.3.065","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Feature normalization is a method to reduce the effect of environmental mismatch between the training and test conditions by normalizing the statistical characteristics of acoustic feature parameters. It yields substantial performance improvements in traditional Gaussian mixture model-hidden Markov model (GMM-HMM)-based speech recognition systems. However, in a deep neural network (DNN)-based speech recognition system, minimizing the effect of environmental mismatch does not necessarily lead to the best performance improvement. In this paper, we attribute this phenomenon to information loss caused by excessive feature normalization. We investigate whether there is a feature normalization method that maximizes speech recognition performance by appropriately reducing the impact of environmental mismatch while preserving information useful for training acoustic models. To this end, we introduce mean and exponentiated variance normalization (MEVN), a compromise between mean normalization (MN) and mean and variance normalization (MVN), and compare the performance of a DNN-based speech recognition system in noisy and reverberant environments according to the degree of variance normalization. Experimental results reveal that a slight performance improvement is obtained with MEVN over MN and MVN, depending on the degree of variance normalization.
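The compromise between MN and MVN can be illustrated with a minimal per-utterance sketch: the mean is always subtracted, while the standard deviation is raised to an exponent between 0 and 1 before division. The function name, the parameter name gamma, and the exact placement of the exponent below are illustrative assumptions, not the paper's definition of MEVN.

```python
import numpy as np

def mevn(features: np.ndarray, gamma: float = 0.5, eps: float = 1e-8) -> np.ndarray:
    """Mean and exponentiated variance normalization (illustrative sketch).

    features : (num_frames, num_dims) matrix of acoustic features for one utterance.
    gamma    : degree of variance normalization in [0, 1];
               gamma = 0 reduces to mean normalization (MN),
               gamma = 1 reduces to mean and variance normalization (MVN).
    """
    mu = features.mean(axis=0)        # per-dimension mean over the utterance
    sigma = features.std(axis=0)      # per-dimension standard deviation
    return (features - mu) / (sigma + eps) ** gamma
```

Sweeping gamma between 0 and 1 then corresponds to varying the degree of variance normalization, which is how the abstract describes the comparison of recognition performance in noisy and reverberant environments.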