Improvements to filterbank and delta learning within a deep neural network framework

Tara N. Sainath, Brian Kingsbury, Abdel-rahman Mohamed, G. Saon, B. Ramabhadran
Published in: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6839-6843. Publication date: 2014-05-04. DOI: 10.1109/ICASSP.2014.6854925 (https://doi.org/10.1109/ICASSP.2014.6854925)
Citation count: 12

Abstract

Many features used in speech recognition tasks are hand-crafted and are not always tied to the objective at hand, namely minimizing word error rate (WER). Recently, we showed that replacing a perceptually motivated mel filter bank with a filter bank layer that is learned jointly with the rest of a deep neural network is promising. In this paper, we extend filter learning to a speaker-adapted, state-of-the-art system. First, we incorporate delta learning into the filter learning framework. Second, we incorporate various speaker adaptation techniques, including vocal tract length normalization (VTLN) warping and speaker identity features. On a 50-hour English Broadcast News task, filter and delta learning achieves a 5% relative improvement in WER over a fixed set of filters and deltas. Furthermore, after speaker adaptation, filter and delta learning gives a 3% relative improvement in WER over a state-of-the-art CNN.
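The abstract gives no equations, so the following is only a minimal sketch of what a learnable filter bank layer and a learnable delta layer could look like in a forward pass. Everything in it is an assumption made for illustration: the function names, the use of exp() to keep filter weights positive, the log compression, and the edge handling are not taken from the paper. The fixed regression-style delta coefficients stand in for the "fixed set of deltas" baseline; in a delta learning setup these coefficients would be trainable parameters instead.

```python
import math


def filterbank_layer(power_spectrum, log_weights, eps=1e-6):
    """Map one frame of power-spectrum bins to log filter bank energies.

    log_weights[j][k] is a hypothetical trainable parameter for filter j
    and frequency bin k. Exponentiating it keeps the effective filter
    weight positive (an assumption of this sketch).
    """
    features = []
    for row in log_weights:
        energy = sum(math.exp(w) * p for w, p in zip(row, power_spectrum))
        features.append(math.log(energy + eps))
    return features


def regression_delta_coeffs(n=2):
    """Fixed delta coefficients c_k = k / (2 * sum_{i=1..n} i^2), k = -n..n."""
    denom = 2.0 * sum(i * i for i in range(1, n + 1))
    return [k / denom for k in range(-n, n + 1)]


def delta_layer(frames, coeffs):
    """Weighted sum over neighbouring frames for every feature dimension.

    coeffs has odd length 2n+1 and is centred on the current frame.
    With fixed coeffs this is an ordinary delta computation; treating
    coeffs as trainable is the illustrative stand-in for delta learning.
    Frames beyond the edges are replaced by the first or last frame.
    """
    half = len(coeffs) // 2
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        acc = [0.0] * len(frames[t])
        for offset, c in zip(range(-half, half + 1), coeffs):
            src = frames[min(max(t + offset, 0), last)]
            for d, value in enumerate(src):
                acc[d] += c * value
        out.append(acc)
    return out


# Usage: two toy filters over four spectrum bins, then deltas over time.
log_w = [[0.0, 0.0, -5.0, -5.0], [-5.0, -5.0, 0.0, 0.0]]
spectra = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0], [3.0, 4.0, 5.0, 6.0]]
static = [filterbank_layer(frame, log_w) for frame in spectra]
deltas = delta_layer(static, regression_delta_coeffs(2))
```

In a jointly trained system, gradients from the network's objective would flow back into both `log_w` and the delta coefficients; that training loop is omitted here.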