Discriminative training based on an integrated view of MPE and MMI in margin and error space

E. McDermott, Shinji Watanabe, Atsushi Nakamura
DOI: 10.1109/ICASSP.2010.5495106 (https://doi.org/10.1109/ICASSP.2010.5495106)
Published in: 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 2010-03-14
Citations: 44

Abstract

Recent work has demonstrated that the Maximum Mutual Information (MMI) objective function is mathematically equivalent to a simple integral of recognition error, if the latter is expressed as a margin-based, Minimum Phone Error (MPE)-style, error-weighted objective function. This led to the proposal of a general approach to discriminative training based on integrals of MPE-style loss, calculated using “differenced MMI” (dMMI), a finite difference of MMI functionals evaluated at the edges of a margin interval. This article aims to clarify the essence and practical consequences of the new framework. The recently proposed Error-Indexed Forward-Backward Algorithm is used to visualize the close agreement between dMMI and MPE statistics for narrow margin intervals, and to illustrate how broader intervals allow flexible control over the weight given to different error levels. New speech recognition results are presented for the MIT OpenCourseWare/MIT-World corpus, showing small performance gains for dMMI compared to MPE for some choices of margin interval. Evaluation with an expanded 44K-word trigram language model confirms that dMMI with a narrow margin interval yields the same performance as MPE.
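The relationship described in the abstract can be checked numerically on a toy example. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation (the paper works on lattices with the Error-Indexed Forward-Backward Algorithm). It assumes an N-best list in place of a lattice, made-up log scores and error counts, and a margin-based MMI of the form F(σ) = log w_ref - log Σ_S w_S exp(σ·E_S), where w_S stands for the scaled acoustic and language-model score of hypothesis S and E_S for its error count against the reference. Under this assumed sign convention, dF/dσ equals minus the expected error under the margin-tilted posterior. The finite difference over [σ1, σ2] (dMMI) is therefore the average of that MPE-style loss over the interval, and plain MMI (σ = 0) is its integral from a very negative margin up to 0. All function and variable names are hypothetical.

```python
import math

# Toy N-best list (made-up numbers, not data from the paper). Each entry is
# (log score, error count); the log score stands in for
# kappa * log p(X|S) + log P(S). Entry 0 is assumed to be the reference (0 errors).
HYPS = [(-10.0, 0), (-10.5, 1), (-11.0, 2), (-12.0, 3)]


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def margin_mmi(sigma, hyps=HYPS):
    """Assumed margin-based MMI: log reference score minus the log of the
    error-boosted denominator sum_S w_S * exp(sigma * E_S)."""
    ref_score = hyps[0][0]
    return ref_score - logsumexp([s + sigma * e for s, e in hyps])


def expected_error(sigma, hyps=HYPS):
    """MPE-style loss: expected error count under the margin-tilted posterior."""
    logits = [s + sigma * e for s, e in hyps]
    z = logsumexp(logits)
    return sum(math.exp(logit - z) * e for logit, (_, e) in zip(logits, hyps))


def dmmi(sigma1, sigma2, hyps=HYPS):
    """Differenced MMI: finite difference of margin_mmi over [sigma1, sigma2]."""
    return (margin_mmi(sigma2, hyps) - margin_mmi(sigma1, hyps)) / (sigma2 - sigma1)


def integrate(f, a, b, steps=20000):
    """Trapezoidal-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps))
    return total * h


if __name__ == "__main__":
    # Narrow interval: dMMI should match minus the MPE-style loss at sigma = 0.
    print(dmmi(-1e-4, 1e-4), -expected_error(0.0))
    # Plain MMI (sigma = 0) versus minus the integral of the loss from a very
    # negative margin, where the tilted posterior collapses onto the reference.
    print(margin_mmi(0.0), -integrate(expected_error, -50.0, 0.0))
```

Running the script prints two pairs of numbers that should agree closely: a narrow-interval dMMI against the negated expected error at σ = 0, and plain MMI against the negated integral of the loss. Widening the interval, e.g. dmmi(0.0, 5.0), averages the loss over positive margins, where the tilted posterior puts more weight on high-error hypotheses. Under the same assumptions, this is one way to picture the control over error-level weighting that the abstract attributes to broader intervals.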