Detection-based accented speech recognition using articulatory features

Chao Zhang, Yi Liu, Chin-Hui Lee
Published in: 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011-12-01
DOI: 10.1109/ASRU.2011.6163982 (https://doi.org/10.1109/ASRU.2011.6163982)
Citations: 13

Abstract

We propose an attribute-based approach to accented speech recognition, built on automatic speech attribute transcription with high-efficiency detection of articulatory features. To exploit appropriate and extensible phonetic and linguistic knowledge, a conditional random field (CRF) is designed to take frame-level inputs with binary feature functions. A CRF that uses only state features then generates probabilistic phone lattices, which addresses the phone under-generation problem. Finally, an attribute discrimination module is incorporated to handle a variety of accent changes without retraining any model, leading to a flexible “plug ‘n’ play” modular design. The approach is evaluated on three typical Chinese accents: Guanhua, Yue, and Wu. Our method yields significant absolute improvements in phone recognition accuracy of 5.04%, 4.68%, and 6.06% for the three accent types, respectively, over a conventional monophone HMM system. Compared to a context-dependent triphone HMM system, we achieve comparable phone accuracy at less than 20% of the computation cost. In addition, the proposed method is equally applicable to speaker-independent systems handling multiple accents.
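
The idea of a CRF with only state features can be illustrated with a small sketch. Without transition features, the posterior of a linear-chain CRF factorizes over frames, so each frame's phone posterior reduces to a softmax over summed feature weights. The example below is not the authors' implementation: the attribute names, phone set, weights, and the thresholding rule used to keep several candidates per frame are all hypothetical, chosen only to show how binary attribute detections could yield more than one phone hypothesis for an ambiguous frame.

```python
import math

# Hypothetical toy inventory; names are illustrative, not from the paper.
ATTRIBUTES = ["nasal", "stop", "vowel"]

# Hypothetical weights: WEIGHTS[phone][attribute] is the weight of the binary
# state feature "attribute detected in this frame AND label is phone".
WEIGHTS = {
    "m": {"nasal": 2.0, "stop": 0.0, "vowel": -1.0},
    "b": {"nasal": -1.0, "stop": 2.0, "vowel": -1.0},
    "a": {"nasal": -1.0, "stop": -1.0, "vowel": 2.0},
}


def frame_posteriors(detections):
    """Softmax over phones for one frame of binary attribute detections."""
    scores = {
        phone: sum(w[a] for a in ATTRIBUTES if detections[a])
        for phone, w in WEIGHTS.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {phone: math.exp(s - top) for phone, s in scores.items()}
    total = sum(exps.values())
    return {phone: e / total for phone, e in exps.items()}


def candidate_lattice(frames, threshold=0.1):
    """Keep, per frame, every phone whose posterior is at least `threshold`.

    The threshold is an assumed pruning rule, not one taken from the paper.
    """
    lattice = []
    for detections in frames:
        post = frame_posteriors(detections)
        kept = [(p, q) for p, q in post.items() if q >= threshold]
        kept.sort(key=lambda item: -item[1])  # most probable phone first
        lattice.append(kept)
    return lattice


frames = [
    {"nasal": True, "stop": False, "vowel": False},
    {"nasal": True, "stop": True, "vowel": False},  # ambiguous frame
    {"nasal": False, "stop": False, "vowel": True},
]
lattice = candidate_lattice(frames)
for t, candidates in enumerate(lattice):
    print(t, [(p, round(q, 3)) for p, q in candidates])
```

In this toy setup the ambiguous second frame keeps two candidates while the other frames keep one, which is the kind of behavior that lets a later stage recover phones a single best path would miss.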