回放录音语音语料库对语音识别精度的影响

A. Suchato, S. Chanjaradwichai, N. Kertkeidkachorn, S. Vorapatratorn, P. Hirankan, T. Suri, K. Likitsupin, S. Chuetanapinyo, P. Punyabukkana
{"title":"回放录音语音语料库对语音识别精度的影响","authors":"A. Suchato, S. Chanjaradwichai, N. Kertkeidkachorn, S. Vorapatratorn, P. Hirankan, T. Suri, K. Likitsupin, S. Chuetanapinyo, P. Punyabukkana","doi":"10.1109/ECTICON.2012.6254211","DOIUrl":null,"url":null,"abstract":"Modern speech recognition techniques rely on large amount of speech data whose acoustic characteristics match with the operating environments to train their acoustic models. Gathering training data from loudspeakers playing recorded speech utterances are far more practical than from human speakers. This paper presents results from speech recognition experiments providing practical insights on effects caused by utterances re-recorded form loudspeakers. A clean-speech corpus of sixty human speakers was built using two different microphones and their playbacks were re-recorded. Results show that, with minimal lexical constraints, accuracies degraded for playback-trained system, even with no mismatches between training and test data. However, mismatches did not affect cases with tighter high-level constraints, such as number and limited-vocabulary word recognitions. A procedure to reduce mismatches caused by constructing corpus from playbacks was introduced. The procedure was shown to make the accuracy of a playback-trained system 48% closer to the one of the system trained with speech in matched environment.","PeriodicalId":6319,"journal":{"name":"2012 9th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology","volume":"30 1","pages":"1-4"},"PeriodicalIF":0.0000,"publicationDate":"2012-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Effects of acoustic mismatches on speech recognition accuracies due to playback-recorded speech corpus\",\"authors\":\"A. Suchato, S. Chanjaradwichai, N. Kertkeidkachorn, S. Vorapatratorn, P. Hirankan, T. Suri, K. Likitsupin, S. Chuetanapinyo, P. Punyabukkana\",\"doi\":\"10.1109/ECTICON.2012.6254211\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Modern speech recognition techniques rely on large amount of speech data whose acoustic characteristics match with the operating environments to train their acoustic models. Gathering training data from loudspeakers playing recorded speech utterances are far more practical than from human speakers. This paper presents results from speech recognition experiments providing practical insights on effects caused by utterances re-recorded form loudspeakers. A clean-speech corpus of sixty human speakers was built using two different microphones and their playbacks were re-recorded. Results show that, with minimal lexical constraints, accuracies degraded for playback-trained system, even with no mismatches between training and test data. However, mismatches did not affect cases with tighter high-level constraints, such as number and limited-vocabulary word recognitions. A procedure to reduce mismatches caused by constructing corpus from playbacks was introduced. The procedure was shown to make the accuracy of a playback-trained system 48% closer to the one of the system trained with speech in matched environment.\",\"PeriodicalId\":6319,\"journal\":{\"name\":\"2012 9th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology\",\"volume\":\"30 1\",\"pages\":\"1-4\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-05-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 9th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ECTICON.2012.6254211\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 9th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ECTICON.2012.6254211","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

现代语音识别技术依靠大量声学特征与操作环境相匹配的语音数据来训练声学模型。从播放录音语音的扬声器中收集训练数据比从人类说话者那里收集训练数据要实际得多。本文介绍了语音识别实验的结果,为扬声器重新录制的话语所引起的影响提供了实际的见解。使用两种不同的麦克风建立了60个人类说话者的干净语音语料库,并重新录制了他们的回放。结果表明,即使在训练和测试数据之间没有不匹配的情况下,在最小的词法约束下,回放训练系统的准确率也会下降。但是,不匹配不会影响具有更严格的高级约束的情况,例如数字和有限词汇表的单词识别。介绍了一种减少从回放中构造语料库引起的不匹配的方法。结果表明,该程序使经过回放训练的系统的准确率与在匹配环境中使用语音训练的系统的准确率接近48%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Effects of acoustic mismatches on speech recognition accuracies due to playback-recorded speech corpus
Modern speech recognition techniques rely on large amount of speech data whose acoustic characteristics match with the operating environments to train their acoustic models. Gathering training data from loudspeakers playing recorded speech utterances are far more practical than from human speakers. This paper presents results from speech recognition experiments providing practical insights on effects caused by utterances re-recorded form loudspeakers. A clean-speech corpus of sixty human speakers was built using two different microphones and their playbacks were re-recorded. Results show that, with minimal lexical constraints, accuracies degraded for playback-trained system, even with no mismatches between training and test data. However, mismatches did not affect cases with tighter high-level constraints, such as number and limited-vocabulary word recognitions. A procedure to reduce mismatches caused by constructing corpus from playbacks was introduced. The procedure was shown to make the accuracy of a playback-trained system 48% closer to the one of the system trained with speech in matched environment.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Power efficient output stages for high density implantable stimulators — Review and outlook Electrical characteristics of photodetector with transparent contact Time base distance estimation model for localization in wireless sensor network WiFi electronic nose for indoor air monitoring The effects of temperature and device demension of MOSFETs on the DC characteristics of CMOS inverter
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1