Robust recognition of children's speech

A. Potamianos, Shrikanth S. Narayanan
Journal: IEEE Trans. Speech Audio Process., pages 603-616 (volume field recorded as "7 1")
Published: 2003-11-01
Type: Journal Article
DOI: 10.1109/TSA.2003.818026 (https://doi.org/10.1109/TSA.2003.818026)
Source platform: Semanticscholar
Citations: 213

Abstract

Developmental changes in speech production introduce age-dependent spectral and temporal variability in the speech signal produced by children. This variability poses challenges for robust automatic recognition of children's speech. Through an analysis of age-related acoustic characteristics of children's speech in the context of automatic speech recognition (ASR), effects such as frequency scaling of spectral envelope parameters are demonstrated. Recognition experiments using acoustic models trained on adult speech and tested on speech from children of various ages clearly show performance degradation with decreasing age. On average, word error rates are two to five times higher for children's speech than for adult speech. Various techniques for improving ASR performance on children's speech are reported. A speaker normalization algorithm that combines frequency warping and model transformation is shown to reduce acoustic variability and significantly improve ASR performance for child speakers (by 25-45% under various model training and testing conditions). The use of age-dependent acoustic models further reduces the word error rate by 10%. The potential of piecewise linear and phoneme-dependent frequency warping algorithms for reducing the variability in the acoustic feature space of children is also investigated.
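
To make the idea of piecewise linear frequency warping more concrete, the following is a minimal sketch of one common two-segment form, not the authors' exact formulation. The function name `piecewise_linear_warp`, the parameters `alpha`, `f_max_hz`, and `break_ratio`, and their default values are all assumptions introduced for illustration. Under the convention assumed here, `alpha < 1` compresses the frequency axis, which is the direction one would expect to need when mapping a child's higher formant frequencies toward an adult-trained model.

```python
def piecewise_linear_warp(freq_hz, alpha, f_max_hz=8000.0, break_ratio=0.8):
    """Warp one frequency (Hz) with a two-segment piecewise-linear map.

    Hypothetical illustration only:
      - below the breakpoint, frequencies are scaled by `alpha`;
      - above it, a second line segment is chosen so that `f_max_hz`
        still maps to `f_max_hz`, keeping the usable bandwidth fixed.
    """
    if not 0.0 < break_ratio < 1.0:
        raise ValueError("break_ratio must lie strictly between 0 and 1")

    f_break = break_ratio * f_max_hz
    if alpha <= 0.0 or alpha * f_break >= f_max_hz:
        raise ValueError("alpha must be positive and keep the map increasing")

    if freq_hz <= f_break:
        # Lower segment: plain linear scaling of the frequency axis.
        return alpha * freq_hz

    # Upper segment: line from (f_break, alpha * f_break) to (f_max, f_max).
    upper_slope = (f_max_hz - alpha * f_break) / (f_max_hz - f_break)
    return alpha * f_break + upper_slope * (freq_hz - f_break)


if __name__ == "__main__":
    # Show how a few frequencies move for an assumed warp factor of 0.85.
    for f in (500.0, 1000.0, 3000.0, 7000.0, 8000.0):
        print(f, "->", round(piecewise_linear_warp(f, alpha=0.85), 1))
```

With the assumed factor `alpha=0.85`, a 1000 Hz component is moved to roughly 850 Hz, while the band edge at 8000 Hz stays in place. In a full speaker normalization system the warp factor would be chosen per speaker (or, in a phoneme-dependent variant, per phoneme class), typically by searching a small grid of candidates; that selection step is not shown here.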