Multi-lingual phoneme recognition exploiting acoustic-phonetic similarities of sounds

J. Köhler
{"title":"Multi-lingual phoneme recognition exploiting acoustic-phonetic similarities of sounds","authors":"J. Köhler","doi":"10.21437/ICSLP.1996-556","DOIUrl":null,"url":null,"abstract":"The aim of the work is to exploit the acoustic-phonetic similarities between several languages. In recent work cross-language HMM-based phoneme models have been used only for bootstrapping the language-dependent models and the multi-lingual approach has been investigated only on very small speech corpora. The author introduces a statistical distance measure to determine the similarities of sounds. Further, he presents a new technique to model multi-lingual phonemes. The experiments are conducted with the OGI Multi-Language Telephone Speech Corpus for the languages American English, German and Spanish. In the first experiment phoneme recognition rates between 39.0% and 53.9% are achieved using language-dependent models. Using cross-language models yields improvement for some phonemes, but on average a degradation of recognition performance is observed. However, cross-language models speeds up the cross-language transfer and reduce the size of the phoneme inventory of multi-lingual speech recognition systems. Finally, a new method of modelling multi-lingual phonemes, which can be used for a variety of languages, is presented. This technique reduces the number of phoneme-based units in a multi-lingual speech recognition system.","PeriodicalId":90685,"journal":{"name":"Proceedings : ICSLP. International Conference on Spoken Language Processing","volume":"3 1","pages":"2195-2198"},"PeriodicalIF":0.0000,"publicationDate":"1996-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"101","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings : ICSLP. International Conference on Spoken Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21437/ICSLP.1996-556","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 101

Abstract

The aim of the work is to exploit the acoustic-phonetic similarities between several languages. In recent work cross-language HMM-based phoneme models have been used only for bootstrapping the language-dependent models and the multi-lingual approach has been investigated only on very small speech corpora. The author introduces a statistical distance measure to determine the similarities of sounds. Further, he presents a new technique to model multi-lingual phonemes. The experiments are conducted with the OGI Multi-Language Telephone Speech Corpus for the languages American English, German and Spanish. In the first experiment phoneme recognition rates between 39.0% and 53.9% are achieved using language-dependent models. Using cross-language models yields improvement for some phonemes, but on average a degradation of recognition performance is observed. However, cross-language models speeds up the cross-language transfer and reduce the size of the phoneme inventory of multi-lingual speech recognition systems. Finally, a new method of modelling multi-lingual phonemes, which can be used for a variety of languages, is presented. This technique reduces the number of phoneme-based units in a multi-lingual speech recognition system.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
利用声音的声学-语音相似性进行多语言音素识别
这项工作的目的是利用几种语言之间的声学-语音相似性。在最近的工作中,基于跨语言hmm的音素模型仅用于引导语言依赖模型,多语言方法仅在非常小的语料库上进行了研究。作者引入了一种统计距离度量来确定声音的相似度。此外,他还提出了一种新的多语言音素建模技术。使用OGI多语言电话语音语料库对美国英语、德语和西班牙语进行了实验。在第一个实验中,使用语言依赖模型实现了39.0% ~ 53.9%的音素识别率。使用跨语言模型可以提高某些音素的识别性能,但平均而言会降低识别性能。然而,跨语言模型加速了跨语言迁移,减少了多语言语音识别系统的音素库大小。最后,提出了一种新的多语言音素建模方法,该方法可用于多种语言。该技术减少了多语言语音识别系统中基于音素的单元数量。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Audiovisual integration of speech by children and adults with cochlear implants AUDIOVISUAL INTEGRATION OF SPEECH BY CHILDREN AND ADULTS WITH COCHEAR IMPLANTS. Efficient adaptation of TTS duration model to new speakers SABLE: a standard for TTS markup A three-dimensional linear articulatory model based on MRI data
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1