An undergraduate Mandarin speech database for speaker recognition research

Hong Wang, Jingui Pan
{"title":"An undergraduate Mandarin speech database for speaker recognition research","authors":"Hong Wang, Jingui Pan","doi":"10.1109/ICSDA.2009.5278370","DOIUrl":null,"url":null,"abstract":"This paper describes the development of a new speech database for speaker recognition research, UMSD (undergraduate Mandarin speech database). In UMSD, there are total 12 sessions of utterances for each of the selected 24 undergraduate students, while all recordings are conducted in different session intervals. The phonetically balanced corpus content include isolated digits (0∼9), digit strings (5 phone numbers and 2 postal codes), words and phrases with different length from 1 to 10 characters (10 for each given length), the Chinese Phonetic Alphabet Table (21 Initials and 35 Finals), 2 ancient poems and a 200 words paragraph extracted from a well-known essay. Additionally, in order to effectively extract and process the interesting speech segments from UMSD, a speech database management system has been proposed on the base of MATLAB and MS-ACCESS. Results of preliminary evaluation show that the performance attained with UMSD is good, it not only meets the needs of our own recent effort in text-dependent and text-independent speaker recognition, but also allows the further research of the long term intra-speaker variability thanks to its multi-session records with different session intervals.","PeriodicalId":254906,"journal":{"name":"2009 Oriental COCOSDA International Conference on Speech Database and Assessments","volume":"8 2","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 Oriental COCOSDA International Conference on Speech Database and Assessments","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSDA.2009.5278370","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

This paper describes the development of a new speech database for speaker recognition research, UMSD (undergraduate Mandarin speech database). In UMSD, there are total 12 sessions of utterances for each of the selected 24 undergraduate students, while all recordings are conducted in different session intervals. The phonetically balanced corpus content include isolated digits (0∼9), digit strings (5 phone numbers and 2 postal codes), words and phrases with different length from 1 to 10 characters (10 for each given length), the Chinese Phonetic Alphabet Table (21 Initials and 35 Finals), 2 ancient poems and a 200 words paragraph extracted from a well-known essay. Additionally, in order to effectively extract and process the interesting speech segments from UMSD, a speech database management system has been proposed on the base of MATLAB and MS-ACCESS. Results of preliminary evaluation show that the performance attained with UMSD is good, it not only meets the needs of our own recent effort in text-dependent and text-independent speaker recognition, but also allows the further research of the long term intra-speaker variability thanks to its multi-session records with different session intervals.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
面向说话人识别研究的大学生普通话语音数据库
本文介绍了一个用于说话人识别研究的新型语音数据库UMSD(本科生普通话语音数据库)的开发。在UMSD中,选出的24名本科生每人总共有12个会话,而所有的录音都是在不同的会话间隔进行的。语音均衡的语料库内容包括孤立的数字(0 ~ 9)、数字串(5个电话号码和2个邮政编码)、1 ~ 10个字符(每个给定长度10个字符)的不同长度的单词和短语、汉语音标表(21个声母和35个韵母)、2首古诗和一篇200字的知名文章段落。此外,为了有效地提取和处理UMSD中感兴趣的语音片段,提出了一种基于MATLAB和MS-ACCESS的语音数据库管理系统。初步评价结果表明,该方法取得了良好的性能,不仅满足了我们目前在依赖文本和不依赖文本的说话人识别方面的需要,而且由于它具有不同会话间隔的多会话记录,可以进一步研究长期的说话人内部变异性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Message from the Oriental-COCOSDA convener Maximum Entropy combined FSM stemming method for Uyghur Multilingual number expansion for TTS Design and development of phonetically rich Urdu speech corpus Uyghur vowel weakening processing system
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1