Multigrained modeling with pattern specific maximum likelihood transformations for text-independent speaker recognition

U. Chaudhari, Jirí Navrátil, Stephane H Maes
Journal: IEEE Trans. Speech Audio Process., 10 1, pp. 61-69
Publication date: 2003-02-19
Publication type: Journal Article
DOI: 10.1109/TSA.2003.809121 (https://doi.org/10.1109/TSA.2003.809121)
Citation count: 33

Abstract

We present a transformation-based, multigrained data modeling technique in the context of text-independent speaker recognition, aimed at mitigating difficulties caused by sparse training and test data. Both identification and verification are addressed, where we view the entire population as divided into the target population and its complement, which we refer to as the background population. First, we present our development of recognition based on maximum likelihood transformations with diagonally constrained Gaussian mixture models and show its robustness to data scarcity with results on identification. Then, for each target and background speaker, a multigrained model is constructed using the transformation-based extension as a building block. The training data is labeled with an HMM-based phone labeler. We then make use of a graduated phone class structure to train the speaker model at various levels of detail. This structure is a tree whose root node contains all the phones. Subsequent levels partition the phones into increasingly finer-grained linguistic classes. This method allows fine detail to be used where possible, i.e., where the amount of training data distributed to each tree node supports it. We demonstrate the effectiveness of the modeling with verification experiments in matched and mismatched conditions.
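The graduated phone class structure described in the abstract can be illustrated with a small sketch. The abstract does not give the selection rule, so everything specific here is an assumption made for illustration: the toy phone inventory, the class names, the `min_frames` threshold, and the back-off rule itself (use the deepest tree node that has at least `min_frames` labeled training frames, otherwise fall back toward the root). Training the per-node Gaussian mixture models and their transformations is not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class PhoneClassNode:
    """One node of a hypothetical phone-class tree (names are illustrative)."""
    name: str
    phones: FrozenSet[str]
    children: List["PhoneClassNode"] = field(default_factory=list)


def count_frames(root: PhoneClassNode, labels: List[str]) -> Dict[str, int]:
    """Count, for every node, the labeled frames whose phone belongs to it."""
    counts: Dict[str, int] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        counts[node.name] = sum(1 for p in labels if p in node.phones)
        stack.extend(node.children)
    return counts


def finest_node_with_data(root: PhoneClassNode, phone: str,
                          counts: Dict[str, int],
                          min_frames: int) -> PhoneClassNode:
    """Descend toward the class containing `phone` while data is sufficient.

    Assumed rule (not taken from the paper): stop at the deepest node whose
    frame count is at least `min_frames`; the root is the final fallback.
    """
    node = root
    while True:
        child = next((c for c in node.children if phone in c.phones), None)
        if child is None or counts.get(child.name, 0) < min_frames:
            return node
        node = child


# Toy tree: the root holds all phones, deeper levels are finer classes.
fricatives = PhoneClassNode("fricatives", frozenset({"s", "z"}))
nasals = PhoneClassNode("nasals", frozenset({"m", "n"}))
consonants = PhoneClassNode("consonants", frozenset({"s", "z", "m", "n"}),
                            [fricatives, nasals])
vowels = PhoneClassNode("vowels", frozenset({"aa", "iy"}))
root = PhoneClassNode("all", frozenset({"aa", "iy", "s", "z", "m", "n"}),
                      [vowels, consonants])

# Phone labels per frame, as an HMM-based phone labeler might produce them.
labels = ["aa"] * 5 + ["iy"] * 4 + ["s"] * 3 + ["z"] + ["m"]
counts = count_frames(root, labels)

for phone in ("aa", "s", "m"):
    chosen = finest_node_with_data(root, phone, counts, min_frames=4)
    print(phone, "->", chosen.name)
```

With this toy data, the sparsely observed nasal "m" backs off to the broader consonant class, while "s" can use the finer fricative class. That is one way to read "fine detail where possible", though the authors' actual criterion may differ.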