Alleviating the small sample-size problem in i-vector based speaker verification

Wei Rao, M. Mak
DOI: 10.1109/ISCSLP.2012.6423527 (https://doi.org/10.1109/ISCSLP.2012.6423527)
Published in: 2012 8th International Symposium on Chinese Spoken Language Processing
Publication date: 2012-12-01
Citations: 10

Abstract

This paper investigates the small sample-size problem in i-vector based speaker verification systems. The idea of i-vectors is to represent the characteristics of speakers in the factors of a factor analyzer. Because the factor loading matrix defines the possible speaker and channel variability of i-vectors, it is important to suppress the unwanted channel variability. Linear discriminant analysis (LDA), within-class covariance normalization (WCCN), and probabilistic LDA are commonly used for this purpose. To perform well, however, these methods require training data comprising many speakers, each providing a sufficient number of recording sessions. Performance suffers when the number of speakers and/or the number of sessions per speaker is too small. This paper compares four approaches to addressing this small sample-size problem: (1) preprocessing the i-vectors by PCA before applying LDA (PCA+LDA), (2) replacing the matrix inverse in LDA by the pseudo-inverse, (3) applying multi-way LDA by exploiting the microphone and speaker labels of the training data, and (4) increasing the matrix rank in LDA by generating more i-vectors using utterance partitioning. Results based on NIST 2010 SRE suggest that utterance partitioning performs the best, followed by multi-way LDA and PCA+LDA.
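The rank argument behind approach (4) can be illustrated with a small sketch. It is not the authors' implementation: random Gaussian vectors stand in for i-vectors, no i-vector extractor is involved, and "partitioning" is simulated simply by drawing more vectors per speaker. The helper names (`within_class_scatter`, `matrix_rank`, `synthetic_speakers`) and the toy sizes are assumptions made for illustration. The point is that the within-class scatter matrix built from N vectors of K speakers has rank at most N − K, so with few sessions it is singular and the matrix inverse used by LDA does not exist.

```python
import random


def within_class_scatter(classes):
    """Sum over speakers of sum_i (x_i - mean)(x_i - mean)^T, as nested lists."""
    dim = len(classes[0][0])
    scatter = [[0.0] * dim for _ in range(dim)]
    for vectors in classes:
        n = len(vectors)
        mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
        for v in vectors:
            diff = [v[d] - mean[d] for d in range(dim)]
            for r in range(dim):
                for c in range(dim):
                    scatter[r][c] += diff[r] * diff[c]
    return scatter


def matrix_rank(matrix, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def synthetic_speakers(rng, n_speakers, vectors_per_speaker, dim):
    """Hypothetical stand-in for i-vectors: i.i.d. Gaussian vectors per speaker."""
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)]
         for _ in range(vectors_per_speaker)]
        for _ in range(n_speakers)
    ]


rng = random.Random(0)
DIM, SPEAKERS = 8, 3  # toy sizes, far smaller than real i-vector systems

# Two sessions per speaker: N - K = 6 - 3 = 3 < DIM, so the scatter is singular.
few = synthetic_speakers(rng, SPEAKERS, 2, DIM)
# Eight vectors per speaker: N - K = 24 - 3 = 21 >= DIM, so full rank is reachable.
many = synthetic_speakers(rng, SPEAKERS, 8, DIM)

rank_few = matrix_rank(within_class_scatter(few))
rank_many = matrix_rank(within_class_scatter(many))
print("rank with 2 vectors per speaker:", rank_few, "of", DIM)
print("rank with 8 vectors per speaker:", rank_many, "of", DIM)
```

In this toy setting, more vectors per speaker raise the rank of the scatter matrix until it becomes invertible. Real sub-utterance i-vectors from the same recording are correlated rather than independent, so the sketch shows only the rank bound, not the verification gains reported in the paper.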