An Investigation of Speaker Clustering Algorithms in Adverse Acoustic Environments

Meng-Zhen Li, Xiao-Lei Zhang
DOI: 10.23919/APSIPA.2018.8659665
Published in: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), November 2018

Abstract

Speaker clustering is an important problem in speech processing, with applications such as speaker diarization; however, its behavior in adverse acoustic environments has not been studied comprehensively. To address this gap, we investigate each component of a speaker clustering system separately. Such a system contains three components: a feature extraction front-end, a dimensionality reduction algorithm, and a clustering back-end. In this paper, we use the standard Gaussian mixture model based universal background model (GMM-UBM) as the front-end to extract high-dimensional supervectors, and compare three dimensionality reduction algorithms and two clustering algorithms. The three dimensionality reduction algorithms are principal component analysis (PCA), spectral clustering (SC), and the multilayer bootstrap network (MBN). The two clustering algorithms are k-means and agglomerative hierarchical clustering (AHC). We conducted extensive experiments in both in-domain and out-of-domain settings on noisy versions of the NIST 2006 speaker recognition evaluation (SRE) and NIST 2008 SRE corpora. Experimental results in various noisy environments show that (i) MBN-based systems perform best in most cases, while SC-based systems outperform both the PCA-based systems and the systems based on the original supervectors; (ii) AHC is more robust than k-means.
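To make the back-end comparison more concrete, the following is a minimal sketch, not the authors' implementation, of the two clustering algorithms named above (k-means and AHC). It operates on low-dimensional vectors that stand in for the output of the dimensionality reduction step. It assumes Euclidean distance, random initialization for k-means, and average linkage for AHC; the abstract does not say which distance, linkage, or initialization the paper uses. The functions `kmeans` and `ahc` and the toy data are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means (assumed variant): returns one cluster label per point."""
    rng = random.Random(seed)
    # Initialize centroids with k data points sampled without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def ahc(points, k):
    """Bottom-up agglomerative clustering with average linkage (assumed)."""
    clusters = [[i] for i in range(len(points))]  # start: one cluster per point

    def average_linkage(a, b):
        # Mean pairwise distance between members of clusters a and b.
        total = sum(euclidean(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Find the closest pair of clusters (x < y) and merge y into x.
        pairs = ((x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters)))
        x, y = min(pairs, key=lambda xy: average_linkage(clusters[xy[0]],
                                                         clusters[xy[1]]))
        clusters[x].extend(clusters[y])
        del clusters[y]
    # Convert the final list of clusters into per-point labels.
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


# Hypothetical 2-D vectors standing in for reduced supervectors of six
# utterances from two speakers.
utterances = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0],
              [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
print(kmeans(utterances, k=2))
print(ahc(utterances, k=2))
```

Unlike k-means, this AHC variant involves no random initialization, so it returns the same partition on every run. The naive merge loop recomputes every pairwise linkage at each step and is only suitable for small inputs.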