Liang He, Xianhong Chen, Can Xu, Tianyu Liang, Jia Liu
Ivec-PLDA-AHC priors for VB-HMM speaker diarization system
2017 IEEE International Workshop on Signal Processing Systems (SiPS), published 2017-10-01
DOI: 10.1109/SiPS.2017.8109998 (https://doi.org/10.1109/SiPS.2017.8109998)
Citations: 2
Abstract
This paper proposes a hybrid speaker diarization system. Its main body is a variational Bayes hidden Markov model (VB-HMM) speaker diarization system, which avoids making premature hard decisions and exploits soft speaker information iteratively. As a result, it outperforms most mainstream speaker diarization systems. Unfortunately, it is sensitive to its prior in some cases: either a uniform prior or a flat Dirichlet prior may fail and lead to poor results, so a more robust and informative prior is desired. The other branch is an i-vector, probabilistic linear discriminant analysis, agglomerative hierarchical clustering (Ivec-PLDA-AHC) system. Benefiting from the excellent performance of the Ivec-PLDA system in speaker recognition, the Ivec-PLDA-AHC speaker diarization system is believed to be better at clustering segmental i-vectors by speaker. Motivated by this, we take the output of the Ivec-PLDA-AHC system as the VB-HMM's prior. Experiments on our collected database show that the proposed system is significantly better than both constituent systems.
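
The abstract does not say how the clustering output is converted into a prior, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that segment-level embeddings are clustered by average-linkage AHC, and that the resulting hard labels are softened into per-segment speaker probabilities that a VB-HMM could use in place of a uniform prior. Cosine similarity stands in for PLDA scoring, and the names `ahc_labels`, `labels_to_prior`, and the `confidence` value are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; an assumed stand-in for a PLDA score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def ahc_labels(vectors, num_speakers):
    """Average-linkage agglomerative clustering down to num_speakers clusters."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > num_speakers:
        best = None  # (score, i, j) of the most similar cluster pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(
                    cosine(vectors[p], vectors[q])
                    for p in clusters[i]
                    for q in clusters[j]
                )
                score = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    labels = [0] * len(vectors)
    for k, members in enumerate(clusters):
        for m in members:
            labels[m] = k
    return labels


def labels_to_prior(labels, num_speakers, confidence=0.9):
    """Soften hard labels: row t holds prior speaker probabilities for segment t.

    The assigned speaker gets `confidence`; the remainder is spread evenly
    over the other speakers, so the VB iterations can still revise it.
    """
    if num_speakers == 1:
        return [[1.0] for _ in labels]
    rest = (1.0 - confidence) / (num_speakers - 1)
    return [
        [confidence if s == lab else rest for s in range(num_speakers)]
        for lab in labels
    ]


# Toy segment embeddings: two segments per speaker (made-up values).
segments = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
labels = ahc_labels(segments, num_speakers=2)
prior = labels_to_prior(labels, num_speakers=2)
```

Keeping `confidence` below 1 reflects the concern raised in the abstract: the clustering result informs the VB-HMM without turning into a premature hard decision.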