Liang He, Xianhong Chen, Can Xu, Tianyu Liang, Jia Liu
Citations: 2
Ivec-PLDA-AHC priors for VB-HMM speaker diarization system
This paper proposes a hybrid speaker diarization system. Its main body is a variational Bayes hidden Markov model (VB-HMM) speaker diarization system. The VB-HMM system avoids making premature hard decisions and exploits soft speaker information iteratively, so it outperforms most mainstream speaker diarization systems. Unfortunately, it is sensitive to its prior in some cases: either a uniform prior or a flat Dirichlet prior may fail and lead to poor results, so a more robust and informative prior is desirable. The second branch is a speaker diarization system that combines i-vectors, probabilistic linear discriminant analysis, and agglomerative hierarchical clustering (Ivec-PLDA-AHC). Benefiting from the excellent performance of the Ivec-PLDA system in speaker recognition, the Ivec-PLDA-AHC system is believed to be better at clustering segmental i-vectors by speaker. Inspired by this, we take the output of the Ivec-PLDA-AHC system as the VB-HMM's prior. Experiments on our collected database show that the proposed system performs significantly better than either system alone.
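The abstract does not say how the Ivec-PLDA-AHC output is converted into a prior, so the following is only a minimal sketch of one plausible reading: hard AHC cluster labels per segment are softened into per-segment speaker posteriors, and their average serves as the speaker prior, replacing a uniform initialization. The function name `ahc_labels_to_prior`, the `smoothing` parameter, and the smoothing scheme itself are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def ahc_labels_to_prior(labels, num_speakers, smoothing=0.1):
    """Hypothetical helper: soften hard AHC labels into a VB-HMM initialization.

    labels       -- one integer cluster index per segment (the AHC output)
    num_speakers -- number of clusters kept by AHC
    smoothing    -- assumed probability mass spread uniformly over all speakers
    """
    labels = np.asarray(labels, dtype=int)
    # One-hot encode the hard decisions: shape (num_segments, num_speakers).
    one_hot = np.eye(num_speakers)[labels]
    # Mix with a uniform distribution so that no speaker has zero probability
    # and later VB iterations can still revise the AHC decision.
    q = (1.0 - smoothing) * one_hot + smoothing / num_speakers
    # Speaker prior: the average soft assignment over all segments.
    pi = q.mean(axis=0)
    return q, pi


# Toy AHC output: six segments assigned to three speakers.
labels = [0, 0, 1, 1, 1, 2]
q, pi = ahc_labels_to_prior(labels, num_speakers=3)
```

Each row of `q` and the vector `pi` sum to one. A real system would pass them to the VB-HMM inference loop as the starting point in place of a uniform or flat Dirichlet prior.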