Speakers clustering with stochastic VQ and clustering quality estimator
Yishai Cohen, I. Lapidot
2018 IEEE International Conference on the Science of Electrical Engineering in Israel (ICSEE), 2018-12-01
DOI: 10.1109/ICSEE.2018.8646099 (https://doi.org/10.1109/ICSEE.2018.8646099)
Citation count: 1
Abstract
Clustering short speech segments by speaker is important both for diarization and for applications such as clustering short push-to-talk (PTT) segments. In this paper we present a new way to cluster speech segments: stochastic vector quantization (VQ) with a cosine metric, combined with a speaker clustering quality estimator based on logistic regression. The VQ is performed with codebooks of different sizes, and the logistic-regression estimator is used to choose the best clustering result among them. The algorithm is tested over a wide range of speaker counts, from 2 to 60. The results are compared with those of the mean-shift clustering method, which has already been tested on this task several times. The results are slightly below those of mean-shift clustering based on the cosine similarity measure. The advantage is in run time: the proposed method is approximately 10 times faster.
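The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "stochastic" means online codeword updates over randomly ordered samples with a decaying step size, and that each segment is already represented by a fixed-length vector. The names `stochastic_vq` and `cluster_with_selection`, the parameters `epochs` and `lr`, and the caller-supplied `quality` function (a stand-in for the logistic-regression quality estimator, whose features are not given in the abstract) are all hypothetical.

```python
import math
import random


def normalize(v):
    """Scale v to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def nearest(codebook, v):
    """Index of the codeword with the highest cosine similarity to unit vector v."""
    return max(range(len(codebook)), key=lambda k: dot(codebook[k], v))


def stochastic_vq(vectors, codebook_size, epochs=20, lr=0.2, seed=0):
    """Online VQ with a cosine metric (assumed reading of 'stochastic VQ')."""
    rng = random.Random(seed)
    data = [normalize(v) for v in vectors]
    # Initialise the codebook from randomly chosen samples.
    # Raises ValueError if codebook_size exceeds the number of vectors.
    codebook = [list(v) for v in rng.sample(data, codebook_size)]
    for epoch in range(epochs):
        step = lr / (1 + epoch)  # decaying step size (assumed schedule)
        order = list(range(len(data)))
        rng.shuffle(order)  # visit samples in random order
        for i in order:
            k = nearest(codebook, data[i])
            # Move the winning codeword toward the sample, then re-project
            # it onto the unit sphere.
            moved = [c + step * (x - c) for c, x in zip(codebook[k], data[i])]
            codebook[k] = normalize(moved)
    labels = [nearest(codebook, v) for v in data]
    return codebook, labels


def cluster_with_selection(vectors, sizes, quality):
    """Run VQ for each codebook size and keep the labelling that scores best.

    quality(vectors, labels) is a placeholder for the paper's
    logistic-regression clustering quality estimator.
    """
    best = None
    for size in sizes:
        _, labels = stochastic_vq(vectors, size)
        score = quality(vectors, labels)
        if best is None or score > best[0]:
            best = (score, size, labels)
    return best
```

Calling `cluster_with_selection(vectors, sizes, quality)` returns a `(score, size, labels)` tuple for the best-scoring codebook size. In practice the `quality` argument would be a trained estimator rather than a hand-written function.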