Blind Speaker Clustering
A. N. Iyer, U. Ofoegbu, R. Yantorno, B. Y. Smolenski
2006 International Symposium on Intelligent Signal Processing and Communications, published 2006-12-01
DOI: https://doi.org/10.1109/ISPACS.2006.364902
Citations: 9
Abstract
This paper presents a novel approach to speaker clustering in telephone conversations. The method rests on a simple observation: the distance between populations of feature vectors extracted from different speakers is greater than a preset threshold. This observation is incorporated into the clustering problem by formulating it as a constrained optimization problem, which a modified c-means algorithm is designed to solve. Another key aspect of speaker clustering is determining the number of clusters, which traditional methods either assume or expect as an input. The proposed method requires no such information; instead, the number of clusters is determined automatically from the data. The performance of the proposed algorithm is evaluated and compared using the Hellinger, Bhattacharyya, Mahalanobis, and generalized likelihood ratio distance measures. With the Hellinger distance, the approach achieved an average cluster purity of 0.85 in experiments on the Switchboard telephone conversational speech database. This result represents a 9% relative improvement in average cluster purity over the best-performing agglomerative clustering system.
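The thresholding observation in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how feature populations are modeled, so this example assumes each segment is summarized by a diagonal-covariance Gaussian, and the function names and the threshold value of 0.5 are hypothetical. It uses the standard closed-form relation between the Bhattacharyya distance D_B and the Hellinger distance H for Gaussians, H = sqrt(1 - exp(-D_B)).

```python
import math


def fit_diag_gaussian(vectors):
    """Per-dimension mean and variance of a list of feature vectors.

    Assumption: a diagonal-covariance Gaussian is used only for illustration.
    """
    n = len(vectors)
    dim = len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dim)]
    # Floor the variance so constant dimensions do not cause division by zero.
    variances = [
        max(sum((v[d] - means[d]) ** 2 for v in vectors) / n, 1e-8)
        for d in range(dim)
    ]
    return means, variances


def bhattacharyya_distance(m1, v1, m2, v2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    total = 0.0
    for a, s, b, t in zip(m1, v1, m2, v2):
        avg = 0.5 * (s + t)  # averaged variance for this dimension
        total += 0.125 * (a - b) ** 2 / avg
        total += 0.5 * math.log(avg / math.sqrt(s * t))
    return total


def hellinger_distance(m1, v1, m2, v2):
    """Hellinger distance in [0, 1], derived from the Bhattacharyya distance."""
    bc = math.exp(-bhattacharyya_distance(m1, v1, m2, v2))
    return math.sqrt(max(0.0, 1.0 - bc))


def likely_different_speakers(seg_a, seg_b, threshold=0.5):
    """True if the distance between two segments exceeds the threshold.

    The threshold value is a hypothetical placeholder, not taken from the paper.
    """
    mean_a, var_a = fit_diag_gaussian(seg_a)
    mean_b, var_b = fit_diag_gaussian(seg_b)
    return hellinger_distance(mean_a, var_a, mean_b, var_b) > threshold
```

Because the Hellinger distance is bounded between 0 and 1, a fixed threshold is easier to interpret than with the unbounded Bhattacharyya distance; whether that property explains the reported results is not stated in the abstract.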