Unsupervised Speaker Segmentation and Clustering Using TESBCC and Pitch Based Features
J. Naresh, R. S. Holambe, T. Basu
2013 5th International Conference on Computational Intelligence and Communication Networks, 2013-09-27
DOI: 10.1109/CICN.2013.142 (https://doi.org/10.1109/CICN.2013.142)
This paper describes the implementation of an unsupervised speaker segmentation and clustering system. The main objective is to study the performance of a speaker diarization system using a new feature set, Temporal Energy of Subband Cepstral Coefficients (TESBCC), together with pitch-based features. The system first classifies the audio signal into speech and non-speech using the average zero-crossing rate (ZCR), followed by a gender classifier stage. Speaker changes are first roughly detected using the Hotelling T² distance metric, and the Bayesian information criterion (BIC) is then used to validate each potential speaker change point, which reduces the false alarm rate. A bottom-up approach is used for speaker clustering. The performance of the speaker segmentation and clustering system using TESBCC is compared with that using MFCC.
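The abstract names the measures but gives no formulas, so the following is only a minimal sketch under textbook assumptions: ZCR as the fraction of sign changes between adjacent samples, the two-sample Hotelling T² with a pooled covariance, and ΔBIC for full-covariance Gaussians. The function names and the `penalty_weight` parameter are hypothetical, and the paper's window sizes, thresholds, and TESBCC extraction are not reproduced here.

```python
import numpy as np


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(np.asarray(frame, dtype=float))
    return float(np.mean(signs[1:] != signs[:-1]))


def hotelling_t2(x, y):
    """Two-sample Hotelling T^2 between feature windows (rows = frames)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Pooled covariance: assumes both windows share one covariance matrix.
    pooled = np.atleast_2d(
        ((n1 - 1) * np.cov(x, rowvar=False) + (n2 - 1) * np.cov(y, rowvar=False))
        / (n1 + n2 - 2)
    )
    return float(n1 * n2 / (n1 + n2) * diff @ np.linalg.solve(pooled, diff))


def delta_bic(x, y, penalty_weight=1.0):
    """Delta-BIC for a boundary between x and y; a value > 0 favours a change."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    n, d = n1 + n2, x.shape[1]

    def logdet(z):
        # Log-determinant of the maximum-likelihood covariance (bias=True).
        cov = np.atleast_2d(np.cov(z, rowvar=False, bias=True))
        return np.linalg.slogdet(cov)[1]

    # Likelihood gain of two Gaussians over one, minus the model-size penalty.
    gain = 0.5 * (n * logdet(np.vstack([x, y])) - n1 * logdet(x) - n2 * logdet(y))
    penalty = 0.5 * penalty_weight * (d + d * (d + 1) / 2) * np.log(n)
    return float(gain - penalty)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    same_a = rng.normal(0.0, 1.0, size=(200, 3))   # stand-in "speaker A" features
    same_b = rng.normal(0.0, 1.0, size=(200, 3))   # same distribution as A
    other = rng.normal(3.0, 1.0, size=(200, 3))    # shifted "speaker B" features
    print("T2 same/different:", hotelling_t2(same_a, same_b), hotelling_t2(same_a, other))
    print("dBIC same/different:", delta_bic(same_a, same_b), delta_bic(same_a, other))
```

In a two-stage detector of the kind the abstract describes, the cheaper T² score would propose candidate boundaries and `delta_bic` would then keep only the candidates with a positive score. The synthetic Gaussian features here stand in for real TESBCC or MFCC frames.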