Uchechukwu Ofoegbu, Ananth N. Iyer, Robert E. Yantorno, B. Y. Smolenski
A Simple Approach to Unsupervised Speaker Indexing
2006 International Symposium on Intelligent Signal Processing and Communications, published 2006-12-01
DOI: 10.1109/ISPACS.2006.364901 (https://doi.org/10.1109/ISPACS.2006.364901)
Citations: 3
Abstract
Unsupervised speaker indexing is a rapidly developing field in speech processing that involves determining who is speaking when, without prior knowledge of the speakers being observed. In this research, a distance-based technique for indexing telephone conversations is presented. Each conversation is divided into sub-models built from approximately equal amounts of data, and two reference models are judiciously chosen from them to represent the two speakers in the conversation. The remaining models are then matched to the reference speakers using a technique referred to as the restrained-relative minimum distance (RRMD) approach. Models that fail to meet the RRMD criteria are considered "undecided" and are left unmatched with either reference speaker. An analysis is made to determine the appropriate size (the length of data to be used) for these models, which are formed from cepstral coefficients of the speech data. The T-square statistic is used for speaker differentiation. Evaluation is based on the indexing accuracy as well as the amount of undecided speech obtained. The proposed system yielded a minimum indexing error of about 9% with a maximum undecided error of 18.5%, and an equal error rate of 11%, on 245 files (with an average length of about 400 seconds each) from the SWITCHBOARD database.
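The distance-and-matching idea in the abstract can be illustrated with a minimal sketch. It assumes that the T-square statistic is the standard two-sample Hotelling T-square computed on cepstral feature matrices (frames by coefficients). The abstract does not spell out the RRMD criteria, so the `assign` function and its `margin` parameter below are a hypothetical relative-gap rule used only as a stand-in, not the authors' method. The synthetic Gaussian data stand in for real cepstral features.

```python
import numpy as np


def t_square(a, b):
    """Two-sample Hotelling T-square between feature matrices (frames x dims)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    diff = a.mean(axis=0) - b.mean(axis=0)
    # Pooled covariance of the two samples (unbiased estimates).
    cov_a = np.atleast_2d(np.cov(a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(b, rowvar=False))
    pooled = ((n1 - 1) * cov_a + (n2 - 1) * cov_b) / (n1 + n2 - 2)
    scale = n1 * n2 / (n1 + n2)
    return float(scale * diff @ np.linalg.solve(pooled, diff))


def assign(segment, ref_a, ref_b, margin=0.2):
    """Hypothetical stand-in rule (NOT the paper's RRMD criteria).

    Match the segment to the nearer reference only when the relative gap
    between the two distances is at least `margin`; otherwise leave it
    "undecided".
    """
    d_a = t_square(segment, ref_a)
    d_b = t_square(segment, ref_b)
    lo, hi = min(d_a, d_b), max(d_a, d_b)
    if hi == 0 or (hi - lo) / hi < margin:
        return "undecided"
    return "A" if d_a < d_b else "B"


# Synthetic 4-dimensional "cepstral" features for two well-separated speakers.
rng = np.random.default_rng(0)
ref_a = rng.normal(0.0, 1.0, size=(200, 4))
ref_b = rng.normal(3.0, 1.0, size=(200, 4))
seg_a = rng.normal(0.0, 1.0, size=(100, 4))
seg_b = rng.normal(3.0, 1.0, size=(100, 4))

print(assign(seg_a, ref_a, ref_b))
print(assign(seg_b, ref_a, ref_b))
```

Leaving ambiguous segments unmatched trades coverage for accuracy, which is why the abstract reports an undecided error alongside the indexing error.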