“ISI” a new method for automatic speaker tracking and detection
S. Ouamour, M. Guerti, H. Sayoud
2006 IEEE GCC Conference (GCC), published 2006-03-20
DOI: 10.1109/IEEEGCC.2006.5686248 (https://doi.org/10.1109/IEEEGCC.2006.5686248)
In this paper we propose a new algorithm called ISI, or “Interlaced Speech Indexing”, developed and implemented for the task of speaker detection and tracking. The task consists of finding the identity of a well-defined speaker and the moments of his interventions inside an audio document, so that his speech can be accessed rapidly, directly, and easily. Speaker tracking can broadly be divided into two problems: locating the points of speaker change (segmentation of the document), and looking for the target speaker in each segment with a verification system in order to extract all of his speech in the document (speaker detection). For the segmentation task, we developed a method based on an interlaced equidistant segmentation (IES) associated with the ISI algorithm. This approach uses a speaker identification method based on Second Order Statistical Measures (SOSM). Among the SOSM measures, we choose “μGc”, which is based on the covariance matrix. However, experiments showed that this measure needs a speech length of at least 2 seconds, which means that the segmentation resolution would be 2 seconds. By combining the SOSM with the new indexing technique (ISI), we demonstrate that the average segmentation error is reduced to only 0.5 seconds, which is more accurate and more attractive for real-time applications. Results indicate that the SOSM-ISI combination provides high resolution and high tracking performance: the tracking score (percentage of correctly labelled segments) is 95% on the TIMIT database and 92.4% on the Hub4 database.
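The abstract does not give formulas, so the following Python sketch is only an illustration of the two ingredients it names, not the authors' implementation. It assumes that μGc takes the covariance-only Gaussian likelihood form often quoted in the SOSM literature, (1/p)[tr(Y X⁻¹) − log(det Y / det X)] − 1, and that the interlaced equidistant segmentation uses two sequences of 2-second windows offset by 1 second. The function names `mu_gc`, `interlaced_windows`, and `candidate_boundaries`, as well as the 1-second offset, are hypothetical choices made for this example.

```python
import numpy as np


def mu_gc(x_frames, y_frames):
    """Covariance-based dissimilarity between two sets of feature frames.

    Assumed form: (1/p) * [tr(Y X^-1) - log(det Y / det X)] - 1, where X and Y
    are the covariance matrices of the two segments and p is the feature
    dimension. The value is 0 when both covariances are identical.
    """
    X = np.atleast_2d(np.cov(x_frames, rowvar=False))
    Y = np.atleast_2d(np.cov(y_frames, rowvar=False))
    p = X.shape[0]
    # tr(X^-1 Y) equals tr(Y X^-1); solve() avoids forming the inverse explicitly.
    trace_term = np.trace(np.linalg.solve(X, Y))
    # slogdet returns (sign, log|det|), which is numerically safer than log(det()).
    _, logdet_x = np.linalg.slogdet(X)
    _, logdet_y = np.linalg.slogdet(Y)
    return (trace_term - (logdet_y - logdet_x)) / p - 1.0


def interlaced_windows(duration, win=2.0, shift=1.0):
    """Two equidistant segmentations with `win`-second windows.

    The second sequence starts `shift` seconds later than the first
    (hypothetical offset; the abstract does not state it).
    """
    def sequence(start):
        n = int((duration - start) // win)
        return [(start + k * win, start + (k + 1) * win) for k in range(n)]

    return sequence(0.0), sequence(shift)


def candidate_boundaries(first, second):
    """Sorted union of the window edges of both segmentations."""
    edges = {t for window in first + second for t in window}
    return sorted(edges)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speaker_a = rng.normal(size=(500, 3))
    speaker_b = rng.normal(size=(500, 3)) * np.array([1.0, 3.0, 0.5])
    print("same segment:", mu_gc(speaker_a, speaker_a))
    print("different covariance:", mu_gc(speaker_a, speaker_b))

    first, second = interlaced_windows(6.0)
    print("candidate boundaries:", candidate_boundaries(first, second))
```

In this sketch every window still spans 2 seconds, the minimum length the abstract reports for the measure, while the merged window edges fall every 1 second, so a speaker change can be localized on a finer grid than a single 2-second segmentation allows. How ISI actually combines the labels of overlapping windows is not described in the abstract and is left out here.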