Unsupervised Quasi-Silence based Speech Segmentation for Speaker Diarization
Authors: Amit Kumar Bhuyan, H. Dutta, S. Biswas
Venue: 2022 IEEE 9th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT)
Published: 2022-05-28
DOI: https://doi.org/10.1109/SETIT54465.2022.9875932
This paper presents a computationally efficient and accurate speech segmentation framework suitable for speaker diarization. Existing methods in the literature lower the false negative rate of speaker change detection at the cost of a higher false positive rate; the proposed approach addresses this problem. Speaker change point detection is biased toward detected quasi-silences, which reduces the severity of the trade-off between the missed detection and false detection rates. Computational overhead also drops, because segmentation-related processing happens only around the detected quasi-silences rather than over the entire speech signal. The change point detection accuracy of the proposed quasi-silence-based method is compared with the WinGrow method from the literature, which applies the Bayesian Information Criterion (BIC) recursively. The results show a considerable reduction in the false positive rate at the segmentation stage, together with reduced computational overhead. The improved accuracy and reduced computation make the proposed mechanism a candidate for real-time speaker diarization, especially on low-power embedded devices.
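The idea of testing for speaker changes only around quasi-silences can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the quasi-silence detector, the features, or the thresholds, so everything below is an assumption made for illustration. The function names (`frame_energies`, `quasi_silences`, `delta_bic`, `change_points`), the relative energy threshold, and the scalar per-frame feature are all hypothetical. The change test is the standard single-Gaussian ΔBIC, reduced here to one dimension.

```python
import math


def frame_energies(signal, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def quasi_silences(energies, rel_threshold=0.1, min_frames=2):
    """Find runs of low-energy frames as (start, end) pairs, end exclusive.

    Assumed rule: a frame is quasi-silent when its energy is below
    rel_threshold times the mean energy of the recording.
    """
    if not energies:
        return []
    threshold = rel_threshold * sum(energies) / len(energies)
    runs, start = [], None
    # The infinite sentinel closes a run that reaches the last frame.
    for i, energy in enumerate(list(energies) + [math.inf]):
        if energy < threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_frames:
                runs.append((start, i))
            start = None
    return runs


def _log_var(xs):
    """Log of the maximum-likelihood variance, floored to avoid log(0)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.log(max(var, 1e-12))


def delta_bic(left, right, penalty_weight=1.0):
    """Single-Gaussian delta-BIC for scalar features; > 0 favours a change."""
    n1, n2 = len(left), len(right)
    n = n1 + n2
    gain = 0.5 * (
        n * _log_var(list(left) + list(right))
        - n1 * _log_var(left)
        - n2 * _log_var(right)
    )
    # Penalty 0.5 * (d + d(d+1)/2) * log(n) equals log(n) when d = 1.
    return gain - penalty_weight * math.log(n)


def change_points(features, runs, context=20, penalty_weight=1.0):
    """Run the BIC test only at quasi-silences, skipping the silent frames.

    Returns the centre frame index of each run accepted as a change point.
    """
    accepted = []
    for start, end in runs:
        left = features[max(0, start - context):start]
        right = features[end:end + context]
        if len(left) < 2 or len(right) < 2:
            continue  # too little context to estimate a variance
        if delta_bic(left, right, penalty_weight) > 0:
            accepted.append((start + end - 1) // 2)
    return accepted
```

One plausible way to chain these is to compute `energies = frame_energies(signal, frame_len)`, take `features = [math.log(e + 1e-12) for e in energies]`, and call `change_points(features, quasi_silences(energies))`. The number of BIC evaluations then scales with the number of detected quasi-silences rather than with the length of the signal, which is the source of the computational saving the abstract describes. A realistic system would likely use multivariate features such as MFCCs with full-covariance Gaussians; the scalar log-energy feature here is only a stand-in.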