Authors: Mahmoud El-Hindi, Michael Muma, A. Zoubir
Published in: 2022 30th European Signal Processing Conference (EUSIPCO), 2022-08-29
DOI: 10.23919/eusipco55093.2022.9909891
Semi-Supervised Online Speaker Diarization using Vector Quantization with Alternative Codebooks
Speaker diarization systems process audio files by labelling speech segments according to the speakers' identities. Many such systems work offline and are not suited to online applications. We present a semi-supervised, online, low-complexity system. Whereas speaker diarization generally operates in an unsupervised manner, the presented system relies on enrolling the speakers who participate in the conversation. The system has two main novel aspects. The first is an online learning strategy that evaluates each processed segment according to its usefulness for learning a speaker, i.e., for updating a speaker model with it. Two metrics determine whether the segment is used to update the system. The second is a vector quantization approach in which the score depends not only on the target speaker's codebook but also on an alternative codebook. We also present an approach to compute the alternative codebook. Simulation results show that the proposed system outperforms a comparable system without the proposed online learning strategy, with benefits especially for short training lengths.
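The abstract does not give the scoring formula, the two update metrics, or the way the alternative codebook is computed, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the score is the segment's average distortion against the alternative codebook minus its distortion against the target codebook, and that the two gating metrics are the winning score and its margin over the runner-up. All function names, thresholds, and the toy data are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Codebook = List[List[float]]


def sq_dist(x: Vector, c: Vector) -> float:
    """Squared Euclidean distance between a frame and a codeword."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def avg_distortion(frames: Sequence[Vector], codebook: Codebook) -> float:
    """Mean distance from each frame to its nearest codeword (plain VQ distortion)."""
    return sum(min(sq_dist(x, c) for c in codebook) for x in frames) / len(frames)


def relative_score(frames: Sequence[Vector], target: Codebook,
                   alternative: Codebook) -> float:
    """Assumed score: positive when the target codebook fits better than the alternative."""
    return avg_distortion(frames, alternative) - avg_distortion(frames, target)


def diarize_segment(frames: Sequence[Vector], codebooks: Dict[str, Codebook],
                    alternative: Codebook) -> Tuple[str, float, float]:
    """Return the best-scoring enrolled speaker, its score, and the margin to the runner-up."""
    scores = {spk: relative_score(frames, cb, alternative)
              for spk, cb in codebooks.items()}
    ranked = sorted(scores, key=lambda spk: scores[spk], reverse=True)
    best = ranked[0]
    margin = scores[best] - scores[ranked[1]] if len(ranked) > 1 else math.inf
    return best, scores[best], margin


def maybe_update(frames: Sequence[Vector], codebook: Codebook, score: float,
                 margin: float, min_score: float = 0.0, min_margin: float = 0.5,
                 rate: float = 0.1) -> bool:
    """Hypothetical two-metric gate; on acceptance, nudge nearest codewords toward the frames."""
    if score < min_score or margin < min_margin:
        return False  # segment judged not useful for learning this speaker
    for x in frames:
        nearest = min(codebook, key=lambda c: sq_dist(x, c))
        for i, xi in enumerate(x):
            nearest[i] += rate * (xi - nearest[i])
    return True


# Toy 2-D "features"; real systems would use spectral features such as MFCCs.
codebooks = {"alice": [[0.0, 0.0], [1.0, 0.0]], "bob": [[5.0, 5.0], [6.0, 5.0]]}
alternative = [[2.5, 2.5]]
segment = [[0.1, 0.0], [0.9, 0.1]]

speaker, score, margin = diarize_segment(segment, codebooks, alternative)
updated = maybe_update(segment, codebooks[speaker], score, margin)
print(speaker, updated)
```

In this sketch a single alternative codebook is shared by all speakers, so it shifts every score by the same amount: it affects whether the winning score clears the acceptance threshold, but not the ranking of the enrolled speakers. A per-speaker alternative codebook would change that, and the abstract does not say which variant the authors use.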