RAS-E2E: The SincNet end-to-end with RawNet loss for text-independent speaker verification
Authors: Pantid Chantangphol, Theerat Sakdejayont, Tawunrat Chalothorn
Published in: 2022 17th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), 2022-11-05
DOI: 10.1109/iSAI-NLP56921.2022.9960255
Although speaker verification systems now reach satisfactory performance, variability in utterance duration and phonetic content, together with system robustness, remains a challenge. To address this, we propose RAS-E2E, a novel, fully cross-lingual speaker verification system that extracts meaningful information directly from raw waveforms of utterances of varying duration, including short utterances, to determine whether an utterance matches the target speaker. It merges two powerful paradigms: SincNet and the RawNet training scheme with a Bi-RNN. Experiments on VoxCeleb, Gowajee, and internal call-center datasets show that RAS-E2E outperforms recent verification systems that operate on waveforms.
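
The abstract does not describe the architecture in detail. As a rough illustration of the SincNet idea it builds on, the first convolutional layer of SincNet constrains each kernel to be a windowed band-pass filter defined by two cutoff frequencies, which are learned during training. The sketch below uses fixed cutoffs and plain Python; the function names, the kernel size of 101, the 16 kHz sample rate, and the test bands are illustrative assumptions and are not taken from RAS-E2E.

```python
import math


def sinc(x):
    """Normalised sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def sinc_bandpass(f_low_hz, f_high_hz, kernel_size=101, sample_rate=16000):
    """Hamming-windowed band-pass kernel defined only by its two cutoffs."""
    # Cutoffs expressed in cycles per sample.
    f1 = f_low_hz / sample_rate
    f2 = f_high_hz / sample_rate
    half = (kernel_size - 1) / 2
    kernel = []
    for i in range(kernel_size):
        n = i - half
        # Difference of two ideal low-pass filters gives a band-pass filter.
        ideal = 2 * f2 * sinc(2 * f2 * n) - 2 * f1 * sinc(2 * f1 * n)
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * i / (kernel_size - 1))
        kernel.append(ideal * hamming)
    return kernel


def convolve_same(signal, kernel):
    """Direct convolution that keeps the output the same length as the input."""
    half = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + half - k
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def band_energy(signal, f_low_hz, f_high_hz, sample_rate=16000):
    """Mean energy of the signal after band-pass filtering."""
    kernel = sinc_bandpass(f_low_hz, f_high_hz, sample_rate=sample_rate)
    filtered = convolve_same(signal, kernel)
    return sum(v * v for v in filtered) / len(filtered)


if __name__ == "__main__":
    # A 1 kHz tone should pass a 500-1500 Hz band and be suppressed by 4-5 kHz.
    tone = [math.sin(2 * math.pi * 1000 * t / 16000) for t in range(800)]
    print("in-band energy: ", band_energy(tone, 500, 1500))
    print("out-of-band energy:", band_energy(tone, 4000, 5000))
```

Because each filter is described by two numbers rather than one weight per tap, a layer built this way has far fewer parameters than an unconstrained convolution over the raw waveform.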