Utilization of Voice Embeddings in Integrated Systems for Speaker Diarization and Malicious Actor Detection

Computer systems and network | Pub Date: 2024-06-01 | DOI: 10.23939/csn2024.01.054

I. Zaiets, V. Brydinskyi, D. Sabodashko, Yu. Khoma, Khrystyna Ruda, M. Shved


Citation count: 0

Abstract

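This paper explores the use of diarization systems, which employ advanced machine learning algorithms to detect and separate different speakers in audio recordings, as the basis for an intruder detection system. Several state-of-the-art diarization models are compared, including Nvidia's NeMo, Pyannote, and SpeechBrain. Their performance is evaluated using metrics typical for diarization systems: diarization error rate (DER) and Jaccard error rate (JER). The diarization systems were tested under various audio conditions, including noisy and clean environments and small and large numbers of speakers. The findings show that Pyannote delivers the best diarization accuracy, so it was used to implement the intruder detection system. This system was further evaluated on a custom dataset based on Ukrainian podcasts, where it achieved 100% recall and 93.75% precision: it did not miss any criminal in the dataset, but sometimes falsely flagged a non-criminal as a criminal. The system proves effective and flexible for intruder detection in audio files of different sizes and with different numbers of speakers.

Keywords: deep learning, diarization, speaker embeddings, speaker recognition, cyber security.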
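To make the reported metrics concrete, the following is a minimal Python sketch of how precision, recall, and DER are conventionally computed. It is not the authors' evaluation code. The function names and all input numbers are hypothetical and chosen only for illustration; the paper's actual detection counts are not given in the abstract. The DER formula shown is the standard definition (false alarm, missed speech, and speaker confusion durations divided by total reference speech time).

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive
    and false-negative detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def diarization_error_rate(false_alarm: float, missed: float,
                           confusion: float, total_speech: float) -> float:
    """Standard DER: summed error durations (seconds) divided by the
    total duration of reference speech (seconds)."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (false_alarm + missed + confusion) / total_speech


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the paper: 15 correctly flagged
    # speakers, 1 false alarm, no misses. These happen to reproduce the
    # reported figures (15 / 16 = 0.9375 precision, 15 / 15 = 1.0 recall).
    precision, recall = precision_recall(tp=15, fp=1, fn=0)
    print(f"precision={precision:.4f} recall={recall:.4f}")
    # prints: precision=0.9375 recall=1.0000

    # Hypothetical durations in seconds, for illustration only.
    der = diarization_error_rate(false_alarm=2.0, missed=3.0,
                                 confusion=5.0, total_speech=100.0)
    print(f"DER={der:.2%}")
```

Zero false negatives gives 100% recall regardless of how many false alarms occur, which is why the abstract can report perfect recall alongside imperfect precision.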



