在综合系统中利用语音嵌入技术进行说话者日记化和恶意行为者检测

I. Zaiets, V. Brydinskyi, D. Sabodashko, Yu. Khoma, Khrystyna Ruda, M. Shved
{"title":"在综合系统中利用语音嵌入技术进行说话者日记化和恶意行为者检测","authors":"I. Zaiets, V. Brydinskyi, D. Sabodashko, Yu. Khoma, Khrystyna Ruda, M. Shved","doi":"10.23939/csn2024.01.054","DOIUrl":null,"url":null,"abstract":"This paper explores the use of diarization systems which employ advanced machine learning algorithms for the precise detection and separation of different speakers in audio recordings for the implementation of an intruder detection system. Several state-of-the-art diarization models including Nvidia’s NeMo Pyannote and SpeechBrain are compared. The performance of these models is evaluated using typical metrics used for the diarization systems such as diarization error rate (DER) and Jaccard error rate (JER). The diarization system was tested on various audio conditions including noisy environment clean environment small number of speakers and large number of speakers. The findings reveal that Pyannote delivers superior performance in terms of diarization accuracy and thus was used for implementation of the intruder detection system. This system was further evaluated on a custom dataset based on Ukrainian podcasts and it was found that the system performed with 100% recall and 93.75% precision meaning that the system has not missed any criminal from the dataset but could sometimes falsely detect a non-criminal as a criminal. This system proves to be effective and flexible in intruder detection tasks in audio files with different file sizes and different numbers of speakers which are present in these audio files. Keywords: deep learning diarization speaker embeddings speaker recognition cyber security.","PeriodicalId":504130,"journal":{"name":"Computer systems and network","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"UTILIZATION OF VOICE EMBEDDINGS IN INTEGRATED SYSTEMS FOR SPEAKER DIARIZATION AND MALICIOUS ACTOR DETECTION\",\"authors\":\"I. Zaiets, V. Brydinskyi, D. Sabodashko, Yu. Khoma, Khrystyna Ruda, M. Shved\",\"doi\":\"10.23939/csn2024.01.054\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper explores the use of diarization systems which employ advanced machine learning algorithms for the precise detection and separation of different speakers in audio recordings for the implementation of an intruder detection system. Several state-of-the-art diarization models including Nvidia’s NeMo Pyannote and SpeechBrain are compared. The performance of these models is evaluated using typical metrics used for the diarization systems such as diarization error rate (DER) and Jaccard error rate (JER). The diarization system was tested on various audio conditions including noisy environment clean environment small number of speakers and large number of speakers. The findings reveal that Pyannote delivers superior performance in terms of diarization accuracy and thus was used for implementation of the intruder detection system. This system was further evaluated on a custom dataset based on Ukrainian podcasts and it was found that the system performed with 100% recall and 93.75% precision meaning that the system has not missed any criminal from the dataset but could sometimes falsely detect a non-criminal as a criminal. This system proves to be effective and flexible in intruder detection tasks in audio files with different file sizes and different numbers of speakers which are present in these audio files. Keywords: deep learning diarization speaker embeddings speaker recognition cyber security.\",\"PeriodicalId\":504130,\"journal\":{\"name\":\"Computer systems and network\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer systems and network\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.23939/csn2024.01.054\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer systems and network","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23939/csn2024.01.054","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

本文探讨了日记化系统的使用,该系统采用先进的机器学习算法,可精确检测和分离音频记录中的不同说话者,用于实施入侵者检测系统。本文对包括 Nvidia 的 NeMo Pyannote 和 SpeechBrain 在内的几种最先进的日记化模型进行了比较。这些模型的性能采用了用于分离系统的典型指标进行评估,如分离错误率(DER)和 Jaccard 错误率(JER)。在各种音频条件下,包括嘈杂环境、清洁环境、扬声器数量较少和扬声器数量较多的情况下,都对分离系统进行了测试。测试结果表明,Pyannote 在日记化准确性方面表现出色,因此被用于实施入侵者检测系统。在基于乌克兰播客的定制数据集上对该系统进行了进一步评估,结果发现该系统的召回率为 100%,精确率为 93.75%,这意味着该系统没有遗漏数据集中的任何罪犯,但有时会将非罪犯错误地检测为罪犯。事实证明,该系统在不同文件大小和不同发言人数量的音频文件中执行入侵者检测任务时非常有效和灵活。关键词:深度学习日记化扬声器嵌入扬声器识别网络安全。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
UTILIZATION OF VOICE EMBEDDINGS IN INTEGRATED SYSTEMS FOR SPEAKER DIARIZATION AND MALICIOUS ACTOR DETECTION
This paper explores the use of diarization systems which employ advanced machine learning algorithms for the precise detection and separation of different speakers in audio recordings for the implementation of an intruder detection system. Several state-of-the-art diarization models including Nvidia’s NeMo Pyannote and SpeechBrain are compared. The performance of these models is evaluated using typical metrics used for the diarization systems such as diarization error rate (DER) and Jaccard error rate (JER). The diarization system was tested on various audio conditions including noisy environment clean environment small number of speakers and large number of speakers. The findings reveal that Pyannote delivers superior performance in terms of diarization accuracy and thus was used for implementation of the intruder detection system. This system was further evaluated on a custom dataset based on Ukrainian podcasts and it was found that the system performed with 100% recall and 93.75% precision meaning that the system has not missed any criminal from the dataset but could sometimes falsely detect a non-criminal as a criminal. This system proves to be effective and flexible in intruder detection tasks in audio files with different file sizes and different numbers of speakers which are present in these audio files. Keywords: deep learning diarization speaker embeddings speaker recognition cyber security.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
METHODS AND MEANS OF ENSURING STABILITY AND PROTECTION OF RADIO COMMUNICATIONS IN A COMPLEX ELECTROMAGNETIC ENVIRONMENT RESEARCH AND IMPROVEMENT OF COMPUTING ALGORITHMS FOR CALCULATING THE TRIGONOMETRICAL COEFFICIENTS OF THE HASHING ALGORITHM MD5 UTILIZATION OF VOICE EMBEDDINGS IN INTEGRATED SYSTEMS FOR SPEAKER DIARIZATION AND MALICIOUS ACTOR DETECTION IMPROVEMENT THE SECURITY OF THE ENTERPRISE’S NETWORK INFRASTRUCTURE IN CONDITIONS OF MODERN CHALLENGES AND LIMITED RESOURCES OVERVIEW OF THE CIS BENCHMARKS USAGE FOR FULFILLING THE REQUIREMENTS FROM INTERNATIONAL STANDARD ISO/IEC 27001:2022
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1