An Explainable Probabilistic Attribute Embedding Approach for Spoofed Speech Characterization

Manasi Chhibber, Jagabandhu Mishra, Hyejin Shim, Tomi H. Kinnunen
Source: arXiv - EE - Audio and Speech Processing
Published: 2024-09-17
DOI: arxiv-2409.11027 (https://doi.org/arxiv-2409.11027)
Citations: 0

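Abstract

We propose a novel approach for spoofed speech characterization through explainable probabilistic attribute embeddings. High-dimensional raw embeddings extracted from a spoofing countermeasure (CM) have dimensions that are not easy to interpret. In contrast, the probabilistic attributes are designed to gauge the presence or absence of the sub-components that make up a specific spoofing attack. These attributes are then applied to two downstream tasks: spoofing detection and attack attribution. To make the back-end interpretable as well, we adopt a decision tree classifier. Our experiments on the ASVspoof2019 dataset, with spoof CM embeddings extracted from three models (AASIST, Rawboost-AASIST, SSL-AASIST), suggest that the performance of the attribute embeddings is on par with that of the original raw spoof CM embeddings for both tasks. The best accuracy achieved with the proposed approach is 99.7% for spoofing detection and 99.2% for attack attribution, compared to 99.7% and 94.7%, respectively, using the raw CM embeddings. To analyze the relative contribution of each attribute, we estimate their Shapley values. Attributes related to acoustic feature prediction, waveform generation (vocoder), and speaker modeling are found to be important for spoofing detection, while duration modeling, vocoder, and input type play a role in spoofing attack attribution.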
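As a rough illustration of the Shapley-value analysis mentioned in the abstract, the sketch below computes exact Shapley values for a toy scoring function over three attributes. The attribute names, the `toy_score` function, and its numeric weights are hypothetical placeholders chosen for this example; they are not the model, features, or results of the paper, which does not specify its estimation procedure in the abstract.

```python
from itertools import permutations


def shapley_values(value_fn, players):
    """Exact Shapley values: average each player's marginal
    contribution over every ordering of the players."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        for p in order:
            before = value_fn(frozenset(coalition))
            coalition.add(p)
            after = value_fn(frozenset(coalition))
            totals[p] += after - before
    return {p: totals[p] / len(orderings) for p in players}


# Hypothetical attribute groups (illustrative names only).
ATTRIBUTES = ("vocoder", "speaker_modeling", "duration_modeling")


def toy_score(coalition):
    """Hypothetical score of a classifier that only sees the
    attributes in `coalition`. The weights are made up."""
    score = 0.0
    if "vocoder" in coalition:
        score += 0.5
    if "speaker_modeling" in coalition:
        score += 0.25
    if "vocoder" in coalition and "speaker_modeling" in coalition:
        score += 0.25  # interaction term, split evenly by Shapley
    return score


phi = shapley_values(toy_score, ATTRIBUTES)
for name in ATTRIBUTES:
    print(f"{name}: {phi[name]:.3f}")
```

In this toy setting the values sum to the difference between the score with all attributes and the score with none, which is the efficiency property that makes Shapley values a convenient way to rank attribute contributions. Exact enumeration grows factorially with the number of attributes, so practical analyses typically rely on sampling-based or tree-specific approximations instead.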