An Explainable Probabilistic Attribute Embedding Approach for Spoofed Speech Characterization

arXiv - EE - Audio and Speech Processing, Pub Date: 2024-09-17, DOI: arxiv-2409.11027

Manasi Chhibber, Jagabandhu Mishra, Hyejin Shim, Tomi H. Kinnunen


Citations: 0

Abstract




We propose a novel approach for spoofed speech characterization through explainable probabilistic attribute embeddings. In contrast to high-dimensional raw embeddings extracted from a spoofing countermeasure (CM), whose dimensions are not easy to interpret, the probabilistic attributes are designed to gauge the presence or absence of sub-components that make up a specific spoofing attack. These attributes are then applied to two downstream tasks: spoofing detection and attack attribution. To extend interpretability to the back-end as well, we adopt a decision tree classifier. Our experiments on the ASVspoof2019 dataset with spoof CM embeddings extracted from three models (AASIST, Rawboost-AASIST, SSL-AASIST) suggest that the performance of the attribute embeddings is on par with that of the original raw spoof CM embeddings for both tasks. The best accuracy achieved with the proposed approach is 99.7% for spoofing detection and 99.2% for attack attribution, compared to 99.7% and 94.7%, respectively, using the raw CM embeddings. To analyze the relative contribution of each attribute, we estimate their Shapley values. Attributes related to acoustic feature prediction, waveform generation (vocoder), and speaker modeling are found to be important for spoofing detection, while duration modeling, vocoder, and input type play a role in spoofing attack attribution.
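The two ideas in the abstract, a probabilistic attribute embedding and a Shapley-value analysis of attribute importance, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each attribute (for example, vocoder type) has its own small classifier producing logits, and that the attribute embedding is the concatenation of the per-attribute softmax posteriors. The function names, attribute names, logits, and the toy additive value function are all hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def softmax(logits):
    """Turn raw scores into a probability vector (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attribute_embedding(attribute_logits):
    """Concatenate per-attribute posteriors into one interpretable vector.

    attribute_logits maps a (hypothetical) attribute name to the logits of
    its classifier; dict insertion order fixes the layout of the embedding.
    """
    embedding = []
    for logits in attribute_logits.values():
        embedding.extend(softmax(logits))
    return embedding


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition.

    players: list of attribute names.
    value:   function mapping a frozenset of names to a score.
    Cost grows as 2^n, so this only suits a handful of attributes.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = (math.factorial(k) * math.factorial(n - k - 1)
                      / math.factorial(n))
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


if __name__ == "__main__":
    # Made-up logits for two hypothetical attributes.
    emb = attribute_embedding({"vocoder": [2.0, 0.0, 0.0],
                               "duration": [0.0, 0.0]})
    print(emb)

    # Toy additive value function with invented gains, for illustration only.
    gains = {"acoustic": 0.3, "vocoder": 0.5, "speaker": 0.1}
    print(shapley_values(list(gains), lambda s: sum(gains[a] for a in s)))
```

In this sketch every block of the embedding sums to one, so each dimension reads as the probability of one attribute value. For an additive value function, the Shapley value of each attribute equals its own gain, which makes the toy case easy to check by hand. With a real back-end such as a decision tree, the value function would typically be the classifier's score when only a subset of attributes is available, and sampling-based estimators would replace the exhaustive enumeration.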
