A Countermeasure Based on CQT Spectrogram for Deepfake Speech Detection

Pedram Abdzadeh Ziabary, H. Veisi
2021 7th International Conference on Signal Processing and Intelligent Systems (ICSPIS)
DOI: 10.1109/ICSPIS54653.2021.9729387 · Published: 2021-12-29 · Citations: 5

Abstract

Nowadays, biometrics such as face, voice, fingerprint, and iris are widely used for identity authentication. Automatic Speaker Verification (ASV) systems aim to verify a speaker's claimed identity, but recent research has shown that they are vulnerable to various types of attacks. A large number of Text-To-Speech (TTS) and Voice Conversion (VC) methods are being used to create so-called synthetic or deepfake speech. In recent years, numerous works have been proposed to improve spoofing detection and protect ASV systems against these attacks. This work proposes a synthetic speech detection system that uses the spectrogram of the Constant Q Transform (CQT) as its input features. The CQT spectrogram provides a constant Q factor across frequency regions, similar to the human auditory perception system. Moreover, compared with the Short-Time Fourier Transform (STFT), the CQT provides higher time resolution at higher frequencies and higher frequency resolution at lower frequencies. The CQT spectrogram also yields low-dimensional input features, which helps reduce the required computation time. Constant Q Cepstral Coefficients (CQCC), derived from cepstral analysis of the CQT, have been employed in some recent works on voice spoofing detection. However, to the best of our knowledge, ours is the first work to use the CQT magnitude and power spectrograms directly for voice spoofing detection. We also combine a self-attention ResNet with one-class learning to make our model robust against unseen attacks. Finally, we observe that despite using relatively low-dimensional input features and reduced computation time, we still obtain an EER of 3.53% and a min t-DCF of 0.10 on the ASVspoof 2019 Logical Access (LA) dataset, which places our model among the top performers in this field.
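The core idea above, replacing STFT features with a CQT magnitude or power spectrogram whose window length shrinks as frequency grows, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the parameters (fmin, bins per octave, number of bins, hop length) are illustrative assumptions, and production code would typically use an optimized library such as librosa.

```python
import numpy as np

def naive_cqt(y, sr, fmin=32.70, bins_per_octave=12, n_bins=48, hop=512):
    """Naive Constant-Q Transform.

    Each bin k gets a geometrically spaced center frequency f_k and a
    window whose length N_k = Q * sr / f_k shrinks as f_k grows. This
    realizes the constant Q factor: higher time resolution at high
    frequencies, higher frequency resolution at low frequencies.
    """
    Q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)   # constant Q factor
    n_frames = 1 + len(y) // hop
    C = np.zeros((n_bins, n_frames), dtype=complex)
    for k in range(n_bins):
        fk = fmin * 2.0 ** (k / bins_per_octave)        # geometric spacing
        Nk = int(np.ceil(Q * sr / fk))                  # long windows at low f
        t = np.arange(Nk)
        kernel = np.hanning(Nk) * np.exp(-2j * np.pi * fk * t / sr) / Nk
        for m in range(n_frames):
            seg = y[m * hop : m * hop + Nk]
            C[k, m] = np.dot(seg, kernel[:len(seg)])    # truncate at signal end
    return C

# Magnitude and power spectrograms of the kind used as detector input:
sr = 16000
tone = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)   # 1 s, 440 Hz test tone
C = naive_cqt(tone, sr)
magnitude = np.abs(C)          # CQT magnitude spectrogram
power = magnitude ** 2         # CQT power spectrogram
print(magnitude.shape)         # (n_bins, n_frames)
```

Note how the feature dimension is just n_bins per frame (48 here), far smaller than a typical STFT frame, which is the low-dimensionality advantage the abstract mentions.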