Replay attack detection based on deformable convolutional neural network and temporal-frequency attention model
Authors: Dang-en Xie, Hai Hu, Qiang Xu
Journal: Journal of Intelligent Systems (JCR Q3, Computer Science, Artificial Intelligence; impact factor 2.1)
Published: 2023-01-01
DOI: 10.1515/jisys-2022-0265 (https://doi.org/10.1515/jisys-2022-0265)
Abstract: Speaker verification (SV) is an important identity authentication method and is widely used in many domains, e.g., mobile finance. However, existing SV systems are vulnerable to replay spoofing attacks. Toward a more secure and stable SV system, this article proposes a replay attack detection system based on deformable convolutional neural networks (DCNNs) and a time–frequency double-channel attention model. In a DCNN, the positions of the elements in the convolutional kernel are not fixed. Instead, they are shifted by trainable variables, which helps the model extract more useful local information from input spectrograms. In addition, the time–frequency double-channel attention model is used to extract more discriminative features for distinguishing genuine speech from replayed speech. Experimental results on the ASVspoof 2019 dataset show that the proposed model can detect replay attacks accurately.
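The deformable-kernel idea in the abstract can be illustrated with a minimal NumPy sketch. It is a generic single-channel formulation under stated assumptions, not the authors' layer: the function names `bilinear_sample` and `deformable_conv2d`, the 3x3 kernel size, the zero padding, and the `offsets` array layout are all hypothetical choices made for illustration. In a trained DCNN the offsets would be predicted by a learned layer; here they are simply passed in.

```python
import numpy as np


def bilinear_sample(x, r, c):
    """Sample 2-D array x at fractional (r, c); points outside x read as 0."""
    r0, c0 = int(np.floor(r)), int(np.floor(c))
    value = 0.0
    # Blend the four integer neighbours, weighted by proximity.
    for rr in (r0, r0 + 1):
        for cc in (c0, c0 + 1):
            if 0 <= rr < x.shape[0] and 0 <= cc < x.shape[1]:
                weight = (1.0 - abs(r - rr)) * (1.0 - abs(c - cc))
                value += weight * x[rr, cc]
    return value


def deformable_conv2d(x, kernel, offsets):
    """Single-channel 3x3 deformable convolution, stride 1, zero padding.

    x:       (H, W) input, e.g. a spectrogram (assumed layout)
    kernel:  (3, 3) weights
    offsets: (H, W, 9, 2) per-position (d_row, d_col) shift for each
             kernel tap; all zeros reduces to an ordinary convolution
    """
    H, W = x.shape
    out = np.zeros((H, W))
    taps = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
    for r in range(H):
        for c in range(W):
            acc = 0.0
            for k, (i, j) in enumerate(taps):
                dr, dc = offsets[r, c, k]
                # Each tap reads from its regular grid position plus a shift.
                acc += kernel[i + 1, j + 1] * bilinear_sample(
                    x, r + i + dr, c + j + dc
                )
            out[r, c] = acc
    return out
```

With all offsets set to zero, the function matches a standard zero-padded 3x3 convolution (in the cross-correlation convention common in deep learning), which makes the role of the offsets easy to check in isolation. Bilinear interpolation is what keeps the sampled value differentiable with respect to fractional offsets, which is why offsets can be trained.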
About the journal:
The Journal of Intelligent Systems publishes research papers, review papers, and Brief Communications at an interdisciplinary level, with the field of intelligent systems as the focal point. This field includes areas such as artificial intelligence; models and computational theories of human cognition, perception, and motivation; brain models; artificial neural networks; and neural computing. The journal covers contributions from the social, human, and computer sciences to the analysis and application of information technology.