{"title":"Perceptually Constrained Fast Adversarial Audio Attacks","authors":"Jason Henry, Mehmet Ergezer, M. Orescanin","doi":"10.1109/ICMLA52953.2021.00135","DOIUrl":null,"url":null,"abstract":"Audio adversarial attacks on deep learning models are of great interest given the commercial success and proliferation of these technologies. These types of attacks have been successfully demonstrated, however, artifacts introduced in the adversarial audio are easily detectable by a human observer. In this work, an expansion of the fast audio adversarial perturbation framework is proposed that can produce an adversarial attack that is imperceptible to a human observer in near-real time using black-box attacks. This is achieved by proposing a perceptually motivated penalty function. We propose a perceptual fast audio adversarial perturbation generator (PFAPG) that employs a loudness constrained loss function, in lieu of a conventional L-2 norm, between the adversarial example and original audio signal. We compare the performance of PFAPG against the conventional constraint based on the MSE on three audio recognition datasets: speaker recognition, speech command, and the Ryerson audiovisual database of emotional speech and song. Our results indicate that, on average, PFAPG equipped with the loudness-constrained loss function yields a 11% higher success rate, while reducing the undesirable distortion artifacts in adversarial audio by 10% dB compared to the prevalent MSE constraints.","PeriodicalId":6750,"journal":{"name":"2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"3 1","pages":"819-824"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA52953.2021.00135","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Audio adversarial attacks on deep learning models are of great interest given the commercial success and proliferation of these technologies. Such attacks have been demonstrated successfully; however, the artifacts they introduce into the adversarial audio are easily detected by a human listener. In this work, we propose an extension of the fast audio adversarial perturbation framework that produces black-box adversarial attacks imperceptible to a human observer in near real time. This is achieved through a perceptually motivated penalty function. We propose a perceptual fast audio adversarial perturbation generator (PFAPG) that employs a loudness-constrained loss function, in lieu of a conventional L2 norm, between the adversarial example and the original audio signal. We compare PFAPG against the conventional MSE-based constraint on three audio recognition datasets: speaker recognition, speech commands, and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). Our results indicate that, on average, PFAPG equipped with the loudness-constrained loss function achieves an 11% higher attack success rate while reducing the undesirable distortion artifacts in the adversarial audio by 10 dB compared with the prevalent MSE constraint.
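The abstract's core substitution is replacing the L2 (MSE) distance between the adversarial and original waveforms with a loudness-constrained penalty on the perturbation. The sketch below illustrates what such a penalty could look like, assuming the perceptual weighting is approximated by the standard IEC 61672 A-weighting curve; the function names (loudness_penalty, mse_penalty) and the choice of A-weighting are illustrative assumptions, not the paper's exact PFAPG formulation.

```python
# Minimal sketch: a loudness-constrained penalty vs. a plain MSE penalty
# for an audio adversarial perturbation. Illustrative only; this is NOT
# the authors' PFAPG implementation, and the A-weighting approximation
# is an assumption about how "loudness-constrained" might be realized.
import numpy as np


def a_weighting_db(freqs_hz: np.ndarray) -> np.ndarray:
    """Standard A-weighting curve (IEC 61672) in dB at the given frequencies."""
    f2 = np.maximum(freqs_hz, 1e-6) ** 2  # clamp avoids log(0) at DC
    num = (12194.0 ** 2) * f2 ** 2
    den = ((f2 + 20.6 ** 2)
           * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    # +2.0 dB offset normalizes the curve to 0 dB at 1 kHz.
    return 20.0 * np.log10(num / den) + 2.0


def loudness_penalty(delta: np.ndarray, sr: int) -> float:
    """Perceptually weighted energy of the perturbation delta (hypothetical)."""
    spectrum = np.fft.rfft(delta)
    freqs = np.fft.rfftfreq(delta.size, d=1.0 / sr)
    weights = 10.0 ** (a_weighting_db(freqs) / 20.0)  # dB -> linear gain
    return float(np.mean(np.abs(weights * spectrum) ** 2))


def mse_penalty(delta: np.ndarray) -> float:
    """Conventional L2/MSE penalty used as the baseline constraint."""
    return float(np.mean(delta ** 2))


if __name__ == "__main__":
    sr = 16000
    rng = np.random.default_rng(0)
    delta = 1e-3 * rng.standard_normal(sr)  # a 1-second candidate perturbation
    print("MSE penalty:     ", mse_penalty(delta))
    print("Loudness penalty:", loudness_penalty(delta, sr))
```

A fast-gradient-style attack would then minimize the classifier loss plus a weighted loudness_penalty(delta, sr) rather than a weighted mse_penalty(delta), steering perturbation energy toward frequency bands where the ear is least sensitive instead of merely keeping its raw amplitude small.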