{"title":"EG-Booster: Explanation-Guided Booster of ML Evasion Attacks","authors":"Abderrahmen Amich, Birhanu Eshete","doi":"10.1145/3508398.3511510","DOIUrl":null,"url":null,"abstract":"The widespread usage of machine learning (ML) in a myriad of domains has raised questions about its trustworthiness in high-stakes environments. Part of the quest for trustworthy ML is assessing robustness to test-time adversarial examples. Inline with the trustworthy ML goal, a useful input to potentially aid robustness evaluation is feature-based explanations of model predictions. In this paper, we present a novel approach, called EG-Booster, that leverages techniques from explainable ML to guide adversarial example crafting for improved robustness evaluation of ML models. The key insight in EG-Booster is the use of feature-based explanations of model predictions to guide adversarial example crafting by adding consequential perturbations (likely to result in model evasion) and avoiding non-consequential perturbations (unlikely to contribute to evasion). EG-Booster is agnostic to model architecture, threat model, and supports diverse distance metrics used in the literature. We evaluate EG-Booster using image classification benchmark datasets: MNIST and CIFAR10. Our findings suggest that EG-Booster significantly improves the evasion rate of state-of-the-art attacks while performing a smaller number of perturbations. Through extensive experiments that cover four white-box and three black-box attacks, we demonstrate the effectiveness of EG-Booster against two undefended neural networks trained on MNIST and CIFAR10, and an adversarially-trained ResNet model trained on CIFAR10. Furthermore, we introduce a stability assessment metric and evaluate the reliability of our explanation-based attack boosting approach by tracking the similarity between the model's predictions across multiple runs of EG-Booster. Our results over 10 separate runs suggest that EG-Booster's output is stable across distinct runs. Combined with state-of-the-art attacks, we hope EG-Booster will be used towards improved robustness assessment of ML models against evasion attacks.","PeriodicalId":102306,"journal":{"name":"Proceedings of the Twelfth ACM Conference on Data and Application Security and Privacy","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Twelfth ACM Conference on Data and Application Security and Privacy","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3508398.3511510","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
The widespread use of machine learning (ML) in a myriad of domains has raised questions about its trustworthiness in high-stakes environments. Part of the quest for trustworthy ML is assessing robustness to test-time adversarial examples. In line with the trustworthy ML goal, a useful input that can aid robustness evaluation is feature-based explanations of model predictions. In this paper, we present a novel approach, called EG-Booster, that leverages techniques from explainable ML to guide adversarial example crafting for improved robustness evaluation of ML models. The key insight in EG-Booster is the use of feature-based explanations of model predictions to guide adversarial example crafting by adding consequential perturbations (likely to result in model evasion) and avoiding non-consequential perturbations (unlikely to contribute to evasion). EG-Booster is agnostic to model architecture and threat model, and supports the diverse distance metrics used in the literature. We evaluate EG-Booster on the image classification benchmark datasets MNIST and CIFAR10. Our findings suggest that EG-Booster significantly improves the evasion rate of state-of-the-art attacks while performing fewer perturbations. Through extensive experiments covering four white-box and three black-box attacks, we demonstrate the effectiveness of EG-Booster against two undefended neural networks trained on MNIST and CIFAR10, and an adversarially trained ResNet model on CIFAR10. Furthermore, we introduce a stability assessment metric and evaluate the reliability of our explanation-based attack boosting approach by tracking the similarity of the model's predictions across multiple runs of EG-Booster. Our results over 10 separate runs suggest that EG-Booster's output is stable across distinct runs. We hope EG-Booster, combined with state-of-the-art attacks, will be used toward improved robustness assessment of ML models against evasion attacks.
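
The key-insight sentence above lends itself to a brief illustration. The sketch below is not the authors' implementation; it assumes a baseline adversarial example `x_adv` produced by any attack, per-feature attribution scores for the true class (e.g., from a SHAP-style explainer), and a simple sign-disagreement rule for deciding which perturbations are consequential. The paper's full method also introduces new perturbations on consequential features that the baseline attack left untouched, which is omitted here. A second helper, `prediction_stability`, mirrors the stability assessment by measuring how often predictions agree across runs. Function and variable names are hypothetical.

```python
import numpy as np


def eg_filter(x, x_adv, attributions):
    """Keep only perturbations judged consequential by the explanation.

    x            : original input, flattened feature vector (np.ndarray)
    x_adv        : adversarial example produced by a baseline attack
    attributions : per-feature explanation weights for the true class
                   (e.g., from a SHAP-style explainer); positive values
                   mean the feature supports the true label.
    """
    perturbation = x_adv - x
    # Heuristic: a perturbation is treated as consequential when it moves a
    # feature against the direction supporting the true class (sign
    # disagreement), i.e., it is likely to contribute to evasion.
    # Non-consequential perturbations are discarded.
    consequential = (attributions * perturbation) < 0
    return x + perturbation * consequential


def prediction_stability(pred_runs):
    """Fraction of inputs whose predicted label agrees across all runs.

    pred_runs : list of 1-D integer arrays, one per EG-Booster run,
                each holding the model's predictions on the same inputs.
    """
    preds = np.stack(pred_runs)                # shape: (runs, n_inputs)
    agree = np.all(preds == preds[0], axis=0)  # True where all runs match
    return float(agree.mean())
```

In practice, the filtered example would be queried against the target model to confirm that evasion still succeeds with the reduced perturbation set before it is accepted.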