Title: Adaptive feature alignment for adversarial training
Authors: Kai Zhao, Tao Wang, Ruixin Zhang, Wei Shen
Journal: Pattern Recognition Letters, volume 186, pages 184-190
Publication date: 2024-10-01
DOI: 10.1016/j.patrec.2024.10.004
URL: https://www.sciencedirect.com/science/article/pii/S0167865524002927
Impact factor: 3.9 (JCR Q2, Computer Science, Artificial Intelligence)
Citations: 0
Abstract
Recent studies reveal that Convolutional Neural Networks (CNNs) are typically vulnerable to adversarial attacks. Many adversarial defense methods have been proposed to improve robustness against adversarial samples. However, these methods can only defend against adversarial samples of a specific strength, which reduces their flexibility against attacks of varying strengths. Moreover, they often enhance adversarial robustness at the expense of accuracy on clean samples. In this paper, we first observe that the features of adversarial images change monotonically and smoothly as the attack strength increases. This observation suggests that the features of adversarial images at various attack strengths can be approximated by interpolating between the features of adversarial images at the strongest and weakest attack strengths. Owing to this monotonicity, the interpolation weight can be easily learned by a neural network. Based on this observation, we propose adaptive feature alignment (AFA), which automatically aligns features to defend against adversarial attacks of various strengths. During training, our method learns the statistics of adversarial samples at various attack strengths using a dual batchnorm (dual-BN) architecture, in which each batchnorm branch handles samples of a specific attack strength. During inference, our method automatically adjusts to varying attack strengths by linearly interpolating the dual-BN features. Unlike previous methods that need to either retrain the model or manually tune hyper-parameters for a new attack strength, our method handles arbitrary attack strengths with a single model and without introducing any hyper-parameter. Additionally, our method improves model robustness against adversarial samples without incurring much loss of accuracy on clean images. Experiments on the CIFAR-10, SVHN, and tiny-ImageNet datasets demonstrate that our method outperforms the state of the art under various attack strengths and even improves accuracy on clean samples. Code will be made publicly available upon acceptance.
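The dual-BN interpolation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the example statistics, and the convention that the weight `lam` is passed in directly (rather than predicted by a learned network, as the abstract states) are all assumptions made for illustration.

```python
import math


def batchnorm(x, mean, var, eps=1e-5):
    """Normalize a feature vector with fixed per-channel statistics."""
    return [(xi - m) / math.sqrt(v + eps) for xi, m, v in zip(x, mean, var)]


def dual_bn_interpolate(x, weak_stats, strong_stats, lam):
    """Linearly blend the outputs of two batchnorm branches.

    lam = 0 uses only the weakest-attack branch and lam = 1 only the
    strongest-attack branch. The abstract says this weight is learned by a
    network; here it is a plain argument (an assumption of this sketch).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    weak = batchnorm(x, *weak_stats)
    strong = batchnorm(x, *strong_stats)
    return [(1.0 - lam) * w + lam * s for w, s in zip(weak, strong)]


# Hypothetical (mean, variance) pairs for the two branches, two channels each.
weak_stats = ([0.0, 1.0], [1.0, 4.0])
strong_stats = ([0.5, 2.0], [2.0, 9.0])
features = [1.0, 3.0]

for lam in (0.0, 0.5, 1.0):
    print(lam, dual_bn_interpolate(features, weak_stats, strong_stats, lam))
```

Because the blend is linear in `lam`, intermediate attack strengths map to points on the segment between the two branch outputs, which is why a single model can cover a range of strengths without retraining in this simplified picture.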
About the journal:
Pattern Recognition Letters aims at rapid publication of concise articles of broad interest in pattern recognition.
Subject areas include all the current fields of interest represented by the Technical Committees of the International Association for Pattern Recognition, and other developing themes involving learning and recognition.