Image Dehazing With Contextualized Attentive U-NET
Pub Date: 2020-10-01 | DOI: 10.1109/ICIP40778.2020.9190725
Yean-Wei Lee, L. Wong, John See
Haze, which occurs due to the accumulation of fine dust or smoke particles in the atmosphere, degrades outdoor imaging, reducing both the appeal of outdoor photography and the effectiveness of vision-based systems. In this paper, we present an end-to-end convolutional neural network for image dehazing. Our proposed U-Net-based architecture employs Squeeze-and-Excitation (SE) blocks at the skip connections to enforce channel-wise attention, and parallelized dilated convolution blocks at the bottleneck to capture both local and global context, resulting in a richer representation of image features. Experimental results demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance on the benchmark SOTS dataset.
{"title":"Image Dehazing With Contextualized Attentive U-NET","authors":"Yean-Wei Lee, L. Wong, John See","doi":"10.1109/ICIP40778.2020.9190725","DOIUrl":"https://doi.org/10.1109/ICIP40778.2020.9190725","url":null,"abstract":"Haze, which occurs due to the accumulation of fine dust or smoke particles in the atmosphere, degrades outdoor imaging, resulting in reduced attractiveness of outdoor photography and the effectiveness of vision-based systems. In this paper, we present an end-to-end convolutional neural network for image dehazing. Our proposed U-Net based architecture employs Squeeze-and-Excitation (SE) blocks at the skip connections to enforce channel-wise attention and parallelized dilated convolution blocks at the bottleneck to capture both local and global context, resulting in a richer representation of the image features. Experimental results demonstrate the effectiveness of the proposed method in achieving state-of-the-art performance on the benchmark SOTS dataset.","PeriodicalId":405734,"journal":{"name":"2020 IEEE International Conference on Image Processing (ICIP)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132422590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive Convolutionally Enchanced Bi-Directional Lstm Networks For Choreographic Modeling
Pub Date: 2020-10-01 | DOI: 10.1109/ICIP40778.2020.9191307
N. Bakalos, I. Rallis, N. Doulamis, A. Doulamis, A. Voulodimos, Eftychios E. Protopapadakis
In this paper, we present a deep learning scheme for the classification of choreographic primitives from RGB images. The proposed framework combines the representational power of feature maps extracted by Convolutional Neural Networks with the long-term dependency modeling capabilities of Long Short-Term Memory recurrent neural networks. In addition, it incorporates an AutoRegressive Moving Average (ARMA) filter into the convolutionally enriched LSTM to capture the dynamic characteristics of dance. Finally, an adaptive weight updating strategy is introduced to improve classification performance. The framework is used for the recognition of dance primitives (basic dance postures) and is experimentally validated on real-world sequences of traditional Greek folk dances.
{"title":"Adaptive Convolutionally Enchanced Bi-Directional Lstm Networks For Choreographic Modeling","authors":"N. Bakalos, I. Rallis, N. Doulamis, A. Doulamis, A. Voulodimos, Eftychios E. Protopapadakis","doi":"10.1109/ICIP40778.2020.9191307","DOIUrl":"https://doi.org/10.1109/ICIP40778.2020.9191307","url":null,"abstract":"In this paper, we present a deep learning scheme for classification of choreographic primitives from RGB images. The proposed framework combines the representational power of feature maps, extracted by Convolutional Neural Networks, with the long-term dependency modeling capabilities of Long Short-Term Memory recurrent neural networks. In addition, it uses AutoRegressive and Moving Average (ARMA) filter into the convolutionally enriched LSTM filter to face dance dynamic characteristics. Finally, an adaptive weight updating strategy is introduced for improving classification modeling performance The framework is used for the recognition of dance primitives (basic dance postures) and is experimentally validated with real-world sequences of traditional Greek folk dances.","PeriodicalId":405734,"journal":{"name":"2020 IEEE International Conference on Image Processing (ICIP)","volume":"48 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132894203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Blind Natural Image Quality Prediction Using Convolutional Neural Networks And Weighted Spatial Pooling
Pub Date: 2020-10-01 | DOI: 10.1109/ICIP40778.2020.9190789
Yicheng Su, J. Korhonen
Typically, some regions of an image are more relevant to its perceived quality than others. On the other hand, subjective image quality is also affected by low-level characteristics such as sensor noise and sharpness. This is why image rescaling, as often used in object recognition, is not a feasible way to produce input images for convolutional neural networks (CNNs) used for blind image quality prediction. Generally, a convolution layer can accept images of arbitrary resolution as input, whereas a fully connected (FC) layer can only accept a fixed-length feature vector. To solve this problem, we propose weighted spatial pooling (WSP), which aggregates spatial information under a weight map of any size and can be used to replace global average pooling (GAP). In this paper, we present a blind image quality assessment (BIQA) method based on CNN and WSP. Our experimental results show that the prediction accuracy of the proposed method is competitive with state-of-the-art image quality assessment methods.
{"title":"Blind Natural Image Quality Prediction Using Convolutional Neural Networks And Weighted Spatial Pooling","authors":"Yicheng Su, J. Korhonen","doi":"10.1109/ICIP40778.2020.9190789","DOIUrl":"https://doi.org/10.1109/ICIP40778.2020.9190789","url":null,"abstract":"Typically, some regions of an image are more relevant for its perceived quality than the others. On the other hand, subjective image quality is also affected by low level characteristics, such as sensor noise and sharpness. This is why image rescaling, as often used in object recognition, is not a feasible approach for producing input images for convolutional neural networks (CNN) used for blind image quality prediction. Generally, convolution layer can accept images of arbitrary resolution as input, whereas fully connected (FC) layer only can accept a fixed length feature vector. To solve this problem, we propose weighted spatial pooling (WSP) to aggregate spatial information of any size of weight map, which can be used to replace global average pooling (GAP). In this paper, we present a blind image quality assessment (BIQA) method based on CNN and WSP. Our experimental results show that the prediction accuracy of the proposed method is competitive against the state-of-the-art image quality assessment methods.","PeriodicalId":405734,"journal":{"name":"2020 IEEE International Conference on Image Processing (ICIP)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132327749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multiple Events Detection In Seismic Structures Using A Novel U-Net Variant
Pub Date: 2020-10-01 | DOI: 10.1109/ICIP40778.2020.9190682
M. Alfarhan, M. Deriche, A. Maalej, G. AlRegib, H. Al-Marzouqi
Seismic data interpretation is a fundamental process in the pipeline of identifying hydrocarbon structural traps such as salt domes and faults. This process is highly demanding and challenging in terms of expert knowledge, time, and effort. Interpretation becomes even more challenging when multiple seismic events must be identified simultaneously. In recent years, the trend has been toward automating seismic interpretation with advanced computational techniques, in particular deep learning (DL) networks. In this paper, we present our DL solution for the concurrent identification of salt domes and faults, with very promising preliminary results obtained on real-world seismic data. The proposed workflow yields excellent detection results even with small training datasets. Furthermore, the resulting probability maps can be extended to a larger number of structure types. Precisions above 96% were obtained on real data when three types of seismic structures are present concurrently.
{"title":"Multiple Events Detection In Seismic Structures Using A Novel U-Net Variant","authors":"M. Alfarhan, M. Deriche, A. Maalej, G. AlRegib, H. Al-Marzouqi","doi":"10.1109/ICIP40778.2020.9190682","DOIUrl":"https://doi.org/10.1109/ICIP40778.2020.9190682","url":null,"abstract":"Seismic data interpretation is a fundamental process in the pipeline of identifying hydrocarbon structural traps such as salt domes and faults. This process is highly demanding and challenging in terms of expert-knowledge, time, and efforts. The interpretation process becomes even more challenging when it comes to identifying multiple seismic events taking place simultaneously. In recent years, the technology trend has been directed towards the automation of seismic interpretation using advanced computational techniques and in particular deep learning (DL) networks. In this paper, we present our DL solution for concurrent salt domes and faults identification with very promising preliminary results obtained through applications to real world seismic data. The proposed workflow leads to excellent detection results even with small size training datasets. Furthermore, the resulting probability maps can be extended to even a larger number of structure types. Precisions of the order of more than 96% were obtained with real data when three types of seismic structures are present concurrently.","PeriodicalId":405734,"journal":{"name":"2020 IEEE International Conference on Image Processing (ICIP)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130824252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Visual Relationship Detection With A Deep Convolutional Relationship Network
Pub Date: 2020-10-01 | DOI: 10.1109/ICIP40778.2020.9190642
Yao Peng, D. Chen, Lanfen Lin
Visual relationships are crucial to image understanding and can be applied to many tasks (e.g., image captioning and visual question answering). Despite great progress on many vision tasks, relationship detection remains a challenging problem due to the complexity of modeling the widely spread and imbalanced distribution of {subject – predicate – object} triplets. In this paper, we propose a new framework that captures the relative positions and sizes of the subject and object in the feature map, and adds a new branch to filter out object pairs that are unlikely to have relationships. In addition, an activation function is trained to increase the probability of some feature maps given an object pair. Experiments on two large datasets, the Visual Relationship Detection (VRD) and Visual Genome (VG) datasets, demonstrate the superiority of our approach over state-of-the-art methods. Further, an ablation study verifies the effectiveness of our techniques.
{"title":"Visual Relationship Detection With A Deep Convolutional Relationship Network","authors":"Yao Peng, D. Chen, Lanfen Lin","doi":"10.1109/ICIP40778.2020.9190642","DOIUrl":"https://doi.org/10.1109/ICIP40778.2020.9190642","url":null,"abstract":"Visual relationship is crucial to image understanding and can be applied to many tasks (e.g., image caption and visual question answering). Despite great progress on many vision tasks, relationship detection remains a challenging problem due to the complexity of modeling the widely spread and imbalanced distribution of {subject – predicate – object} triplets. In this paper, we propose a new framework to capture the relative positions and sizes of the subject and object in the feature map and add a new branch to filter out some object pairs that are unlikely to have relationships. In addition, an activation function is trained to increase the probability of some feature maps given an object pair. Experiments on two large datasets, the Visual Relationship Detection (VRD) and Visual Genome (VG) datasets, demonstrate the superiority of our new approach over state-of-the-art methods. Further, ablation study verifies the effectiveness of our techniques.","PeriodicalId":405734,"journal":{"name":"2020 IEEE International Conference on Image Processing (ICIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130279622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scalable Learned Image Compression With A Recurrent Neural Networks-Based Hyperprior
Pub Date: 2020-10-01 | DOI: 10.1109/ICIP40778.2020.9190704
Rige Su, Zhengxue Cheng, Heming Sun, J. Katto
Learned image compression has recently made great progress, notably the representative hyperprior and its variants based on convolutional neural networks (CNNs). However, CNNs are not well suited to scalable coding, and multiple models need to be trained separately to achieve variable rates. In this paper, we incorporate differentiable quantization and accurate entropy models into recurrent neural network (RNN) architectures to achieve scalable learned image compression. First, we present an RNN architecture with quantization and entropy coding. To realize scalable coding, we allocate bits to multiple layers by adjusting the layer-wise lambda values in the Lagrangian multiplier-based rate-distortion optimization function. Second, we add an RNN-based hyperprior to improve the accuracy of entropy models for multi-layer residual representations. Experimental results demonstrate that our performance is comparable with recent CNN-based hyperprior methods on the Kodak dataset. Moreover, our method is a scalable and flexible coding approach that achieves multiple rates with a single model, which is very appealing.
{"title":"Scalable Learned Image Compression With A Recurrent Neural Networks-Based Hyperprior","authors":"Rige Su, Zhengxue Cheng, Heming Sun, J. Katto","doi":"10.1109/ICIP40778.2020.9190704","DOIUrl":"https://doi.org/10.1109/ICIP40778.2020.9190704","url":null,"abstract":"Recently learned image compression has achieved many great progresses, such as representative hyperprior and its variants based on convolutional neural networks (CNNs). However, CNNs are not fit for scalable coding and multiple models need to be trained separately to achieve variable rates. In this paper, we incorporate differentiable quantization and accurate entropy models into recurrent neural networks (RNNs) architectures to achieve a scalable learned image compression. First, we present an RNN architecture with quantization and entropy coding. To realize the scalable coding, we allocate the bits to multiple layers, by adjusting the layer-wise lambda values in Lagrangian multiplier-based rate-distortion optimization function. Second, we add an RNN-based hyperprior to improve the accuracy of entropy models for multiple-layer residual representations. Experimental results demonstrate that our performance can be comparable with recent CNN-based hyperprior methods on Kodak dataset. Besides, our method is a scalable and flexible coding approach, to achieve multiple rates using one single model, which is very appealing.","PeriodicalId":405734,"journal":{"name":"2020 IEEE International Conference on Image Processing (ICIP)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130410018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Precise Cerebrovascular Segmentation
Pub Date: 2020-10-01 | DOI: 10.1109/ICIP40778.2020.9191077
F. Taher, A. Soliman, Heba Kandil, Ali M. Mahmoud, A. Shalaby, G. Gimel'farb, A. El-Baz
Analyzing cerebrovascular changes using Time-of-Flight Magnetic Resonance Angiography (ToF-MRA) images can detect the presence of serious diseases, e.g., hypertension, and track their progress. Such analysis requires accurate segmentation of the vasculature from its surroundings, which motivated us to propose a fully automated cerebral vasculature segmentation approach based on extracting both prior and current appearance features that capture the appearance of macro- and micro-vessels. The appearance prior is modeled with a novel translation- and rotation-invariant Markov-Gibbs Random Field (MGRF) of voxel intensities with pairwise interaction, analytically identified from a set of training datasets, while the current appearance is represented by a marginal probability distribution of voxel intensities using a Linear Combination of Discrete Gaussians (LCDG) whose parameters are estimated by a modified Expectation-Maximization (EM) algorithm. The proposed approach was validated on 190 datasets using three metrics, which revealed high accuracy compared to existing approaches.
{"title":"Precise Cerebrovascular Segmentation","authors":"F. Taher, A. Soliman, Heba Kandil, Ali M. Mahmoud, A. Shalaby, G. Gimel'farb, A. El-Baz","doi":"10.1109/ICIP40778.2020.9191077","DOIUrl":"https://doi.org/10.1109/ICIP40778.2020.9191077","url":null,"abstract":"Analyzing cerebrovascular changes using Time-of-Flight Magnetic Resonance Angiography (ToF–MRA) images can detect the presence of serious diseases and track their progress, e.g., hypertension. Such analysis requires accurate segmentation of the vasculature from the surroundings, which motivated us to propose a fully automated cerebral vasculature segmentation approach based on extracting both prior and current appearance features that capture the appearance of macro and micro-vessels. The appearance prior is modeled with a novel translation and rotation invariant Markov-Gibbs Random Field (MGRF) of voxel intensities with pairwise interaction analytically identified from a set of training data sets, while the current appearance is represented with a marginal probability distribution of voxel intensities by using a Linear Combination of Discrete Gaussians (LCDG) whose parameters are estimated by a modified Expectation-Maximization (EM) algorithm. The proposed approach was validated on 190 data sets using three metrics, which revealed high accuracy compared to existing approaches.","PeriodicalId":405734,"journal":{"name":"2020 IEEE International Conference on Image Processing (ICIP)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127913574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
3d Object Detection For Autonomous Driving Using Temporal Lidar Data
Pub Date: 2020-10-01 | DOI: 10.1109/ICIP40778.2020.9191134
S. McCrae, A. Zakhor
3D object detection is a fundamental problem in autonomous driving, and pedestrians are among the most important objects to detect. The recently introduced PointPillars architecture has been shown to be effective in object detection. It voxelizes 3D LiDAR point clouds to produce a 2D pseudo-image to be used for object detection. In this work, we modify PointPillars to become a recurrent network that uses fewer LiDAR frames per forward pass. Specifically, compared to the original PointPillars model, which uses 10 LiDAR frames per forward pass, our recurrent model uses 3 frames and recurrent memory. With this modification, we observe an 8% increase in pedestrian detection and a slight decline in vehicle detection performance in a coarsely voxelized setting. Furthermore, when given 3 frames of data as input to both models, our recurrent architecture outperforms PointPillars by 21% and 1% in pedestrian and vehicle detection, respectively.
{"title":"3d Object Detection For Autonomous Driving Using Temporal Lidar Data","authors":"S. McCrae, A. Zakhor","doi":"10.1109/ICIP40778.2020.9191134","DOIUrl":"https://doi.org/10.1109/ICIP40778.2020.9191134","url":null,"abstract":"3D object detection is a fundamental problem in the space of autonomous driving, and pedestrians are some of the most important objects to detect. The recently introduced PointPillars architecture has been shown to be effective in object detection. It voxelizes 3D LiDAR point clouds to produce a 2D pseudo-image to be used for object detection. In this work, we modify PointPillars to become a recurrent network, using fewer LiDAR frames per forward pass. Specifically, as compared to the original PointPillars model which uses 10 LiDAR frames per forward pass, our recurrent model uses 3 frames and recurrent memory. With this modification, we observe an 8% increase in pedestrian detection and a slight decline in performance on vehicle detection in a coarsely voxelized setting. Furthermore, when given 3 frames of data as input to both models, our recurrent architecture outperforms PointPillars by 21% and 1% in pedestrian and vehicle detection, respectively.","PeriodicalId":405734,"journal":{"name":"2020 IEEE International Conference on Image Processing (ICIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129241684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image Compression with Laplacian Guided Scale Space Inpainting
Pub Date: 2020-10-01 | DOI: 10.1109/ICIP40778.2020.9191041
Lingzhi Zhang, P. Kumar, Manuj R. Sabharwal, Andy Kuzma, Jianbo Shi
We present an image compression algorithm that preserves high-frequency details and information about rare occurrences. Our approach can be thought of as image inpainting in the frequency scale space. Given an image, we construct a Laplacian image pyramid and store only the finest and coarsest levels, thereby removing the middle frequencies of the image. Using a network backbone borrowed from an image super-resolution algorithm, we train our network to hallucinate the missing middle-level Laplacian image. We introduce a novel training paradigm in which we train our algorithm using only a face dataset where the faces are aligned and scaled correctly. We demonstrate that image compression learned on this restricted dataset leads to better GAN network [1] convergence and generalization to completely different image domains. We also show that Laplacian inpainting can be simplified further with a few selected pixels as seeds.
{"title":"Image Compression with Laplacian Guided Scale Space Inpainting","authors":"Lingzhi Zhang, P. Kumar, Manuj R. Sabharwal, Andy Kuzma, Jianbo Shi","doi":"10.1109/ICIP40778.2020.9191041","DOIUrl":"https://doi.org/10.1109/ICIP40778.2020.9191041","url":null,"abstract":"We present an image compression algorithm that preserves high-frequency details and information of rare occurrences. Our approach can be thought of as image inpainting in the frequency scale space. Given an image, we construct a Laplacian image pyramid, and store only the finest and coarsest levels, thereby removing the middle-frequency of the image. Using a network backbone borrowed from an image super-resolution algorithm, we train our network to hallucinate the missing middle-level Laplacian image. We introduce a novel training paradigm where we train our algorithm using only a face dataset where the faces are aligned and scaled correctly. We demonstrate that image compression learned on this restricted dataset leads to better GAN network [1] convergence and generalization to completely different image domains. We also show that Lapacian inpainting could be simplified further with a few selective pixels as seeds.","PeriodicalId":405734,"journal":{"name":"2020 IEEE International Conference on Image Processing (ICIP)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125456882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient Detection of Pixel-Level Adversarial Attacks
Pub Date: 2020-10-01 | DOI: 10.1109/ICIP40778.2020.9191084
S. A. A. Shah, Moise Bougre, Naveed Akhtar, Bennamoun, Liang Zhang
Deep learning has achieved unprecedented performance in object recognition and scene understanding. However, deep models have also been found vulnerable to adversarial attacks. Of particular relevance to robotic systems are pixel-level attacks that can completely fool a neural network by altering very few pixels (e.g., 1-5) in an image. We present the first technique to detect the presence of adversarial pixels in images for robotic systems, employing an Adversarial Detection Network (ADNet). The proposed network efficiently recognizes an input as adversarial or clean by discriminating the peculiar activation signals of adversarial samples from those of clean ones. It acts as a defense mechanism for the robotic vision system by detecting and rejecting adversarial samples. We thoroughly evaluate our technique on three benchmark datasets: CIFAR-10, CIFAR-100, and Fashion MNIST. Results demonstrate effective detection of adversarial samples by ADNet.
{"title":"Efficient Detection of Pixel-Level Adversarial Attacks","authors":"S. A. A. Shah, Moise Bougre, Naveed Akhtar, Bennamoun, Liang Zhang","doi":"10.1109/ICIP40778.2020.9191084","DOIUrl":"https://doi.org/10.1109/ICIP40778.2020.9191084","url":null,"abstract":"Deep learning has achieved unprecedented performance in object recognition and scene understanding. However, deep models are also found vulnerable to adversarial attacks. Of particular relevance to robotics systems are pixel-level attacks that can completely fool a neural network by altering very few pixels (e.g. 1-5) in an image. We present the first technique to detect the presence of adversarial pixels in images for the robotic systems, employing an Adversarial Detection Network (ADNet). The proposed network efficiently recognize an input as adversarial or clean by discriminating the peculiar activation signals of the adversarial samples from the clean ones. It acts as a defense mechanism for the robotic vision system by detecting and rejecting the adversarial samples. We thoroughly evaluate our technique on three benchmark datasets including CIFAR-10, CIFAR-100 and Fashion MNIST. Results demonstrate effective detection of adversarial samples by ADNet.","PeriodicalId":405734,"journal":{"name":"2020 IEEE International Conference on Image Processing (ICIP)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125482898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}