Estimating manipulation parameter values is an important problem in image forensics. While several algorithms have been proposed to accomplish this, each is limited to a single type of image manipulation. These existing techniques are typically designed using classical approaches from estimation theory by constructing parametric models of image data. This is problematic because the process of developing a theoretical model and then deriving a parameter estimator must be repeated each time a new image manipulation is introduced. In this paper, we propose a new, data-driven, generic approach to manipulation parameter estimation. Our approach can be adapted to operate on several different manipulations without requiring a forensic investigator to make substantial changes to the method. To accomplish this, we reformulate estimation as a classification problem by partitioning the parameter space into disjoint subsets and assigning each subset a distinct class. We then design a constrained CNN-based classifier that extracts classification features directly from data and estimates the manipulation parameter value in a subject image. Through a set of experiments, we demonstrate the effectiveness of our approach on four different types of manipulations.
{"title":"A Generic Approach Towards Image Manipulation Parameter Estimation Using Convolutional Neural Networks","authors":"Belhassen Bayar, M. Stamm","doi":"10.1145/3082031.3083249","DOIUrl":"https://doi.org/10.1145/3082031.3083249","url":null,"abstract":"Estimating manipulation parameter values is an important problem in image forensics. While several algorithms have been proposed to accomplish this, their application is exclusively limited to one type of image manipulation. These existing techniques are often designed using classical approaches from estimation theory by constructing parametric models of image data. This is problematic since this process of developing a theoretical model then deriving a parameter estimator must be repeated each time a new image manipulation is derived. In this paper, we propose a new data-driven generic approach to performing manipulation parameter estimation. Our proposed approach can be adapted to operate on several different manipulations without requiring a forensic investigator to make substantial changes to the proposed method. To accomplish this, we reformulate estimation as a classification problem by partitioning the parameter space into disjoint subsets such that each parameter subset is assigned a distinct class. Subsequently, we design a constrained CNN-based classifier that is able to extract classification features directly from data as well as estimating the manipulation parameter value in a subject image. Through a set of experiments, we demonstrated the effectiveness of our approach using four different types of manipulations.","PeriodicalId":431672,"journal":{"name":"Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security","volume":"174 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121353140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a novel feature set for steganalysis of motion vector-based steganography in H.264/AVC. First, the influence of steganographic embedding on the sum of absolute differences (SAD) and the motion vector difference (MVD) is analyzed, and the statistical characteristics of these two quantities are then combined to design features. In terms of SAD, the macroblock partition modes are used to measure the quantization distortion, and by exploiting the local optimality of SAD, partition-based neighborhood optimal probability features are extracted. In terms of MVD, it is proved that MVD is better suited to feature construction than the neighboring motion vector difference (NMVD) widely used by traditional steganalyzers; inter and intra co-occurrence features are therefore constructed from the joint distribution of the two components of neighboring MVDs and of the two components of the same MVD. Finally, the combined features are enhanced by window-optimal calibration, which exploits the optimality of both SAD and MVD within a local window. Experiments under various conditions demonstrate that the proposed scheme generally achieves more accurate detection than current methods, especially for videos encoded with variable block sizes and high quantization parameter values, and that it generalizes well across application settings.
{"title":"Combined and Calibrated Features for Steganalysis of Motion Vector-Based Steganography in H.264/AVC","authors":"Liming Zhai, Lina Wang, Yanzhen Ren","doi":"10.1145/3082031.3083237","DOIUrl":"https://doi.org/10.1145/3082031.3083237","url":null,"abstract":"This paper presents a novel feature set for steganalysis of motion vector-based steganography in H.264/AVC. First, the influence of steganographic embedding on the sum of absolute difference (SAD) and the motion vector difference (MVD) is analyzed, and then the statistical characteristics of these two aspects are combined to design features. In terms of SAD, the macroblock partition modes are used to measure the quantization distortion, and by using the optimality of SAD in neighborhood, the partition based neighborhood optimal probability features are extracted. In terms of MVD, it has been proved that MVD is better in feature construction than neighboring motion vector difference (NMVD) which has been widely used by traditional steganalyzers, and thus the inter and intra co-occurrence features are constructed based on the distribution of two components of neighboring MVDs and the distribution of two components of the same MVD. Finally, the combined features are enhanced by window optimal calibration, which utilizes the optimality of both SAD and MVD in a local window area. Experiments on various conditions demonstrate that the proposed scheme generally achieves a more accurate detection than current methods especially for videos encoded in variable block size and high quantization parameter values, and exhibits strong universality in applications.","PeriodicalId":431672,"journal":{"name":"Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114703227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents an effective steganalytic algorithm to detect Discrete Cosine Transform (DCT) based data hiding methods for H.264/AVC videos. These methods hide covert information in compressed video streams by manipulating quantized DCT coefficients, and usually achieve high payload and low computational complexity, which suits applications with hard real-time requirements. In contrast to the considerable literature on JPEG-domain steganalysis, little work has so far targeted DCT-based methods for compressed videos. In this paper, the embedding impact on both spatial and temporal correlations is carefully analyzed, and two feature sets are designed for steganalysis on that basis. The first feature set consists of histograms of noise residuals computed from the decompressed frames with 16 DCT kernels, in which a quantity measuring residual distortion is accumulated. The second feature set consists of residual histograms computed from similar blocks linked by motion vectors across inter-frames. Experimental results demonstrate that our method can effectively distinguish stego videos that have undergone DCT manipulation from clean ones, especially for high-quality videos.
{"title":"A Steganalytic Algorithm to Detect DCT-based Data Hiding Methods for H.264/AVC Videos","authors":"Peipei Wang, Yun Cao, Xianfeng Zhao, Meineng Zhu","doi":"10.1145/3082031.3083245","DOIUrl":"https://doi.org/10.1145/3082031.3083245","url":null,"abstract":"This paper presents an effective steganalytic algorithm to detect Discrete Cosine Transform (DCT) based data hiding methods for H.264/AVC videos. These methods hide covert information into compressed video streams by manipulating quantized DCT coefficients, and usually achieve high payload and low computational complexity, which is suitable for applications with hard real-time requirements. In contrast to considerable literature grown up in JPEG domain steganalysis, so far there is few work found against DCT-based methods for compressed videos. In this paper, the embedding impacts on both spatial and temporal correlations are carefully analyzed, based on which two feature sets are designed for steganalysis. The first feature set is engineered as the histograms of noise residuals from the decompressed frames using 16 DCT kernels, in which a quantity measuring residual distortion is accumulated. The second feature set is designed as the residual histograms from the similar blocks linked by motion vectors between inter-frames. The experimental results have demonstrated that our method can effectively distinguish stego videos undergone DCT manipulations from clean ones, especially for those of high qualities.","PeriodicalId":431672,"journal":{"name":"Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125560099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Research has shown that lateral chromatic aberration (LCA), an imaging fingerprint, can be anti-forensically modified to hide evidence of cut-and-paste forgery. In this paper, we propose a new technique for securing digital images against anti-forensic manipulation of LCA. To do this, we exploit the resizing differences between color channels induced by LCA anti-forensics and define a feature vector that quantitatively captures these differences. We then propose a detection method that exposes anti-forensically manipulated image patches. The proposed technique is validated experimentally, and its performance is characterized as a function of forgery patch size and anti-forensic scaling factor.
{"title":"Countering Anti-Forensics of Lateral Chromatic Aberration","authors":"O. Mayer, M. Stamm","doi":"10.1145/3082031.3083242","DOIUrl":"https://doi.org/10.1145/3082031.3083242","url":null,"abstract":"Research has shown that lateral chromatic aberrations (LCA), an imaging fingerprint, can be anti-forensically modified to hide evidence of cut-and-paste forgery. In this paper, we propose a new technique for securing digital images against anti-forensic manipulation of LCA. To do this, we exploit resizing differences between color channels, which are induced by LCA anti-forensics, and define a feature vector to quantitatively capture these differences. Furthermore, we propose a detection method that exposes anti-forensically manipulated image patches. The technique algorithm is validated through experimental procedure, showing dependence on forgery patch size as well as anti-forensic scaling factor.","PeriodicalId":431672,"journal":{"name":"Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131464252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Since 2014, a novel approach to attacking face-image-based person verification, designated the face morphing attack, has been actively discussed in the biometric and media forensics communities. Up until that point, modern travel documents were considered extremely hard to forge or successfully manipulate. In the case of template-targeting attacks like facial morphing, however, the face verification process becomes vulnerable, making it necessary to design protection mechanisms. In this paper, a new modeling approach for face morphing attacks is introduced. We start with a life-cycle model for photo-ID documents and extend it with an image editing history model, allowing for a precise description of attack realizations as a foundation for media forensics as well as for training and testing scenarios for attack detectors. On the basis of these models, two different realizations of the face morphing attack as well as a forensic morphing detector are implemented and evaluated. The design of the detector's feature space is based on the observation that the blending operation in the morphing pipeline reduces facial detail. To quantify this reduction, we adopt features implemented in the OpenCV image processing library, namely the number of SIFT, SURF, ORB, FAST, and AGAST keypoints in the face region as well as the loss of edge information measured with the Canny and Sobel edge operators. Our morphing detector is trained on 2,000 self-acquired authentic and 2,000 morphed images captured with three camera types (Canon EOS 1200D, Nikon D3300, Nikon Coolpix A100) and tested on authentic and morphed face images from a public database. Morphing detection accuracies of a decision tree classifier range from 81.3% to 98% across different training and test scenarios.
{"title":"Modeling Attacks on Photo-ID Documents and Applying Media Forensics for the Detection of Facial Morphing","authors":"Christian Krätzer, A. Makrushin, T. Neubert, M. Hildebrandt, J. Dittmann","doi":"10.1145/3082031.3083244","DOIUrl":"https://doi.org/10.1145/3082031.3083244","url":null,"abstract":"Since 2014, a novel approach to attack face image based person verification designated as face morphing attack has been actively discussed in the biometric and media forensics communities. Up until that point, modern travel documents were considered to be extremely hard to forge or to successfully manipulate. In the case of template-targeting attacks like facial morphing, the face verification process becomes vulnerable, making it a necessity to design protection mechanisms. In this paper, a new modeling approach for face morphing attacks is introduced. We start with a life-cycle model for photo-ID documents. We extend this model by an image editing history model, allowing for a precise description of attack realizations as a foundation for performing media forensics as well as training and testing scenarios for the attack detectors. On the basis of these modeling approaches, two different realizations of the face morphing attack as well as a forensic morphing detector are implemented and evaluated. The design of the feature space for the detector is based on the idea that the blending operation in the morphing pipeline causes the reduction of face details. To quantify this reduction, we adopt features implemented in the OpenCV image processing library, namely the number of SIFT, SURF, ORB, FAST and AGAST keypoints in the face region as well as the loss of edge-information with Canny and Sobel edge operators. Our morphing detector is trained with 2000 self-acquired authentic and 2000 morphed images captured with three camera types (Canon EOS 1200D, Nikon D 3300, Nikon Coolpix A100) and tested with authentic and morphed face images from a public database. Morphing detection accuracies of a decision tree classifier vary from 81.3% to 98% for different training and test scenarios.","PeriodicalId":431672,"journal":{"name":"Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121988093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Although several methods have been proposed for detecting resampling operations in multimedia signals and estimating the resampling factor, the fundamental limits of this forensic task remain an open research question. In this work, we explore the effects that a downsampling operation introduces in the statistics of a 1D signal as a function of the parameters used. We quantify the statistical distance between an original signal and its downsampled version by means of the Kullback-Leibler divergence (KLD) for a wide-sense stationary first-order autoregressive signal model. Values of the KLD are derived for different signal parameters, resampling factors, and interpolation kernels, thereby predicting the achievable hypothesis distinguishability in each case. Our analysis reveals unexpected detectability in the case of strong downsampling, due to the local correlation structure of the original signal. Moreover, since existing detection methods generally leverage the cyclostationarity of resampled signals, we also address the case where the autocovariance values are estimated directly from the signal under investigation by means of the sample autocovariance. Under the considered assumptions, the sample covariance matrix of a signal segment follows a Wishart distribution, and the KLD under the different hypotheses is derived.
{"title":"Information-theoretic Bounds of Resampling Forensics: New Evidence for Traces Beyond Cyclostationarity","authors":"Cecilia Pasquini, Rainer Böhme","doi":"10.1145/3082031.3083233","DOIUrl":"https://doi.org/10.1145/3082031.3083233","url":null,"abstract":"Although several methods have been proposed for the detection of resampling operations in multimedia signals and the estimation of the resampling factor, the fundamental limits for this forensic task leave open research questions. In this work, we explore the effects that a downsampling operation introduces in the statistics of a 1D signal as a function of the parameters used. We quantify the statistical distance between an original signal and its downsampled version by means of the Kullback-Leibler Divergence (KLD) in case of a wide-sense stationary 1st-order autoregressive signal model. Values of the KLD are derived for different signal parameters, resampling factors and interpolation kernels, thus predicting the achievable hypothesis distinguishability in each case. Our analysis reveals unexpected detectability in case of strong downsampling due to the local correlation structure of the original signal. Moreover, since existing detection methods generally leverage the cyclostationarity of resampled signals, we also address the case where the autocovariance values are estimated directly by means of the sample autocovariance from the signal under investigation. Under the considered assumptions, the Wishart distribution models the sample covariance matrix of a signal segment and the KLD under different hypotheses is derived.","PeriodicalId":431672,"journal":{"name":"Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114957801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Online users are increasingly subjected to privacy-invasive tracking across the web for advertisement and surveillance purposes, using IP addresses, cookies, and browser fingerprinting. As web browsing activity shifts to mobile platforms such as smartphones, traditional browser fingerprinting techniques become less effective due to ephemeral IP addresses and a uniform software base. However, device fingerprinting using built-in sensors offers a new avenue of attack. In this talk, I will describe how motion sensors such as the accelerometer and gyroscope embedded in smartphones can be exploited to track users online. Next, I will discuss the practical aspects of this attack and how it can be used to track users across different sessions under natural web browsing settings. Finally, I will present usable countermeasures that we have developed to protect users against such fingerprinting techniques.
{"title":"Every Move You Make: Tracking Smartphone Users through Motion Sensors","authors":"Anupam Das","doi":"10.1145/3082031.3092568","DOIUrl":"https://doi.org/10.1145/3082031.3092568","url":null,"abstract":"Online users are increasingly being subjected to privacy-invasive tracking across the web for advertisement and surveillance purposes, using IP addresses, cookies, and browser fingerprinting. As web browsing activity shifts to mobile platforms such as smartphones, traditional browser fingerprinting techniques become less effective due to ephemeral IP addresses and uniform software-base. However, device fingerprinting using built-in sensors offers a new avenue for attack. In this talk, I will describe how motion sensors such as accelerometer and gyroscope, embedded in smartphones, can be exploited to track users online. Next, I will discuss the practical aspects of this attack and how it can be used to track users across different sessions under natural web browsing settings. Finally, I will talk about usable countermeasures that we have developed to protect users against such fingerprinting techniques.","PeriodicalId":431672,"journal":{"name":"Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131081842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we propose a method for normalization of rich feature sets to improve detection accuracy of simple classifiers in steganalysis. It consists of two steps: 1) replacing random subsets of empirical joint probability mass functions (co-occurrences) by their conditional probabilities and 2) applying a non-linear normalization to each element of the feature vector by forcing its marginal distribution over covers to be uniform. We call the first step random conditioning and the second step feature uniformization. When applied to maxSRMd2 features in combination with simple classifiers, we observe a gain in detection accuracy across all tested stego algorithms and payloads. For better insight, we investigate the gain for two image formats. The proposed normalization has a very low computational complexity and does not require any feedback from the stego class.
{"title":"Nonlinear Feature Normalization in Steganalysis","authors":"M. Boroumand, J. Fridrich","doi":"10.1145/3082031.3083239","DOIUrl":"https://doi.org/10.1145/3082031.3083239","url":null,"abstract":"In this paper, we propose a method for normalization of rich feature sets to improve detection accuracy of simple classifiers in steganalysis. It consists of two steps: 1) replacing random subsets of empirical joint probability mass functions (co-occurrences) by their conditional probabilities and 2) applying a non-linear normalization to each element of the feature vector by forcing its marginal distribution over covers to be uniform. We call the first step random conditioning and the second step feature uniformization. When applied to maxSRMd2 features in combination with simple classifiers, we observe a gain in detection accuracy across all tested stego algorithms and payloads. For better insight, we investigate the gain for two image formats. The proposed normalization has a very low computational complexity and does not require any feedback from the stego class.","PeriodicalId":431672,"journal":{"name":"Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117165562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, deep learning has achieved breakthrough results in areas such as computer vision, audio recognition, and natural language processing. However, only a few related works have investigated deep learning for digital multimedia forensics and steganalysis. In this paper, we design a novel convolutional neural network (CNN) to detect audio steganography in the time domain. Unlike most existing CNN-based methods, which try to capture media content, we carefully design the network layers to suppress the audio content and adaptively capture the minor modifications introduced by ±1 LSB-based steganography. In addition, we use a mix of convolutional layers and max pooling for subsampling to achieve good abstraction and prevent over-fitting. In our experiments, we compare our network with six similar network architectures and two traditional methods using handcrafted features. Extensive experimental results on 40,000 speech audio clips demonstrate the effectiveness of the proposed convolutional network.
{"title":"Audio Steganalysis with Convolutional Neural Network","authors":"Bolin Chen, Weiqi Luo, Haodong Li","doi":"10.1145/3082031.3083234","DOIUrl":"https://doi.org/10.1145/3082031.3083234","url":null,"abstract":"In recent years, deep learning has achieved breakthrough results in various areas, such as computer vision, audio recognition, and natural language processing. However, just several related works have been investigated for digital multimedia forensics and steganalysis. In this paper, we design a novel CNN (convolutional neural networks) to detect audio steganography in the time domain. Unlike most existing CNN based methods which try to capture media contents, we carefully design the network layers to suppress audio content and adaptively capture the minor modifications introduced by ±1 LSB based steganography. Besides, we use a mix of convolutional layer and max pooling to perform subsampling to achieve good abstraction and prevent over-fitting. In our experiments, we compared our network with six similar network architectures and two traditional methods using handcrafted features. Extensive experimental results evaluated on 40,000 speech audio clips have shown the effectiveness of the proposed convolutional network.","PeriodicalId":431672,"journal":{"name":"Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130416791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents an empirical study on applying convolutional neural networks (CNNs) to the detection of J-UNIWARD, one of the most secure JPEG steganographic methods. Experiments guiding the architectural design of the CNNs were conducted on the JPEG-compressed BOSSBase containing 10,000 covers of size 512×512. The results verify that both the pooling method and the depth of the CNN are critical for performance. They also show that a 20-layer CNN generally outperforms the most sophisticated feature-based methods, although its advantage gradually diminishes on hard-to-detect cases. To show that the performance generalizes to large-scale databases and to different cover sizes, an additional experiment was conducted on the CLS-LOC dataset of ImageNet, containing more than one million covers cropped to a unified size of 256×256. The proposed 20-layer CNN cuts the error achieved by a CNN recently proposed for large-scale JPEG steganalysis by 35%. Source code is available via GitHub: https://github.com/GuanshuoXu/deep_cnn_jpeg_steganalysis
{"title":"Deep Convolutional Neural Network to Detect J-UNIWARD","authors":"Guanshuo Xu","doi":"10.1145/3082031.3083236","DOIUrl":"https://doi.org/10.1145/3082031.3083236","url":null,"abstract":"This paper presents an empirical study on applying convolutional neural networks (CNNs) to detecting J-UNIWARD -- one of the most secure JPEG steganographic method. Experiments guiding the architectural design of the CNNs have been conducted on the JPEG compressed BOSSBase containing 10,000 covers of size 512×512. Results have verified that both the pooling method and the depth of the CNNs are critical for performance. Results have also proved that a 20-layer CNN, in general, outperforms the most sophisticated feature-based methods, but its advantage gradually diminishes on hard-to-detect cases. To show that the performance generalizes to large-scale databases and to different cover sizes, one experiment has been conducted on the CLS-LOC dataset of ImageNet containing more than one million covers cropped to unified size of 256×256. The proposed 20-layer CNN has cut the error achieved by a CNN recently proposed for large-scale JPEG steganalysis by 35%. Source code is available via GitHub: https://github.com/GuanshuoXu/deep_cnn_jpeg_steganalysis","PeriodicalId":431672,"journal":{"name":"Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129405714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}