Network traffic classification has many applications, including security monitoring, quality of service, and traffic engineering. Deep Packet Inspection (DPI) is a widely used technique for traffic classification because it scrutinizes the payload and provides comprehensive information for accurate analysis of network traffic. However, DPI-based methods reduce network performance because they are computationally expensive, and they compromise end-user privacy because they analyze the payload. To overcome these challenges, bit-level signatures are commonly used for network traffic classification. However, most such methods remain slow because they match unknown payloads against application signatures one by one, and their performance degrades as the number of application signatures grows. To fill this gap, we propose OptiClass, an optimized classifier for application protocols using bit-level signatures. OptiClass matches application signatures against unknown flows in parallel, which results in faster, more accurate, and more efficient network traffic classification. OptiClass achieves twofold performance gains over state-of-the-art methods. First, it generates bit-level signatures of just 32 bits for every application, which keeps OptiClass swift and privacy-preserving. Second, it uses a novel data structure called BiTSPLITTER for fast and accurate signature matching. We evaluated OptiClass on three datasets covering twenty application protocols. Experimental results show that OptiClass achieves average recall, precision, and F1-score of 97.36%, 97.38%, and 97.37%, respectively, and classifies traffic 9.08 times faster on average than five closely related state-of-the-art methods.
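The abstract does not describe BiTSPLITTER's internals, but the core advantage it claims, replacing one-by-one signature comparison with a single lookup on a fixed 32-bit signature, can be sketched as follows. The signature derivation (first four payload bytes) and the signature table entries below are illustrative assumptions, not OptiClass's actual construction:

```python
# Hypothetical sketch: classifying flows by a 32-bit bit-level signature.
# A dict stands in for the paper's BiTSPLITTER structure; the point is that
# one O(1) lookup replaces matching against every application signature.

def signature32(payload: bytes) -> int:
    """Derive a 32-bit signature from the first four payload bytes (assumed)."""
    return int.from_bytes(payload[:4].ljust(4, b"\x00"), "big")

# Assumed signature table: 32-bit signature -> application protocol.
SIGNATURES = {
    0x16030100: "TLS",   # TLS handshake record, version 3.1
    0x47455420: "HTTP",  # ASCII "GET "
    0x5353482D: "SSH",   # ASCII "SSH-"
}

def classify(payload: bytes) -> str:
    # Constant-time lookup instead of one-by-one signature matching.
    return SIGNATURES.get(signature32(payload), "UNKNOWN")

print(classify(b"GET / HTTP/1.1"))        # HTTP
print(classify(b"SSH-2.0-OpenSSH_9.0"))   # SSH
```

With a table like this, classification cost stays flat as the number of application signatures grows, which is the stagnation problem the abstract attributes to one-by-one matching.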
{"title":"OptiClass: An Optimized Classifier for Application Layer Protocols Using Bit Level Signatures","authors":"Mayank Swarnkar, Neha Sharma","doi":"10.1145/3633777","DOIUrl":"https://doi.org/10.1145/3633777","url":null,"abstract":"<p>Network traffic classification has many applications, such as security monitoring, quality of service, traffic engineering, etc. For the aforementioned applications, Deep Packet Inspection (DPI) is a popularly used technique for traffic classification because it scrutinizes the payload and provides comprehensive information for accurate analysis of network traffic. However, DPI-based methods reduce network performance because they are computationally expensive and hinder end-user privacy as they analyze the payload. To overcome these challenges, bit-level signatures are significantly used to perform network traffic classification. However, most of these methods still need to improve performance as they perform one-by-one signature matching of unknown payloads with application signatures for classification. Moreover, these methods become stagnant with the increase in application signatures. Therefore, to fill this gap, we propose <i>OptiClass</i>, an optimized classifier for application protocols using bit-level signatures. <i>OptiClass</i> performs parallel application signature matching with unknown flows, which results in faster, more accurate, and more efficient network traffic classification. <i>OptiClass</i> achieves twofold performance gains compared to the state-of-the-art methods. First, <i>OptiClass</i> generates bit-level signatures of just 32 bits for all the applications. This keeps <i>OptiClass</i> swift and privacy-preserving. Second, <i>OptiClass</i> uses a novel data structure called <i>BiTSPLITTER</i> for signature matching for fast and accurate classification. We evaluated the performance of <i>OptiClass</i> on three datasets consisting of twenty application protocols. 
Experimental results report that <i>OptiClass</i> has an average recall, precision, and F1-score of 97.36%, 97.38%, and 97.37%, respectively, and an average classification speed of 9.08 times faster than five closely related state-of-the-art methods.</p>","PeriodicalId":56050,"journal":{"name":"ACM Transactions on Privacy and Security","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2023-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138540712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maksim E. Eren, Manish Bhattarai, Robert J. Joyce, Edward Raff, Charles Nicholas, Boian S. Alexandrov
Identification of the family to which a malware specimen belongs is essential for understanding the behavior of the malware and developing mitigation strategies. Solutions proposed by prior work, however, are often not practicable because they ignore realistic evaluation factors: learning under class imbalance, the ability to identify new malware, and the cost of production-quality labeled data. In practice, deployed models face prominent, rare, and new malware families, while obtaining a large quantity of up-to-date labeled malware for training can be expensive. In this article, we address these problems and propose a novel hierarchical semi-supervised algorithm, the HNMFk Classifier, that can be used in the early stages of the malware family labeling process. Our method is based on non-negative matrix factorization with automatic model selection, that is, with an estimate of the number of clusters. With the HNMFk Classifier, we exploit the hierarchical structure of the malware data together with a semi-supervised setup, which enables us to classify malware families under conditions of extreme class imbalance. Our solution can perform abstaining predictions (a rejection option), which yields promising results in identifying novel malware families and helps maintain model performance when little labeled data is available. We perform bulk classification of nearly 2,900 malware families, both rare and prominent, through static analysis, using nearly 388,000 samples from the EMBER-2018 corpus. In our experiments, we surpass both supervised and semi-supervised baseline models with an F1 score of 0.80.
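The "abstaining prediction" idea above can be illustrated in isolation. The real method clusters with hierarchical NMF; in this sketch a plain nearest-centroid rule stands in, and the centroids, family names, and threshold are assumptions for illustration only:

```python
# Toy sketch of an abstaining ("reject option") classifier: refuse to label
# a sample whose best match is too weak, instead of forcing it onto a known
# family -- the mechanism the HNMFk Classifier uses for novel-family detection.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

CENTROIDS = {  # hypothetical per-family feature centroids
    "emotet": [0.9, 0.1, 0.0],
    "ramnit": [0.1, 0.9, 0.1],
}

def predict(sample, threshold=0.8):
    family, score = max(
        ((f, cosine(sample, c)) for f, c in CENTROIDS.items()),
        key=lambda t: t[1],
    )
    # Abstain instead of mislabeling a possibly novel family.
    return family if score >= threshold else "ABSTAIN"

print(predict([0.85, 0.15, 0.0]))  # emotet
print(predict([0.5, 0.5, 0.5]))    # ABSTAIN
```

Abstentions can then be routed to an analyst, which is why the abstract positions the method for the early stages of the labeling process.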
{"title":"Semi-supervised Classification of Malware Families Under Extreme Class Imbalance via Hierarchical Non-Negative Matrix Factorization with Automatic Model Selection","authors":"Maksim E. Eren, Manish Bhattarai, Robert J. Joyce, Edward Raff, Charles Nicholas, Boian S. Alexandrov","doi":"10.1145/3624567","DOIUrl":"https://doi.org/10.1145/3624567","url":null,"abstract":"Identification of the family to which a malware specimen belongs is essential in understanding the behavior of the malware and developing mitigation strategies. Solutions proposed by prior work, however, are often not practicable due to the lack of realistic evaluation factors. These factors include learning under class imbalance, the ability to identify new malware, and the cost of production-quality labeled data. In practice, deployed models face prominent, rare, and new malware families. At the same time, obtaining a large quantity of up-to-date labeled malware for training a model can be expensive. In this article, we address these problems and propose a novel hierarchical semi-supervised algorithm, which we call the HNMFk Classifier , that can be used in the early stages of the malware family labeling process. Our method is based on non-negative matrix factorization with automatic model selection, that is, with an estimation of the number of clusters. With HNMFk Classifier , we exploit the hierarchical structure of the malware data together with a semi-supervised setup, which enables us to classify malware families under conditions of extreme class imbalance. Our solution can perform abstaining predictions, or rejection option, which yields promising results in the identification of novel malware families and helps with maintaining the performance of the model when a low quantity of labeled data is used. We perform bulk classification of nearly 2,900 both rare and prominent malware families, through static analysis, using nearly 388,000 samples from the EMBER-2018 corpus. 
In our experiments, we surpass both supervised and semi-supervised baseline models with an F1 score of 0.80.","PeriodicalId":56050,"journal":{"name":"ACM Transactions on Privacy and Security","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134992865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Malware is commonly used by adversaries to compromise and infiltrate cyber systems in order to steal sensitive information or destroy critical assets. Active Cyber Deception (ACD) has emerged as an effective proactive defense against malware: it misleads adversaries by presenting fake data and engages them so that defenders can learn novel attack techniques. However, real-time malware deception is a complex and challenging task because (1) it requires a comprehensive understanding of malware behavior at the technical and tactical levels in order to create deception ploys and resources that leverage this behavior and mislead the malware, and (2) it requires configurable yet provably valid deception planning to guarantee effective and safe real-time orchestration. This article presents symbSODA, a highly configurable and verifiable cyber deception system that analyzes real-world malware using multipath execution to discover API patterns representing attack techniques and tactics critical for deception, enables users to create customized deception ploys based on the malware's type and objectives, supports the construction of conflict-free Deception Playbooks, and automates deception orchestration by executing the malware inside a deceptive environment. symbSODA extracts Malicious Sub-graphs (MSGs) of WinAPIs from real-world malware and maps them to tactics and techniques in the ATT&CK framework to facilitate the construction of meaningful user-defined deception playbooks. We conducted a comprehensive evaluation of symbSODA using 255 recent malware samples. We demonstrate that the accuracy of end-to-end malware deception is 95% on average, with negligible overhead across various deception goals and strategies. Furthermore, our approach extracts MSGs with 97% recall, and our MSG-to-MITRE mapping achieves a top-1 accuracy of 88.75%.
Our study suggests that symbSODA can serve as a general-purpose Malware Deception Factory to automatically produce customized deception playbooks against arbitrary malware behavior.
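The MSG-to-MITRE step described above can be pictured as matching sets of WinAPI calls against known technique patterns. The API groupings and their technique assignments below are illustrative assumptions (the listed WinAPI names and ATT&CK IDs are real, but symbSODA's actual mapping is learned from malware corpora, not a hand-written table):

```python
# Hypothetical sketch of mapping an extracted Malicious Sub-graph (a set of
# WinAPI calls) to MITRE ATT&CK techniques, in the spirit of symbSODA's
# MSG-to-MITRE step.

ATTACK_MAP = {
    frozenset({"CreateToolhelp32Snapshot", "Process32First", "Process32Next"}):
        "T1057 Process Discovery",
    frozenset({"RegOpenKeyExA", "RegSetValueExA"}):
        "T1547 Boot or Logon Autostart Execution",
}

def map_msg_to_technique(msg_apis):
    """Return every technique whose API pattern is contained in the MSG."""
    apis = set(msg_apis)
    return [tech for pattern, tech in ATTACK_MAP.items() if pattern <= apis]

msg = ["CreateToolhelp32Snapshot", "Process32First",
       "Process32Next", "CloseHandle"]
print(map_msg_to_technique(msg))  # ['T1057 Process Discovery']
```

Once an MSG is tagged with a technique, a deception playbook entry (e.g., feed fake process lists) can be selected for it.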
{"title":"symbSODA: Configurable and Verifiable Orchestration Automation for Active Malware Deception","authors":"Md Sajidul Islam Sajid, Jinpeng Wei, Ehab Al-Shaer, Qi Duan, Basel Abdeen, Latifur Khan","doi":"10.1145/3624568","DOIUrl":"https://doi.org/10.1145/3624568","url":null,"abstract":"Malware is commonly used by adversaries to compromise and infiltrate cyber systems in order to steal sensitive information or destroy critical assets. Active Cyber Deception (ACD) has emerged as an effective proactive cyber defense against malware to enable misleading adversaries by presenting fake data and engaging them to learn novel attack techniques. However, real-time malware deception is a complex and challenging task because (1) it requires a comprehensive understanding of the malware behaviors at technical and tactical levels in order to create the appropriate deception ploys and resources that can leverage this behavior and mislead malware, and (2) it requires a configurable yet provably valid deception planning to guarantee effective and safe real-time deception orchestration. This article presents symbSODA, a highly configurable and verifiable cyber deception system that analyzes real-world malware using multipath execution to discover API patterns that represent attack techniques/tactics critical for deception, enables users to create their own customized deception ploys based on the malware type and objectives, allows for constructing conflict-free Deception Playbooks , and finally automates the deception orchestration to execute the malware inside a deceptive environment. symbSODA extracts Malicious Sub-graphs (MSGs) consisting of WinAPIs from real-world malware and maps them to tactics and techniques using the ATT&CK framework to facilitate the construction of meaningful user-defined deception playbooks. We conducted a comprehensive evaluation study on symbSODA using 255 recent malware samples. 
We demonstrated that the accuracy of the end-to-end malware deception is 95% on average, with negligible overhead using various deception goals and strategies. Furthermore, our approach successfully extracted MSGs with a 97% recall, and our MSG-to-MITRE mapping achieved a top-1 accuracy of 88.75%. Our study suggests that symbSODA can serve as a general-purpose Malware Deception Factory to automatically produce customized deception playbooks against arbitrary malware behavior.","PeriodicalId":56050,"journal":{"name":"ACM Transactions on Privacy and Security","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134992869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shahnewaz Karim Sakib, George T Amariucai, Yong Guan
Information leakage is usually defined as the logarithmic increase in the adversary's probability of correctly guessing the legitimate user's private data, or some arbitrary function of the private data, when presented with the legitimate user's publicly disclosed information. However, this definition implicitly assumes that both the privacy mechanism and the prior probability of the original data are entirely known to the attacker. In reality, the assumption that an attacker has complete knowledge of the privacy mechanism is often impractical: the attacker typically has access only to an approximate version of the correct privacy mechanism, computed from a limited set of disclosed data for which the corresponding undistorted data is available. In this scenario, the conventional definition of leakage no longer has an operational meaning. To address this problem, we propose novel, meaningful information-theoretic metrics for information leakage when the attacker has incomplete information about the privacy mechanism, which we call average subjective leakage, average confidence boost, and average objective leakage, respectively. For the simplest, binary scenario, we demonstrate how to find an optimized privacy mechanism that minimizes the worst-case value of either of these leakages.
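The baseline quantity the abstract starts from, the logarithmic increase in the adversary's guessing probability, is the standard min-entropy leakage of a channel, which the paper's subjective/objective variants refine for attackers with only an approximate mechanism. A worked example of the baseline, assuming full knowledge of the channel:

```python
# Min-entropy leakage: log2 of (posterior guessing probability / prior
# guessing probability). This is the "complete knowledge" baseline that the
# paper's average subjective/objective leakage metrics generalize.
import math

def min_entropy_leakage(prior, channel):
    """prior[x] = P(x); channel[x][y] = P(y|x). Returns leakage in bits."""
    ys = range(len(channel[0]))
    # Best single guess before observing anything.
    p_prior = max(prior)
    # Expected best-guess probability after observing y, summed over y.
    p_post = sum(max(prior[x] * channel[x][y] for x in range(len(prior)))
                 for y in ys)
    return math.log2(p_post / p_prior)

# Binary secret, uniform prior, noiseless channel: one full bit leaks.
print(min_entropy_leakage([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]]))  # 1.0
# Output independent of the secret: nothing leaks.
print(min_entropy_leakage([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]]))  # 0.0
```

When the attacker's `channel` is only an estimate, plugging the estimate into this formula no longer measures real risk, which is the gap the paper's metrics are built to close.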
{"title":"Measures of Information Leakage for Incomplete Statistical Information: Application to a Binary Privacy Mechanism","authors":"Shahnewaz Karim Sakib, George T Amariucai, Yong Guan","doi":"10.1145/3624982","DOIUrl":"https://doi.org/10.1145/3624982","url":null,"abstract":"Information leakage is usually defined as the logarithmic increment in the adversary’s probability of correctly guessing the legitimate user’s private data or some arbitrary function of the private data when presented with the legitimate user’s publicly disclosed information. However, this definition of information leakage implicitly assumes that both the privacy mechanism and the prior probability of the original data are entirely known to the attacker. In reality, the assumption of complete knowledge of the privacy mechanism for an attacker is often impractical. The attacker can usually have access to only an approximate version of the correct privacy mechanism, computed from a limited set of the disclosed data, for which they can access the corresponding un-distorted data. In this scenario, the conventional definition of leakage no longer has an operational meaning. To address this problem, in this article, we propose novel meaningful information-theoretic metrics for information leakage when the attacker has incomplete information about the privacy mechanism—we call them average subjective leakage , average confidence boost , and average objective leakage , respectively. 
For the simplest, binary scenario, we demonstrate how to find an optimized privacy mechanism that minimizes the worst-case value of either of these leakages.","PeriodicalId":56050,"journal":{"name":"ACM Transactions on Privacy and Security","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134992867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prakash Shrestha, Ahmed Tanvir Mahdad, Nitesh Saxena
Reducing the level of user effort involved in traditional two-factor authentication (TFA) is an important research topic. An interesting representative approach, Sound-Proof, leverages ambient sounds to detect proximity between the second-factor device (phone) and the login terminal (browser), eliminating the need for the user to transfer PIN codes. In this paper, we identify a weakness of the Sound-Proof system that makes it completely vulnerable to passive "environment guessing" and active "environment manipulating" remote attackers, as well as to proximity attackers. Addressing these security issues, we propose Listening-Watch, a new TFA mechanism based on a wearable device (watch/bracelet) and active browser-generated random speech sounds. As the user attempts to log in, the browser plays a short random code encoded into speech, and the login succeeds if the watch's audio recording contains this code (decoded using speech recognition) and is similar enough to the browser's audio recording. A remote attacker who has guessed or manipulated the user's environment is defeated, since authentication success relies on the presence of the random code in the watch's recording. A proximity attacker is also defeated unless it is extremely close (< 50 cm) to the watch, since wearable microphones are typically designed to capture only nearby sounds (e.g., voice commands).
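The two-condition decision rule described above can be sketched as follows. Everything here is a stand-in under stated assumptions: an already-decoded transcript replaces speech recognition, a toy normalized correlation replaces real audio similarity, and the threshold is illustrative:

```python
# Hedged sketch of the Listening-Watch decision rule: login succeeds only if
# (1) the browser's random spoken code appears in the watch's recording and
# (2) the watch and browser recordings are similar enough.
import secrets

def make_code(n_digits=4):
    """Random code the browser would speak aloud at login time."""
    return "".join(str(secrets.randbelow(10)) for _ in range(n_digits))

def similarity(a, b):
    """Toy normalized correlation between two audio 'recordings'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return dot / norm if norm else 0.0

def verify_login(code, watch_transcript, watch_audio, browser_audio,
                 threshold=0.9):
    return code in watch_transcript and \
        similarity(watch_audio, browser_audio) >= threshold

audio = [0.1, 0.4, -0.2, 0.3]
print(verify_login("5731", "... 5731 ...", audio, audio))   # True
print(verify_login("5731", "no code here", audio, audio))   # False
```

The first condition is what defeats the remote "environment guessing/manipulating" attacker: matching ambient sound alone is no longer sufficient without the fresh random code.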
{"title":"Sound-based Two-Factor Authentication: Vulnerabilities and Redesign","authors":"Prakash Shrestha, Ahmed Tanvir Mahdad, Nitesh Saxena","doi":"10.1145/3632175","DOIUrl":"https://doi.org/10.1145/3632175","url":null,"abstract":"Reducing the level of user effort involved in traditional two-factor authentication (TFA) constitutes an important research topic. An interesting representative approach, Sound-Proof , leverages ambient sounds to detect the proximity between the second-factor device (phone) and the login terminal (browser), and eliminates the need for the user to transfer PIN codes. In this paper, we identify a weakness of the Sound-Proof system that makes it completely vulnerable to passive “environment guessing” and active “environment manipulating” remote attackers and proximity attackers. Addressing these security issues, we propose Listening-Watch , a new TFA mechanism based on a wearable device (watch/bracelet) and active browser-generated random speech sounds. As the user attempts to log in, the browser populates a short random code encoded into speech, and the login succeeds if the watch’s audio recording contains this code (decoded using speech recognition ), and is similar enough to the browser’s audio recording. The remote attacker, who has guessed/manipulated the user’s environment, will be defeated since authentication success relies upon the presence of the random code in watch’s recordings. 
The proximity attacker will also be defeated unless it is extremely close (< 50 cm) to the watch since the wearable microphones are usually designed to capture only nearby sounds (e.g., voice commands).","PeriodicalId":56050,"journal":{"name":"ACM Transactions on Privacy and Security","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135041828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yong Zeng, Jiale Liu, Tong Dong, Qingqi Pei, Jianfeng Ma, Yao Liu
Facial recognition technology has been developed and widely used for decades. However, it has also raised privacy concerns and created demand for privacy-preserving facial recognition technologies. To provide privacy, detailed or semantic content in face images should be obfuscated. However, face recognition algorithms must currently be tailor-designed for each obfuscation method; as a result, a face recognition service provider has to update its commercial off-the-shelf (COTS) products for every new obfuscation method. Moreover, current obfuscation methods lack a clearly quantified explanation. This paper presents a universal face obfuscation method for a family of face recognition algorithms that use the global or local structure of the eigenvector space. Through specific mathematical analysis, we show that the upper bound of the distance between an original face image and its obfuscated version is smaller than the given recognition threshold. Experiments show that recognition degradation is 0% for global-structure-based algorithms and 0.3%-5.3% for local-structure-based ones. Meanwhile, we show that even if an attacker knows the entire obfuscation method, they must enumerate all possible roots of a polynomial involving an obfuscation coefficient, which makes reconstructing the original faces computationally infeasible. Our method thus achieves good performance in both privacy and recognition accuracy without modifying recognition algorithms.
{"title":"Eyes See Hazy while Algorithms Recognize Who You Are","authors":"Yong Zeng, Jiale Liu, Tong Dong, Qingqi Pei, Jianfeng Ma, Yao Liu","doi":"10.1145/3632292","DOIUrl":"https://doi.org/10.1145/3632292","url":null,"abstract":"Facial recognition technology has been developed and widely used for decades. However, it has also made privacy concerns and researchers’ expectations for facial recognition privacy-preserving technologies. To provide privacy, detailed or semantic contents in face images should be obfuscated. However, face recognition algorithms have to be tailor-designed according to current obfuscation methods, as a result the face recognition service provider has to update its commercial off-the-shelf(COTS) products for each obfuscation method. Meanwhile, current obfuscation methods have no clearly quantified explanation. This paper presents a universal face obfuscation method for a family of face recognition algorithms using global or local structure of eigenvector space. By specific mathematical explanations, we show that the upper bound of the distance between the original and obfuscated face images is smaller than the given recognition threshold. Experiments show that the recognition degradation is 0% for global structure based and 0.3%-5.3% for local structure based, respectively. Meanwhile, we show that even if an attacker knows the whole obfuscation method, he/she has to enumerate all the possible roots of a polynomial with an obfuscation coefficient, which is computationally infeasible to reconstruct original faces. 
So our method shows a good performance in both privacy and recognition accuracy without modifying recognition algorithms.","PeriodicalId":56050,"journal":{"name":"ACM Transactions on Privacy and Security","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135137594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Han Cao, Qindong Sun, Yaqi Li, Rong Geng, Xiaoxiong Wang
The existence of adversarial images forces us to question the credibility of artificial intelligence systems: attackers can use carefully crafted adversarial images to carry out a variety of attacks. Inspired by the theory of image compressed sensing, this paper proposes a new black-box attack, N-HSA_LF. It uses the covariance matrix adaptation evolution strategy (CMA-ES) to learn the distribution of adversarial perturbations in the low-frequency domain, reducing the dimensionality of the solution space. It further employs sep-CMA-ES, which restricts the covariance matrix to a diagonal matrix, reducing the number of parameters that must be updated in the multivariate Gaussian distribution learned during attacks and thereby lowering the computational cost. On this basis, we propose a history-driven mean update and a current-optimal-solution-guided improvement strategy to keep the distribution from evolving in a worse direction. Experimental results show that N-HSA_LF achieves a higher attack success rate with fewer queries against both CNN-based and transformer-based target models under L2-norm and L-infinity-norm perturbation constraints. An ablation study shows that the proposed strategies effectively reduce the number of queries to the target model when crafting adversarial examples for hard inputs. In addition, our attack is able to render the integrated defense strategy of GRIP-GAN and noise-embedded training ineffective to a certain extent.
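The efficiency argument behind sep-CMA-ES is that a diagonal covariance gives O(d) updates per iteration instead of O(d^2). The toy below shows that diagonal idea only; it is a heavily simplified stand-in (elite-mean updates plus a fixed step-size decay schedule) rather than CMA-ES's full mean, covariance, and step-size machinery, and a quadratic loss stands in for the attack objective:

```python
# Toy evolution strategy with a *diagonal* search distribution: one step size
# per axis, updated in O(d), illustrating why sep-CMA-ES cuts per-iteration
# cost in high-dimensional black-box attacks. Not the attack itself.
import random

def diagonal_es(loss, dim, iters=80, pop=20, elite=5, seed=1):
    rng = random.Random(seed)
    mean = [5.0] * dim        # start far from the optimum
    sigma = [1.0] * dim       # per-axis step sizes (diagonal covariance)
    for _ in range(iters):
        samples = [[rng.gauss(mean[i], sigma[i]) for i in range(dim)]
                   for _ in range(pop)]
        samples.sort(key=loss)
        best = samples[:elite]
        # Move the mean to the elite average; O(d) work per iteration.
        mean = [sum(s[i] for s in best) / elite for i in range(dim)]
        # Simple geometric step-size decay instead of CMA's adaptation.
        sigma = [max(s * 0.97, 1e-6) for s in sigma]
    return mean

sphere = lambda x: sum(v * v for v in x)  # stand-in for the attack loss
m = diagonal_es(sphere, dim=3)
print(round(sphere(m), 4))
```

In the actual attack the search variables are low-frequency perturbation coefficients and each loss evaluation is a query to the target model, so reducing both dimensionality and per-iteration cost directly reduces query count.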
{"title":"Efficient History-Driven Adversarial Perturbation Distribution Learning in Low Frequency Domain","authors":"Han Cao, Qindong Sun, Yaqi Li, Rong Geng, Xiaoxiong Wang","doi":"10.1145/3632293","DOIUrl":"https://doi.org/10.1145/3632293","url":null,"abstract":"The existence of adversarial image makes us have to doubt the credibility of artificial intelligence system. Attackers can use carefully processed adversarial images to carry out a variety of attacks. Inspired by the theory of image compressed sensing, this paper proposes a new black-box attack, (mathcal {N}text{-HSA}_{LF} ) . It uses covariance matrix adaptive evolution strategy (CMA-ES) to learn the distribution of adversarial perturbation in low frequency domain, reducing the dimensionality of solution space. And sep-CMA-ES is used to set the covariance matrix as a diagonal matrix, which further reduces the dimensions that need to be updated for the covariance matrix of multivariate Gaussian distribution learned in attacks, thereby reducing the computational cost of attack. And on this basis, we propose history-driven mean update and current optimal solution-guided improvement strategies to avoid the evolution of distribution to a worse direction. The experimental results show that the proposed (mathcal {N}text{-HSA}_{LF} ) can achieve a higher attack success rate with fewer queries on attacking both CNN-based and transformer-based target models under L 2 -norm and L ∞ -norm constraints of perturbation. We also conduct an ablation study and the results show that the proposed improved strategies can effectively reduce the number of visits to the target model when making adversarial examples for hard examples. 
In addition, our attack is able to make the integrated defense strategy of GRIP-GAN and noise-embedded training ineffective to a certain extent.","PeriodicalId":56050,"journal":{"name":"ACM Transactions on Privacy and Security","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135342518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Logging is a key mechanism in the security of computer systems. Beyond supporting important forward security properties, it is critical that logging withstand both failures and intentional tampering, which could otherwise enable subtle attacks that leave the system in an inconsistent state with inconclusive evidence. We propose new techniques combining forward security with crash recovery for secure log data storage. Because the requirement of forward integrity and the online nature of logging prevent the use of conventional coding, we propose and analyze a coding scheme that resolves these unique design constraints. Specifically, our coding enables forward integrity, online encoding, and, most importantly, a constant number of operations per encoding. It adds a new log item by XORing it into k cells of a table. If up to a certain threshold of cells is modified by the adversary, or lost due to a crash, we still guarantee recovery of all stored log items. The main advantage of the coding scheme is its efficiency and compatibility with forward integrity. The key contribution of the paper is the use of spectral graph theory to prove that k is constant in the number n of log items ever stored and small in practice, e.g., k = 5. Moreover, we prove that to cope with up to √n modified or lost log items, storage expansion is constant in n and small in practice: for k = 5, the table is only 12% larger than the simple concatenation of all n items. We propose and evaluate original techniques to scale the computational cost of recovery to several gigabytes of security logs. We instantiate our scheme as an abstract data structure that either detects adversarial modifications to log items or treats modifications like data loss in a system crash; the data structure can then recover lost log items, effectively reverting adversarial modifications.
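The core encoding step, XOR a new item into k table cells in constant time, and the basic recovery move can be sketched directly. The hash-based cell selection and the tiny parameters below are illustrative assumptions; the paper's construction chooses cells so that recovery of all items is guaranteed up to the stated threshold, which a naive hash choice does not:

```python
# Minimal sketch of XOR-based online coding for log items: each item is
# XORed into k cells; a lost item is recovered by XORing the other known
# contributors back out of one of its cells.
import hashlib

K, TABLE_SIZE = 3, 16

def cells_for(index: int):
    """k distinct table cells for log item number `index` (assumed: hashing)."""
    picked = []
    for b in hashlib.sha256(index.to_bytes(8, "big")).digest():
        c = b % TABLE_SIZE
        if c not in picked:
            picked.append(c)
        if len(picked) == K:
            return picked

def xor_into(table, cell, item: bytes):
    padded = item.ljust(8, b"\0")
    table[cell] = bytes(a ^ b for a, b in zip(table[cell], padded))

table = [bytes(8)] * TABLE_SIZE
items = [b"login ok", b"sudo bob", b"rm -rf /"]
for i, item in enumerate(items):          # O(k) = O(1) work per new item
    for c in cells_for(i):
        xor_into(table, c, item)

# Suppose item 1 is lost. XOR the other items that share one of its cells
# back out of that cell; what remains is item 1 itself.
cell = cells_for(1)[0]
recovered = table[cell]
for i, item in enumerate(items):
    if i != 1 and cell in cells_for(i):
        recovered = bytes(a ^ b for a, b in zip(recovered, item.ljust(8, b"\0")))
print(recovered)  # b'sudo bob'
```

Full recovery of many simultaneous losses requires iterating this "peel one item, then re-try" step, and the paper's spectral-graph analysis is what guarantees the peeling succeeds with constant k.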
"Forward Security with Crash Recovery for Secure Logs," by Erik-Oliver Blass and Guevara Noubir. ACM Transactions on Privacy and Security, published 2023-11-03. DOI: https://doi.org/10.1145/3631524
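The append step described in the abstract, XORing each new log item into k cells of a table, can be illustrated with a minimal Python sketch. This is not the paper's construction: the cell-derivation function, table size, and single-item recovery routine below are simplified, hypothetical stand-ins, and recovering many lost items at once requires the spectral-graph/decoding machinery the paper actually analyzes.

```python
import hashlib

K = 3            # cells touched per item; the paper proves a small constant (e.g., k = 5) suffices
TABLE_SIZE = 16  # illustrative; the paper shows ~12% expansion over n items for k = 5

def cells_for(index: int) -> list[int]:
    """Derive K distinct cell positions from an item's sequence number (toy scheme)."""
    positions, ctr = [], 0
    while len(positions) < K:
        p = hashlib.sha256(f"{index}:{ctr}".encode()).digest()[0] % TABLE_SIZE
        if p not in positions:
            positions.append(p)
        ctr += 1
    return positions

def encode(table: list[int], index: int, item: int) -> None:
    """Online encoding: a constant number (K) of XORs per appended item."""
    for p in cells_for(index):
        table[p] ^= item

def recover_one(table: list[int], known: dict[int, int], lost_index: int) -> int:
    """Recover a single lost item by XORing all known items' contributions
    out of one of its cells (the general case needs proper decoding)."""
    p = cells_for(lost_index)[0]
    val = table[p]
    for idx, item in known.items():
        if p in cells_for(idx):
            val ^= item
    return val
```

Because XOR is its own inverse, removing every known item's contribution from a cell leaves exactly the lost item's value, which is the intuition behind the constant-cost encoding.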
Li Tang, Qingqing Ye, Haibo Hu, Qiao Xue, Yaxin Xiao, Jin Li
With the rapid growth of DeepFake video techniques, it becomes increasingly challenging to identify them visually, posing a huge threat to our society. Unfortunately, existing detection schemes are limited to exploiting the artifacts left by DeepFake manipulations, so they struggle to keep pace with the ever-improving DeepFake models. In this work, we propose DeepMark, a scalable and robust framework for detecting DeepFakes. It imprints essential visual features of a video into DeepMark Meta (DMM) and uses it to detect DeepFake manipulations by comparing the extracted visual features with the ground truth in DMM. Therefore, DeepMark is future-proof, because a DeepFake video must aim to alter some visual feature, no matter how “natural” it looks. Furthermore, the DMM also contains a signature for verifying the integrity of the above features. An essential link to the features, together with their signature, is protected with error-correcting codes and embedded in the video watermark. To improve the efficiency of DMM creation, we also present a threshold-based feature selection scheme and a deduced face detection scheme. Experimental results demonstrate the effectiveness and efficiency of DeepMark on DeepFake video detection under various datasets and parameter settings.
"DeepMark: A Scalable and Robust Framework for DeepFake Video Detection," by Li Tang, Qingqing Ye, Haibo Hu, Qiao Xue, Yaxin Xiao, and Jin Li. ACM Transactions on Privacy and Security, published 2023-11-01. DOI: https://doi.org/10.1145/3629976
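The core DMM idea, storing ground-truth visual features together with an integrity tag and later comparing re-extracted features against them, can be sketched as follows. The feature layout, the HMAC-based integrity tag, and the comparison tolerance are all illustrative assumptions, not DeepMark's actual construction, which additionally uses error-correcting codes and a video watermark for embedding.

```python
import hashlib
import hmac
import json

SECRET = b"publisher-signing-key"  # hypothetical stand-in for the publisher's signing key

def make_dmm(features: list[list[float]]) -> dict:
    """Build a DMM-style record: quantized per-frame features plus an integrity tag."""
    payload = json.dumps([[round(v, 2) for v in frame] for frame in features]).encode()
    tag = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return {"features": payload.decode(), "sig": tag}

def verify(dmm: dict, extracted: list[list[float]], tol: float = 0.05) -> bool:
    """Reject if the DMM was tampered with, or if re-extracted features drift
    beyond the tolerance (as a DeepFake altering a visual feature would)."""
    payload = dmm["features"].encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, dmm["sig"]):
        return False
    stored = json.loads(dmm["features"])
    return all(abs(a - b) <= tol
               for fs, fe in zip(stored, extracted)
               for a, b in zip(fs, fe))
```

A manipulated frame that shifts even one stored feature past the tolerance fails the comparison, which is the sense in which a features-plus-signature record stays ahead of new generation models.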
JavaScript is often rated as the most popular programming language for the development of both client-side and server-side applications. Because of its popularity, JavaScript has become a frequent target for attackers who exploit vulnerabilities in the source code to take control over the application. To address these JavaScript security issues, such vulnerabilities must be identified first. Existing studies on vulnerable code detection in JavaScript mostly consider package-level vulnerability tracking and measurements. However, such package-level analysis is largely imprecise, as real-world services that include a vulnerable package may not use the vulnerable functions in the package. Moreover, even the inclusion of a vulnerable function may not lead to a security problem if the function cannot be triggered with exploitable inputs. In this paper, we develop a vulnerability detection framework that uses vulnerable pattern recognition and textual similarity methods to detect vulnerable functions in real-world JavaScript projects, combined with a static multi-file taint analysis mechanism to further assess the impact of the vulnerabilities on the whole project (i.e., whether the vulnerability can be exploited in a given project). We compose a comprehensive dataset of 1,360 verified vulnerable JavaScript functions using the Snyk vulnerability database and the VulnCode-DB project. From this ground-truth dataset, we build our vulnerable patterns for two common vulnerability types: prototype pollution and Regular Expression Denial of Service (ReDoS). With our framework, we analyze 9,205,654 functions (from 3,000 NPM packages, 1,892 websites, and 557 Chrome Web extensions), and detect 117,601 prototype pollution and 7,333 ReDoS vulnerabilities. By further processing all 5,839 findings from NPM packages with our taint analyzer, we verify the exploitability of 290 zero-day cases across 134 NPM packages.
In addition, we conduct an in-depth contextual analysis of the findings in 17 popular/critical projects and study the practical security exposure of 20 functions. With our semi-automated vulnerability reporting functionality, we disclosed all verified findings to project owners. We also obtained 25 published CVEs for our findings, 19 of them rated as “Critical” severity, and six rated as “High” severity. Additionally, we obtained 169 CVEs that are currently “Reserved” (as of Apr. 2023). As evident from the results, our approach can shift JavaScript vulnerability detection from the coarse package/library level to the function level, and thus improve the accuracy of detection and aid timely patching.
"On Detecting and Measuring Exploitable JavaScript Functions in Real-World Applications," by Maryna Kluban, Mohammad Mannan, and Amr Youssef. ACM Transactions on Privacy and Security, published 2023-10-26. DOI: https://doi.org/10.1145/3630253
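The pattern-recognition side of such a framework can be illustrated with a toy detector. The two patterns below are hypothetical textbook shapes of my own choosing (a nested quantifier for ReDoS, a `__proto__` write for prototype pollution); the paper instead derives its patterns from a 1,360-function ground-truth dataset and combines them with textual similarity and multi-file taint analysis.

```python
import re

# Illustrative-only patterns; they catch textbook cases, not the paper's full pattern set.

# Nested quantifiers such as (a+)+ or (\w*)* -- a classic ReDoS-prone regex shape.
NESTED_QUANTIFIER = re.compile(r"\([^)]*[+*]\)\s*[+*]")

# Writes through __proto__, either bracket or dot access -- a prototype pollution shape.
PROTO_ASSIGN = re.compile(r"\[\s*['\"]__proto__['\"]\s*\]\s*=|\.__proto__\s*=")

def flag_snippet(js_source: str) -> list[str]:
    """Return coarse warnings for a JavaScript source snippet."""
    findings = []
    if NESTED_QUANTIFIER.search(js_source):
        findings.append("possible ReDoS: nested quantifier in a regex literal")
    if PROTO_ASSIGN.search(js_source):
        findings.append("possible prototype pollution: write to __proto__")
    return findings
```

Such lexical matching alone over-reports, which is why the paper follows it with taint analysis to check whether a flagged function is actually reachable with exploitable inputs.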