Brown Farinholt, Mohammad Rezaeirad, P. Pearce, Hitesh Dharmdasani, Haikuo Yin, Stevens Le Blond, Damon McCoy, Kirill Levchenko
Remote Access Trojans (RATs) give remote attackers interactive control over a compromised machine. Unlike large-scale malware such as botnets, a RAT is controlled individually by a human operator interacting with the compromised machine remotely. The versatility of RATs makes them attractive to actors of all levels of sophistication: they have been used for espionage, information theft, voyeurism, and extortion. Despite their increasing use, there are still major gaps in our understanding of RATs and their operators, including motives, intentions, procedures, and weak points where defenses might be most effective. In this work we study the use of DarkComet, a popular commercial RAT. We collected 19,109 samples of DarkComet malware found in the wild and, in the course of two several-week-long experiments, ran as many samples as possible in our honeypot environment. By monitoring a sample's behavior in our system, we are able to reconstruct the sequence of operator actions, giving us a unique view into operator behavior. We report on the results of 2,747 interactive sessions captured during these experiments. During these sessions operators frequently attempted to interact with victims via remote desktop, to capture video, audio, and keystrokes, and to exfiltrate files and credentials. To our knowledge, this is the first large-scale systematic study of RAT use.
{"title":"To Catch a Ratter: Monitoring the Behavior of Amateur DarkComet RAT Operators in the Wild","authors":"Brown Farinholt, Mohammad Rezaeirad, P. Pearce, Hitesh Dharmdasani, Haikuo Yin, Stevens Le Blond, Damon McCoy, Kirill Levchenko","doi":"10.1109/SP.2017.48","DOIUrl":"https://doi.org/10.1109/SP.2017.48","url":null,"abstract":"Remote Access Trojans (RATs) give remote attackers interactive control over a compromised machine. Unlike large-scale malware such as botnets, a RAT is controlled individually by a human operator interacting with the compromised machine remotely. The versatility of RATs makes them attractive to actors of all levels of sophistication: they've been used for espionage, information theft, voyeurism and extortion. Despite their increasing use, there are still major gaps in our understanding of RATs and their operators, including motives, intentions, procedures, and weak points where defenses might be most effective. In this work we study the use of DarkComet, a popular commercial RAT. We collected 19,109 samples of DarkComet malware found in the wild, and in the course of two, several-week-long experiments, ran as many samples as possible in our honeypot environment. By monitoring a sample's behavior in our system, we are able to reconstruct the sequence of operator actions, giving us a unique view into operator behavior. We report on the results of 2,747 interactive sessions captured in the course of the experiment. During these sessions operators frequently attempted to interact with victims via remote desktop, to capture video, audio, and keystrokes, and to exfiltrate files and credentials. To our knowledge, we are the first large-scale systematic study of RAT use.","PeriodicalId":6502,"journal":{"name":"2017 IEEE Symposium on Security and Privacy (SP)","volume":"17 1","pages":"770-787"},"PeriodicalIF":0.0,"publicationDate":"2017-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74612835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sumayah A. Alrwais, Xiaojing Liao, Xianghang Mi, Peng Wang, Xiaofeng Wang, Feng Qian, R. Beyah, Damon McCoy
BulletProof Hosting (BPH) services provide criminal actors with technical infrastructure that is resilient to complaints of illicit activities, and that serves as a basic building block for streamlining numerous types of attacks. Anecdotal reports have highlighted an emerging trend of these BPH services reselling infrastructure from lower-end service providers (hosting ISPs, cloud hosting, and CDNs) instead of from monolithic BPH providers. This has rendered many of the prior methods of detecting BPH less effective: instead of being highly concentrated within a few malicious Autonomous Systems (ASes), the infrastructure is now agile and dispersed across a larger set of providers that have a mixture of benign and malicious clients. In this paper, we present the first systematic study of this new trend of BPH services. By collecting and analyzing a large amount of data (25 snapshots of the entire Whois IPv4 address space, 1.5 TB of passive DNS data, and longitudinal data from several blacklist feeds), we are able to identify a set of new features that uniquely characterize BPH on sub-allocations and that are costly to evade. Based on these features, we train a classifier for detecting malicious sub-allocated network blocks, achieving 98% recall and a 1.5% false discovery rate in our evaluation. Using a conservatively trained version of our classifier, we scan the whole IPv4 address space and detect 39K malicious network blocks. This allows us to perform a large-scale study of the BPH service ecosystem, which sheds light on this underground business strategy, including patterns of network blocks being recycled and malicious clients being migrated to different network blocks in an effort to evade IP-address-based blacklisting. Our study highlights the trend of agile BPH services and points to potential methods of detecting and mitigating this emerging threat.
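For readers who want a concrete picture of the detection step, the following is a minimal sketch of a supervised pipeline of the kind the abstract describes. The per-block feature names, the synthetic labels, and the random-forest choice are illustrative assumptions, not the authors' actual feature set or classifier.

```python
# Illustrative sketch only: trains a classifier over hypothetical per-network-block
# features (names are assumptions, not the paper's feature set) and reports the
# recall / false discovery rate metrics used in the evaluation described above.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
# Hypothetical features per sub-allocated block: Whois churn, share of blacklisted
# IPs seen in passive DNS, number of distinct domains hosted, block size.
X = np.column_stack([
    rng.poisson(2, n),          # whois_reassignment_count
    rng.beta(1, 20, n),         # fraction_of_ips_on_blacklists
    rng.poisson(30, n),         # distinct_domains_in_pdns
    rng.integers(8, 4096, n),   # block_size_in_addresses
])
y = (X[:, 1] + 0.05 * X[:, 0] + rng.normal(0, 0.05, n)) > 0.25  # synthetic labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)

tp = np.sum(pred & y_te)
fp = np.sum(pred & ~y_te)
fn = np.sum(~pred & y_te)
print("recall:", tp / (tp + fn))
print("false discovery rate:", fp / (tp + fp))
```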
{"title":"Under the Shadow of Sunshine: Understanding and Detecting Bulletproof Hosting on Legitimate Service Provider Networks","authors":"Sumayah A. Alrwais, Xiaojing Liao, Xianghang Mi, Peng Wang, Xiaofeng Wang, Feng Qian, R. Beyah, Damon McCoy","doi":"10.1109/SP.2017.32","DOIUrl":"https://doi.org/10.1109/SP.2017.32","url":null,"abstract":"BulletProof Hosting (BPH) services provide criminal actors with technical infrastructure that is resilient to complaints of illicit activities, which serves as a basic building block for streamlining numerous types of attacks. Anecdotal reports have highlighted an emerging trend of these BPH services reselling infrastructure from lower end service providers (hosting ISPs, cloud hosting, and CDNs) instead of from monolithic BPH providers. This has rendered many of the prior methods of detecting BPH less effective, since instead of the infrastructure being highly concentrated within a few malicious Autonomous Systems (ASes) it is now agile and dispersed across a larger set of providers that have a mixture of benign and malicious clients. In this paper, we present the first systematic study on this new trend of BPH services. By collecting and analyzing a large amount of data (25 snapshots of the entire Whois IPv4 address space, 1.5 TB of passive DNS data, and longitudinal data from several blacklist feeds), we are able to identify a set of new features that uniquely characterizes BPH on sub-allocations and that are costly to evade. Based upon these features, we train a classifier for detecting malicious sub-allocated network blocks, achieving a 98% recall and 1.5% false discovery rates according to our evaluation. Using a conservatively trained version of our classifier, we scan the whole IPv4 address space and detect 39K malicious network blocks. This allows us to perform a large-scale study of the BPH service ecosystem, which sheds light on this underground business strategy, including patterns of network blocks being recycled and malicious clients being migrated to different network blocks, in an effort to evade IP address based blacklisting. Our study highlights the trend of agile BPH services and points to potential methods of detecting and mitigating this emerging threat.","PeriodicalId":6502,"journal":{"name":"2017 IEEE Symposium on Security and Privacy (SP)","volume":"82 1","pages":"805-823"},"PeriodicalIF":0.0,"publicationDate":"2017-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72735627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Toshinori Araki, A. Barak, Jun Furukawa, Tamar Lichter, Yehuda Lindell, Ariel Nof, Kazuma Ohara, Adi Watzman, Or Weinstein
Secure multiparty computation enables a set of parties to securely carry out a joint computation of their private inputs without revealing anything but the output. In the past few years, the efficiency of secure computation protocols has increased by leaps and bounds. However, when considering security in the presence of malicious adversaries (who may arbitrarily deviate from the protocol specification), we are still very far from achieving high efficiency. In this paper, we consider the specific case of three parties and an honest majority. We provide general techniques for improving the efficiency of cut-and-choose protocols on multiplication triples and utilize them to significantly improve the recently published protocol of Furukawa et al. (ePrint 2016/944). We reduce the bandwidth of their protocol from 10 bits per AND gate down to 7 bits per AND gate, and show how to improve some computationally expensive parts of their protocol. Most notably, we design cache-efficient shuffling techniques for implementing cut-and-choose without randomly permuting large arrays (which is very slow due to continual cache misses). We provide a combinatorial analysis of our techniques, bounding the cheating probability of the adversary. Our implementation achieves a rate of approximately 1.15 billion AND gates per second on a cluster of three 20-core machines with a 10Gbps network. Thus, we can securely compute 212,000 AES encryptions per second (which is hundreds of times faster than previous work for this setting). Our results demonstrate that high-throughput secure computation for malicious adversaries is possible.
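As background for the two ingredients the abstract names, multiplication (AND) triples and cut-and-choose verification, here is a toy single-process sketch of the standard constructions. It is not the authors' optimized three-party protocol and omits all communication, sacrificing, and cache-efficiency optimizations.

```python
# Toy, single-process sketch of Beaver-style AND triples over GF(2) with a
# cut-and-choose check that opens a random subset of triples. NOT the paper's
# protocol; it only illustrates the generic building blocks mentioned above.
import secrets

def share(bit):
    """XOR-share a bit among 3 parties."""
    s0, s1 = secrets.randbelow(2), secrets.randbelow(2)
    return [s0, s1, bit ^ s0 ^ s1]

def reconstruct(shares):
    return shares[0] ^ shares[1] ^ shares[2]

def make_triple():
    a, b = secrets.randbelow(2), secrets.randbelow(2)
    return share(a), share(b), share(a & b)

def cut_and_choose(triples, open_fraction=0.5):
    """Open a random subset of triples, verify c = a AND b, keep the rest."""
    idx = list(range(len(triples)))
    opened = set(secrets.SystemRandom().sample(idx, int(len(idx) * open_fraction)))
    for i in opened:
        a, b, c = triples[i]
        assert reconstruct(c) == reconstruct(a) & reconstruct(b), "cheating detected"
    return [t for i, t in enumerate(triples) if i not in opened]

def and_gate(x_sh, y_sh, triple):
    """Evaluate z = x AND y on shares, consuming one triple."""
    a_sh, b_sh, c_sh = triple
    d = reconstruct([x_sh[i] ^ a_sh[i] for i in range(3)])  # opened value x ^ a
    e = reconstruct([y_sh[i] ^ b_sh[i] for i in range(3)])  # opened value y ^ b
    z_sh = [c_sh[i] ^ (d & b_sh[i]) ^ (e & a_sh[i]) for i in range(3)]
    z_sh[0] ^= d & e  # exactly one party adds the public term
    return z_sh

triples = cut_and_choose([make_triple() for _ in range(64)])
x_sh, y_sh = share(1), share(1)
print(reconstruct(and_gate(x_sh, y_sh, triples.pop())))  # -> 1
```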
{"title":"Optimized Honest-Majority MPC for Malicious Adversaries — Breaking the 1 Billion-Gate Per Second Barrier","authors":"Toshinori Araki, A. Barak, Jun Furukawa, Tamar Lichter, Yehuda Lindell, Ariel Nof, Kazuma Ohara, Adi Watzman, Or Weinstein","doi":"10.1109/SP.2017.15","DOIUrl":"https://doi.org/10.1109/SP.2017.15","url":null,"abstract":"Secure multiparty computation enables a set of parties to securely carry out a joint computation of their private inputs without revealing anything but the output. In the past few years, the efficiency of secure computation protocols has increased in leaps and bounds. However, when considering the case of security in the presence of malicious adversaries (who may arbitrarily deviate from the protocol specification), we are still very far from achieving high efficiency. In this paper, we consider the specific case of three parties and an honest majority. We provide general techniques for improving efficiency of cut-and-choose protocols on multiplication triples and utilize them to significantly improve the recently published protocol of Furukawa et al. (ePrint 2016/944). We reduce the bandwidth of their protocol down from 10 bits per AND gate to 7 bits per AND gate, and show how to improve some computationally expensive parts of their protocol. Most notably, we design cache-efficient shuffling techniques for implementing cut-and-choose without randomly permuting large arrays (which is very slow due to continual cache misses). We provide a combinatorial analysis of our techniques, bounding the cheating probability of the adversary. Our implementation achieves a rate of approximately 1.15 billion AND gates per second on a cluster of three 20-core machines with a 10Gbps network. Thus, we can securely compute 212,000 AES encryptions per second (which is hundreds of times faster than previous work for this setting). Our results demonstrate that high-throughput secure computation for malicious adversaries is possible.","PeriodicalId":6502,"journal":{"name":"2017 IEEE Symposium on Security and Privacy (SP)","volume":"12 1","pages":"843-862"},"PeriodicalIF":0.0,"publicationDate":"2017-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78975858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent large-scale deployments of differentially private algorithms employ the local model for privacy (sometimes called PRAM or randomized response), where data are randomized on each individual's device before being sent to a server that computes approximate, aggregate statistics. The server need not be trusted for privacy, leaving data control in users' hands. For an important class of convex optimization problems (including logistic regression, support vector machines, and the Euclidean median), the best known locally differentially-private algorithms are highly interactive, requiring as many rounds of back and forth as there are users in the protocol. We ask: how much interaction is necessary to optimize convex functions in the local DP model? Existing lower bounds either do not apply to convex optimization, or say nothing about interaction. We provide new algorithms which are either noninteractive or use relatively few rounds of interaction. We also show lower bounds on the accuracy of an important class of noninteractive algorithms, suggesting a separation between what is possible with and without interaction.
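For concreteness, the local model mentioned above can be illustrated with the classic noninteractive randomized-response mechanism. The sketch below is standard textbook material, not one of the paper's convex-optimization algorithms.

```python
# Minimal sketch of the local model via binary randomized response: each user's
# bit is randomized on-device, the untrusted server only sees noisy reports and
# computes an unbiased estimate of the population mean.
import math
import random

def randomize(bit, eps):
    """epsilon-locally-DP randomized response for one bit, applied on-device."""
    p_truth = math.exp(eps) / (math.exp(eps) + 1)
    return bit if random.random() < p_truth else 1 - bit

def estimate_mean(reports, eps):
    """Server-side debiasing: unbiased estimate of the true fraction of 1s."""
    p = math.exp(eps) / (math.exp(eps) + 1)
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

random.seed(0)
true_bits = [1] * 300 + [0] * 700
reports = [randomize(b, eps=1.0) for b in true_bits]
print("true mean: 0.30  estimate:", round(estimate_mean(reports, 1.0), 3))
```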
{"title":"Is Interaction Necessary for Distributed Private Learning?","authors":"Adam D. Smith, Abhradeep Thakurta, Jalaj Upadhyay","doi":"10.1109/SP.2017.35","DOIUrl":"https://doi.org/10.1109/SP.2017.35","url":null,"abstract":"Recent large-scale deployments of differentially private algorithms employ the local model for privacy (sometimes called PRAM or randomized response), where data are randomized on each individual's device before being sent to a server that computes approximate, aggregate statistics. The server need not be trusted for privacy, leaving data control in users' hands. For an important class of convex optimization problems (including logistic regression, support vector machines, and the Euclidean median), the best known locally differentially-private algorithms are highly interactive, requiring as many rounds of back and forth as there are users in the protocol. We ask: how much interaction is necessary to optimize convex functions in the local DP model? Existing lower bounds either do not apply to convex optimization, or say nothing about interaction. We provide new algorithms which are either noninteractive or use relatively few rounds of interaction. We also show lower bounds on the accuracy of an important class of noninteractive algorithms, suggesting a separation between what is possible with and without interaction.","PeriodicalId":6502,"journal":{"name":"2017 IEEE Symposium on Security and Privacy (SP)","volume":"22 1","pages":"58-77"},"PeriodicalIF":0.0,"publicationDate":"2017-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86594166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Katarzyna Olejnik, Italo Dacosta, Joana Soares Machado, Kévin Huguenin, M. E. Khan, J. Hubaux
Permission systems are the main defense that mobile platforms, such as Android and iOS, offer to users to protect their private data from prying apps. However, due to the tension between usability and control, such systems have several limitations that often force users to overshare sensitive data. We address some of these limitations with SmarPer, an advanced permission mechanism for Android. To address the rigidity of current permission systems and their poor matching of users' privacy preferences, SmarPer relies on contextual information and machine learning methods to predict permission decisions at runtime. Note that the goal of SmarPer is to mimic the users' decisions, not to make privacy-preserving decisions per se. Using our SmarPer implementation, we collected 8,521 runtime permission decisions from 41 participants in real conditions. With this unique data set, we show that using an efficient Bayesian linear regression model results in a mean correct classification rate of 80% (±3%). This represents a mean relative reduction of approximately 50% in the number of incorrect decisions when compared with a user-defined static permission policy, i.e., the model used in current permission systems. SmarPer also addresses the suboptimal trade-off between privacy and utility: instead of offering only "allow" or "deny" decisions, SmarPer provides an "obfuscate" option with which users can still obtain utility by revealing partial information to apps. We implemented obfuscation techniques in SmarPer for different data types and evaluated them during our data collection campaign. Our results show that 73% of the participants found obfuscation useful and that it accounted for almost a third of the total number of decisions. In short, we are the first to show, using a large dataset of real in situ permission decisions, that it is possible to learn users' unique decision patterns at runtime using contextual information while supporting data obfuscation; this is an important step towards automating the management of permissions in smartphones.
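A rough, hypothetical sketch of what runtime permission prediction with a Bayesian linear regression could look like follows. The context features, their encoding, and the deny/obfuscate/allow mapping are invented for illustration and do not reflect SmarPer's actual features, data, or model.

```python
# Loose, illustrative approximation of context-aware permission prediction:
# encode a few hypothetical context features, fit a Bayesian linear regression
# (sklearn's BayesianRidge), and round its output to a deny/obfuscate/allow
# decision. Features and encoding are assumptions, not SmarPer's.
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.preprocessing import OneHotEncoder

# (permission, requesting-app category, time-of-day) -> 0 deny, 1 obfuscate, 2 allow
records = [
    ("LOCATION",   "navigation", "day",   2),
    ("LOCATION",   "game",       "night", 0),
    ("LOCATION",   "social",     "day",   1),
    ("CONTACTS",   "messaging",  "day",   2),
    ("CONTACTS",   "game",       "day",   0),
    ("MICROPHONE", "voip",       "day",   2),
    ("MICROPHONE", "game",       "night", 0),
    ("LOCATION",   "social",     "night", 1),
]
X_raw = [r[:3] for r in records]
y = np.array([r[3] for r in records], dtype=float)

enc = OneHotEncoder(handle_unknown="ignore")
X = enc.fit_transform(X_raw).toarray()
model = BayesianRidge().fit(X, y)

query = [("LOCATION", "game", "day")]
pred = model.predict(enc.transform(query).toarray())[0]
decision = ["deny", "obfuscate", "allow"][int(round(min(max(pred, 0.0), 2.0)))]
print(decision)
```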
{"title":"SmarPer: Context-Aware and Automatic Runtime-Permissions for Mobile Devices","authors":"Katarzyna Olejnik, Italo Dacosta, Joana Soares Machado, Kévin Huguenin, M. E. Khan, J. Hubaux","doi":"10.1109/SP.2017.25","DOIUrl":"https://doi.org/10.1109/SP.2017.25","url":null,"abstract":"Permission systems are the main defense that mobile platforms, such as Android and iOS, offer to users to protect their private data from prying apps. However, due to the tension between usability and control, such systems have several limitations that often force users to overshare sensitive data. We address some of these limitations with SmarPer, an advanced permission mechanism for Android. To address the rigidity of current permission systems and their poor matching of users' privacy preferences, SmarPer relies on contextual information and machine learning methods to predict permission decisions at runtime. Note that the goal of SmarPer is to mimic the users' decisions, not to make privacy-preserving decisions per se. Using our SmarPer implementation, we collected 8,521 runtime permission decisions from 41 participants in real conditions. With this unique data set, we show that using an efficient Bayesian linear regression model results in a mean correct classification rate of 80% (±3%). This represents a mean relative reduction of approximately 50% in the number of incorrect decisions when compared with a user-defined static permission policy, i.e., the model used in current permission systems. SmarPer also focuses on the suboptimal trade-off between privacy and utility, instead of only \"allow\" or \"deny\" type of decisions, SmarPer also offers an \"obfuscate\" option where users can still obtain utility by revealing partial information to apps. We implemented obfuscation techniques in SmarPer for different data types and evaluated them during our data collection campaign. Our results show that 73% of the participants found obfuscation useful and it accounted for almost a third of the total number of decisions. In short, we are the first to show, using a large dataset of real in situ permission decisions, that it is possible to learn users' unique decision patterns at runtime using contextual information while supporting data obfuscation, this is an important step towards automating the management of permissions in smartphones.","PeriodicalId":6502,"journal":{"name":"2017 IEEE Symposium on Security and Privacy (SP)","volume":"8 1","pages":"1058-1076"},"PeriodicalIF":0.0,"publicationDate":"2017-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90039703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine learning is widely used in practice to produce predictive models for applications such as image processing, speech and text recognition. These models are more accurate when trained on large amounts of data collected from different sources. However, the massive data collection raises privacy concerns. In this paper, we present new and efficient protocols for privacy-preserving machine learning for linear regression, logistic regression and neural network training using the stochastic gradient descent method. Our protocols fall in the two-server model where data owners distribute their private data among two non-colluding servers who train various models on the joint data using secure two-party computation (2PC). We develop new techniques to support secure arithmetic operations on shared decimal numbers, and propose MPC-friendly alternatives to non-linear functions such as sigmoid and softmax that are superior to prior work. We implement our system in C++. Our experiments validate that our protocols are several orders of magnitude faster than state-of-the-art implementations for privacy-preserving linear and logistic regression, and scale to millions of data samples with thousands of features. We also implement the first privacy-preserving system for training neural networks.
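To illustrate the shared fixed-point ("decimal") arithmetic the abstract refers to, here is a toy two-server sketch using additive sharing in a 2^64 ring and a dealer-generated Beaver triple. It is a simplification rather than SecureML's protocol, which generates triples via 2PC and truncates directly on the shares.

```python
# Toy sketch of two-server arithmetic on shared fixed-point values: reals are
# scaled by 2^FP and additively shared in Z_{2^64}; multiplication consumes a
# dealer-generated Beaver triple, and the product is truncated to restore the
# scale. A simplification of the idea, not SecureML's actual protocol.
import secrets

RING = 1 << 64   # arithmetic ring Z_{2^64}
FP = 13          # fractional bits of the fixed-point encoding

def encode(x):
    return int(round(x * (1 << FP))) % RING

def decode(v):
    v %= RING
    if v >= RING // 2:        # interpret the ring element as a signed value
        v -= RING
    return v / (1 << FP)

def share(v):
    """Additively share a ring element between the two servers."""
    r = secrets.randbelow(RING)
    return (r, (v - r) % RING)

def reconstruct(sh):
    return (sh[0] + sh[1]) % RING

def beaver_triple():
    a, b = secrets.randbelow(RING), secrets.randbelow(RING)
    return share(a), share(b), share((a * b) % RING)

def mul(x_sh, y_sh, triple):
    """Multiply two shared values using one Beaver triple (e, f are safe to open)."""
    a_sh, b_sh, c_sh = triple
    e = reconstruct(tuple((x_sh[i] - a_sh[i]) % RING for i in range(2)))
    f = reconstruct(tuple((y_sh[i] - b_sh[i]) % RING for i in range(2)))
    z = [(c_sh[i] + e * b_sh[i] + f * a_sh[i]) % RING for i in range(2)]
    z[0] = (z[0] + e * f) % RING   # the public term is added by one server only
    return tuple(z)

x_sh, y_sh = share(encode(1.5)), share(encode(-2.25))
prod = reconstruct(mul(x_sh, y_sh, beaver_triple()))
# The product carries a doubled scale (2^26); shift right by FP bits to restore it.
# Done here on the opened value for clarity; the paper truncates shares directly.
signed = prod if prod < RING // 2 else prod - RING
print(decode((signed >> FP) % RING))   # -> -3.375
```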
{"title":"SecureML: A System for Scalable Privacy-Preserving Machine Learning","authors":"Payman Mohassel, Yupeng Zhang","doi":"10.1109/SP.2017.12","DOIUrl":"https://doi.org/10.1109/SP.2017.12","url":null,"abstract":"Machine learning is widely used in practice to produce predictive models for applications such as image processing, speech and text recognition. These models are more accurate when trained on large amount of data collected from different sources. However, the massive data collection raises privacy concerns. In this paper, we present new and efficient protocols for privacy preserving machine learning for linear regression, logistic regression and neural network training using the stochastic gradient descent method. Our protocols fall in the two-server model where data owners distribute their private data among two non-colluding servers who train various models on the joint data using secure two-party computation (2PC). We develop new techniques to support secure arithmetic operations on shared decimal numbers, and propose MPC-friendly alternatives to non-linear functions such as sigmoid and softmax that are superior to prior work. We implement our system in C++. Our experiments validate that our protocols are several orders of magnitude faster than the state of the art implementations for privacy preserving linear and logistic regressions, and scale to millions of data samples with thousands of features. We also implement the first privacy preserving system for training neural networks.","PeriodicalId":6502,"journal":{"name":"2017 IEEE Symposium on Security and Privacy (SP)","volume":"21 1","pages":"19-38"},"PeriodicalIF":0.0,"publicationDate":"2017-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81326101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
V. Cortier, C. Drăgan, François Dupressoir, Benedikt Schmidt, Pierre-Yves Strub, B. Warinschi
We provide the first machine-checked proof of privacy-related properties (including ballot privacy) for an electronic voting protocol in the computational model. We target the popular Helios family of voting protocols, for which we identify appropriate levels of abstraction to allow the simplification and convenient reuse of proof steps across many variations of the voting scheme. The resulting framework enables machine-checked security proofs for several hundred variants of Helios and should serve as a stepping stone for the analysis of further variations of the scheme. In addition, we highlight some of the lessons learned regarding the gap between pen-and-paper and machine-checked proofs, and report on our experience with formalizing the security of protocols at this scale.
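As context, the following toy sketch shows the Helios-style core (exponential-ElGamal ballots with homomorphic tallying) that such ballot-privacy proofs reason about. It uses toy parameters, omits the zero-knowledge proofs, and bears no relation to the machine-checked development itself.

```python
# Toy sketch of a Helios-style tally: each ballot is an exponential-ElGamal
# encryption of a 0/1 vote, ciphertexts are multiplied to add votes in the
# exponent, and only the aggregate is decrypted. Toy parameters, no ZK proofs.
import secrets

q = 1019                           # toy safe-prime group: p = 2q + 1
p = 2 * q + 1                      # 2039
g = 4                              # generates the order-q subgroup of quadratic residues

sk = secrets.randbelow(q - 1) + 1  # election secret key
pk = pow(g, sk, p)

def encrypt_vote(v):
    """Exponential ElGamal: encrypt g^v for a 0/1 vote."""
    r = secrets.randbelow(q - 1) + 1
    return (pow(g, r, p), (pow(g, v, p) * pow(pk, r, p)) % p)

def add_ballots(c1, c2):
    """Component-wise multiplication adds the votes in the exponent."""
    return ((c1[0] * c2[0]) % p, (c1[1] * c2[1]) % p)

def decrypt_tally(c, max_votes):
    gm = (c[1] * pow(pow(c[0], sk, p), -1, p)) % p
    for t in range(max_votes + 1):          # the tally is small; brute-force the dlog
        if pow(g, t, p) == gm:
            return t
    raise ValueError("tally out of range")

votes = [1, 0, 1, 1, 0]
agg = encrypt_vote(votes[0])
for v in votes[1:]:
    agg = add_ballots(agg, encrypt_vote(v))
print(decrypt_tally(agg, len(votes)))       # -> 3
```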
{"title":"Machine-Checked Proofs of Privacy for Electronic Voting Protocols","authors":"V. Cortier, C. Drăgan, François Dupressoir, Benedikt Schmidt, Pierre-Yves Strub, B. Warinschi","doi":"10.1109/SP.2017.28","DOIUrl":"https://doi.org/10.1109/SP.2017.28","url":null,"abstract":"We provide the first machine-checked proof of privacy-related properties (including ballot privacy) for an electronic voting protocol in the computational model. We target the popular Helios family of voting protocols, for which we identify appropriate levels of abstractions to allow the simplification and convenient reuse of proof steps across many variations of the voting scheme. The resulting framework enables machine-checked security proofs for several hundred variants of Helios and should serve as a stepping stone for the analysis of further variations of the scheme. In addition, we highlight some of the lessons learned regarding the gap between pen-and-paper and machine-checked proofs, and report on the experience with formalizing the security of protocols at this scale.","PeriodicalId":6502,"journal":{"name":"2017 IEEE Symposium on Security and Privacy (SP)","volume":"6 1","pages":"993-1008"},"PeriodicalIF":0.0,"publicationDate":"2017-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87379899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Backes, Pascal Berrang, M. Bieg, R. Eils, C. Herrmann, Mathias Humbert, I. Lehmann
Since the first whole-genome sequencing, the biomedical research community has made significant steps towards a more precise, predictive and personalized medicine. Genomic data is nowadays widely considered privacy-sensitive and consequently protected by strict regulations and released only after careful consideration. Various additional types of biomedical data, however, are not shielded by any dedicated legal means and consequently disseminated much less thoughtfully. This holds true in particular for DNA methylation data, one of the most important and best-understood epigenetic elements influencing human health. In this paper, we show that, contrary to the widespread belief that such data is less sensitive, releasing one's DNA methylation data causes privacy issues akin to releasing one's actual genome. We show that even a small subset of methylation regions influenced by genomic variants is sufficient to infer parts of someone's genome, and to further map this DNA methylation profile to the corresponding genome. Notably, we show that such re-identification is possible with 97.5% accuracy, relying on a dataset of more than 2,500 genomes, and that we can reject all wrongly matched genomes using an appropriate statistical test. We provide means for countering this threat by proposing a novel cryptographic scheme for privately classifying tumors that enables a privacy-respecting medical diagnosis in a common clinical setting. The scheme relies on a combination of random forests and homomorphic encryption, and it is proven secure in the honest-but-curious model. We evaluate this scheme on real DNA methylation data, and show that we can keep the computational overhead to acceptable values for our application scenario.
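A purely illustrative toy of the linking idea on synthetic data: methylation levels at variant-influenced (meQTL-like) sites are quantized into genotype calls and matched against a set of genomes. The thresholds, noise model, and scoring rule are invented and are not the paper's inference method or statistical test.

```python
# Toy illustration only: at sites where a genetic variant shifts methylation,
# quantize the methylation level into an inferred genotype and link the inferred
# vector to the best-matching genome. Synthetic data, invented thresholds.
import random

random.seed(1)
N_SITES, N_PEOPLE = 200, 50

# synthetic genomes: genotype in {0, 1, 2} at each meQTL-like site
genomes = [[random.choice([0, 1, 2]) for _ in range(N_SITES)] for _ in range(N_PEOPLE)]

def methylation_profile(genome):
    """Each genotype shifts the methylation beta value (plus measurement noise)."""
    return [min(1.0, max(0.0, 0.2 + 0.3 * gt + random.gauss(0, 0.08))) for gt in genome]

def infer_genotypes(profile):
    """Quantize beta values back into genotype calls."""
    return [0 if b < 0.35 else 1 if b < 0.65 else 2 for b in profile]

def link(profile, genomes):
    """Return the genome with the highest fraction of matching genotype calls."""
    calls = infer_genotypes(profile)
    scores = [sum(c == g for c, g in zip(calls, genome)) / N_SITES for genome in genomes]
    best = max(range(N_PEOPLE), key=lambda i: scores[i])
    return best, scores[best]

target = 17
best, score = link(methylation_profile(genomes[target]), genomes)
print(best == target, round(score, 2))   # a score near 1/3 would look like a random match
```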
{"title":"Identifying Personal DNA Methylation Profiles by Genotype Inference","authors":"M. Backes, Pascal Berrang, M. Bieg, R. Eils, C. Herrmann, Mathias Humbert, I. Lehmann","doi":"10.1109/SP.2017.21","DOIUrl":"https://doi.org/10.1109/SP.2017.21","url":null,"abstract":"Since the first whole-genome sequencing, the biomedical research community has made significant steps towards a more precise, predictive and personalized medicine. Genomic data is nowadays widely considered privacy-sensitive and consequently protected by strict regulations and released only after careful consideration. Various additional types of biomedical data, however, are not shielded by any dedicated legal means and consequently disseminated much less thoughtfully. This in particular holds true for DNA methylation data as one of the most important and well-understood epigenetic element influencing human health. In this paper, we show that, in contrast to the aforementioned belief, releasing one's DNA methylation data causes privacy issues akin to releasing one's actual genome. We show that already a small subset of methylation regions influenced by genomic variants are sufficient to infer parts of someone's genome, and to further map this DNA methylation profile to the corresponding genome. Notably, we show that such re-identification is possible with 97.5% accuracy, relying on a dataset of more than 2500 genomes, and that we can reject all wrongly matched genomes using an appropriate statistical test. We provide means for countering this threat by proposing a novel cryptographic scheme for privately classifying tumors that enables a privacy-respecting medical diagnosis in a common clinical setting. The scheme relies on a combination of random forests and homomorphic encryption, and it is proven secure in the honest-but-curious model. We evaluate this scheme on real DNA methylation data, and show that we can keep the computational overhead to acceptable values for our application scenario.","PeriodicalId":6502,"journal":{"name":"2017 IEEE Symposium on Security and Privacy (SP)","volume":"137 1","pages":"957-976"},"PeriodicalIF":0.0,"publicationDate":"2017-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79703781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Programs that take highly-structured files as inputs normally process inputs in stages: syntax parsing, semantic checking, and application execution. Deep bugs are often hidden in the application execution stage, and it is non-trivial to automatically generate test inputs to trigger them. Mutation-based fuzzing generates test inputs by modifying well-formed seed inputs randomly or heuristically; most such inputs are rejected at the early syntax parsing stage. In contrast, generation-based fuzzing generates inputs from a specification (e.g., a grammar) and can quickly carry fuzzing beyond the syntax parsing stage. However, most generated inputs fail to pass the semantic checking (e.g., they violate semantic rules), which restricts their capability of discovering deep bugs. In this paper, we propose a novel data-driven seed generation approach, named Skyfire, which leverages the knowledge in the vast amount of existing samples to generate well-distributed seed inputs for fuzzing programs that process highly-structured inputs. Skyfire takes as inputs a corpus and a grammar, and consists of two steps. The first step learns a probabilistic context-sensitive grammar (PCSG) that captures both syntax features and semantic rules; the second step leverages the learned PCSG to generate seed inputs. We fed the collected samples and the inputs generated by Skyfire as seeds of AFL to fuzz several open-source XSLT and XML engines (i.e., Sablotron, libxslt, and libxml2). The results demonstrate that Skyfire can generate well-distributed inputs and thus significantly improve the code coverage (i.e., 20% for line coverage and 15% for function coverage on average) and the bug-finding capability of fuzzers. We also used the inputs generated by Skyfire to fuzz the closed-source JavaScript and rendering engines of Internet Explorer 11. Altogether, we discovered 19 new memory corruption bugs (16 of which are new vulnerabilities that earned 33.5k USD in bug bounty rewards) and 32 denial-of-service bugs.
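To make the seed-generation step concrete, the sketch below samples inputs from a small hand-written probabilistic grammar. Skyfire instead learns a probabilistic context-sensitive grammar from a large corpus, so treat this context-free toy as an illustration of the sampling idea only; the grammar and probabilities are invented.

```python
# Toy sketch of grammar-based seed generation: expand a start symbol using
# production rules chosen by (here hand-written, in Skyfire learned) probabilities.
import random

# production -> list of (probability, expansion); UPPERCASE names are nonterminals
GRAMMAR = {
    "DOC":   [(1.0, ["<root>", "ELEMS", "</root>"])],
    "ELEMS": [(0.6, ["ELEM", "ELEMS"]), (0.4, ["ELEM"])],
    "ELEM":  [(0.5, ["<item ", "ATTR", ">", "TEXT", "</item>"]),
              (0.5, ["<empty/>"])],
    "ATTR":  [(0.7, ['id="', "TEXT", '"']), (0.3, ['name="', "TEXT", '"'])],
    "TEXT":  [(0.4, ["a"]), (0.3, ["42"]), (0.3, [""])],
}

def pick(rules):
    """Choose an expansion according to the rule probabilities."""
    r, acc = random.random(), 0.0
    for prob, expansion in rules:
        acc += prob
        if r <= acc:
            return expansion
    return rules[-1][1]

def generate(symbol="DOC", depth=0, max_depth=12):
    if symbol not in GRAMMAR:          # terminal: emit as-is
        return symbol
    if depth > max_depth:              # bound unlikely deep recursions
        return ""
    return "".join(generate(s, depth + 1, max_depth) for s in pick(GRAMMAR[symbol]))

for _ in range(3):
    print(generate())                  # each line is a syntactically valid seed
```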
{"title":"Skyfire: Data-Driven Seed Generation for Fuzzing","authors":"Junjie Wang, Bihuan Chen, Lei Wei, Yang Liu","doi":"10.1109/SP.2017.23","DOIUrl":"https://doi.org/10.1109/SP.2017.23","url":null,"abstract":"Programs that take highly-structured files as inputs normally process inputs in stages: syntax parsing, semantic checking, and application execution. Deep bugs are often hidden in the application execution stage, and it is non-trivial to automatically generate test inputs to trigger them. Mutation-based fuzzing generates test inputs by modifying well-formed seed inputs randomly or heuristically. Most inputs are rejected at the early syntax parsing stage. Differently, generation-based fuzzing generates inputs from a specification (e.g., grammar). They can quickly carry the fuzzing beyond the syntax parsing stage. However, most inputs fail to pass the semantic checking (e.g., violating semantic rules), which restricts their capability of discovering deep bugs. In this paper, we propose a novel data-driven seed generation approach, named Skyfire, which leverages the knowledge in the vast amount of existing samples to generate well-distributed seed inputs for fuzzing programs that process highly-structured inputs. Skyfire takes as inputs a corpus and a grammar, and consists of two steps. The first step of Skyfire learns a probabilistic context-sensitive grammar (PCSG) to specify both syntax features and semantic rules, and then the second step leverages the learned PCSG to generate seed inputs. We fed the collected samples and the inputs generated by Skyfire as seeds of AFL to fuzz several open-source XSLT and XML engines (i.e., Sablotron, libxslt, and libxml2). The results have demonstrated that Skyfire can generate well-distributed inputs and thus significantly improve the code coverage (i.e., 20% for line coverage and 15% for function coverage on average) and the bug-finding capability of fuzzers. We also used the inputs generated by Skyfire to fuzz the closed-source JavaScript and rendering engine of Internet Explorer 11. Altogether, we discovered 19 new memory corruption bugs (among which there are 16 new vulnerabilities and received 33.5k USD bug bounty rewards) and 32 denial-of-service bugs.","PeriodicalId":6502,"journal":{"name":"2017 IEEE Symposium on Security and Privacy (SP)","volume":"11 Spec No 1","pages":"579-594"},"PeriodicalIF":0.0,"publicationDate":"2017-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77767288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Camenisch, Liqun Chen, Manu Drijvers, Anja Lehmann, David Novick, Rainer Urian
The Trusted Platform Module (TPM) is an international standard for a security chip that can be used for the management of cryptographic keys and for remote attestation. The specification of the most recent TPM 2.0 interfaces for direct anonymous attestation unfortunately has a number of severe shortcomings. First, they do not allow for security proofs (indeed, the published proofs are incorrect). Second, they provide a Diffie-Hellman oracle with respect to the secret key of the TPM, weakening the security and preventing forward anonymity of attestations. Fixes to these problems have been proposed, but they create new issues: they enable a fraudulent TPM to encode information into an attestation signature, which could be used to break anonymity or to leak the secret key. Furthermore, all proposed ways to remove the Diffie-Hellman oracle either strongly limit the functionality of the TPM or would require significant changes to the TPM 2.0 interfaces. In this paper we provide a better specification of the TPM 2.0 interfaces that addresses these problems and requires only minimal changes to the current TPM 2.0 commands. We then show how to use the revised interfaces to build q-SDH- and LRSW-based anonymous attestation schemes, and prove their security. We finally discuss how to obtain other schemes addressing different use cases, such as key-binding for U-Prove and e-cash.
{"title":"One TPM to Bind Them All: Fixing TPM 2.0 for Provably Secure Anonymous Attestation","authors":"J. Camenisch, Liqun Chen, Manu Drijvers, Anja Lehmann, David Novick, Rainer Urian","doi":"10.1109/SP.2017.22","DOIUrl":"https://doi.org/10.1109/SP.2017.22","url":null,"abstract":"The Trusted Platform Module (TPM) is an international standard for a security chip that can be used for the management of cryptographic keys and for remote attestation. The specification of the most recent TPM 2.0 interfaces for direct anonymous attestation unfortunately has a number of severe shortcomings. First of all, they do not allow for security proofs (indeed, the published proofs are incorrect). Second, they provide a Diffie-Hellman oracle w.r.t. the secret key of the TPM, weakening the security and preventing forward anonymity of attestations. Fixes to these problems have been proposed, but they create new issues: they enable a fraudulent TPM to encode information into an attestation signature, which could be used to break anonymity or to leak the secret key. Furthermore, all proposed ways to remove the Diffie-Hellman oracle either strongly limit the functionality of the TPM or would require significant changes to the TPM 2.0 interfaces. In this paper we provide a better specification of the TPM 2.0 interfaces that addresses these problems and requires only minimal changes to the current TPM 2.0 commands. We then show how to use the revised interfaces to build q-SDH-and LRSW-based anonymous attestation schemes, and prove their security. We finally discuss how to obtain other schemes addressing different use cases such as key-binding for U-Prove and e-cash.","PeriodicalId":6502,"journal":{"name":"2017 IEEE Symposium on Security and Privacy (SP)","volume":"24 1","pages":"901-920"},"PeriodicalIF":0.0,"publicationDate":"2017-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81593958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}