PrivacyScope: Automatic Analysis of Private Data Leakage in TEE-Protected Applications
Ruide Zhang, Ning Zhang, A. Moini, W. Lou, Thomas Hou
2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS), November 2020
DOI: 10.1109/ICDCS47774.2020.00013
Citations: 4
Abstract
Big data analytics is having a profound impact on many sectors of the economy by transforming raw data into actionable intelligence. However, the increased use of sensitive business and private personal data, with no or only limited privacy safeguards, has raised great concern among individuals and government regulators. To address the growing tension between the need for data utility and the demand for data privacy, trusted execution environments (TEEs) are being used in both academic research and industrial applications as a powerful primitive for confidential computation on private data, disclosing only the result and not the original private data. While much current research has focused on protecting the TEE against attacks (e.g., side-channel information leakage), the security and privacy of the applications executing inside a TEE enclave have received little attention. The general attitude is that the application runs inside a trusted computing base (TCB) and can therefore be trusted. This assumption may not hold for unverified third-party applications. In this paper, we present PrivacyScope, a static code analyzer designed to detect leakage of private data by application code running in a TEE. PrivacyScope accomplishes this by analyzing the application code and identifying violations of a property called nonreversibility. We introduce nonreversibility because the classical noninterference property, which relies strictly on observable state, fails to detect private data leakage in certain scenarios, e.g., in machine learning (ML) programs, where the program output is always related to the (private) input data. By design, PrivacyScope detects both explicit and implicit information leakage. The nonreversibility property is formally defined based on the noninterference property, and we describe the PrivacyScope algorithms as extensions to the runtime semantics of a general language. To evaluate the efficacy of our approach, we implement a proof-of-feasibility prototype and apply PrivacyScope to detect data leakage in selected open-source ML code modules, including linear regression, k-means clustering, and collaborative filtering. PrivacyScope can also detect intentional data-leakage code injected by a programmer. We responsibly disclosed all discovered vulnerabilities that lead to the disclosure of private data in the open-source ML programs we analyzed.
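For intuition, here is a minimal sketch of the two properties in our own notation (the paper's formal definitions may differ). Writing P for a program with a high (private) input h and a low (public) input l, and O for its observable output, classical noninterference requires

    \forall h_1, h_2, \ell : \; \mathcal{O}(P(h_1, \ell)) = \mathcal{O}(P(h_2, \ell))

i.e., varying the private input must not change anything an observer sees. An ML training program violates this by construction, since the trained model depends on the private training data. Nonreversibility instead asks, informally, that no adversary can reconstruct h from O(P(h, l)), permitting aggregate outputs while forbidding recoverable copies of the input.

The distinction between explicit and implicit flows can be made concrete with a small hypothetical Python sketch (the function names and structure are ours, not code from the paper's evaluated modules):

def fit_mean(secret_samples):
    """Benign: the output is an aggregate, related to but not a copy of the input."""
    return sum(secret_samples) / len(secret_samples)

def leak_explicit(secret_samples):
    """Explicit flow: private values are copied directly into the output."""
    model = {"mean": fit_mean(secret_samples)}
    model["debug"] = list(secret_samples)  # raw private data escapes with the result
    return model

def leak_implicit(secret_bit, public_log):
    """Implicit flow: branching on a secret encodes it in public state."""
    if secret_bit:                # control flow depends on private data
        public_log.append(1)      # an observer recovers secret_bit from the log
    else:
        public_log.append(0)
    return public_log

Both leaks would evade a check that merely forbids returning the secret variable itself; catching the implicit case requires tracking control-flow dependencies, which is why an analyzer in this setting must cover both flow types.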