PrivacyScope: Automatic Analysis of Private Data Leakage in TEE-Protected Applications
Ruide Zhang, Ning Zhang, A. Moini, W. Lou, Thomas Hou
2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS), November 2020
DOI: 10.1109/ICDCS47774.2020.00013
Citations: 4
Abstract
Big data analytics is having a profound impact on many sectors of the economy by transforming raw data into actionable intelligence. However, the increased use of sensitive business and private personal data, with no or only limited privacy safeguards, has raised great concern among individuals and government regulators. To address the growing tension between the need for data utility and the demand for data privacy, trusted execution environments (TEEs) are being used in both academic research and industrial applications as a powerful primitive for confidential computation on private data, disclosing only the result and not the original private data. While much current research has focused on protecting the TEE against attacks (e.g., side-channel information leakage), the security and privacy of the applications executing inside a TEE enclave have received little attention. The general attitude is that the application runs inside a trusted computing base (TCB) and can therefore be trusted. This assumption may not hold for unverified third-party applications. In this paper, we present PrivacyScope, a static code analyzer designed to detect leakage of private data by application code running in a TEE. PrivacyScope accomplishes this by analyzing the application code and identifying violations of a property called nonreversibility. We introduce nonreversibility because the classical noninterference property, which relies strictly on observable state, fails to detect private data leakage in certain scenarios, e.g., in machine learning (ML) programs, where the program output is always related to the (private) input data. By design, PrivacyScope detects both explicit and implicit information leakage. The nonreversibility property is formally defined based on the noninterference property, and we describe the PrivacyScope algorithms as extensions to the runtime semantics of a general language. To evaluate the efficacy of our approach, we implement a proof-of-feasibility prototype and apply PrivacyScope to detect data leakage in selected open-source ML code modules, including linear regression, k-means clustering, and collaborative filtering. PrivacyScope can also detect intentional data-leakage code injected by a programmer. We responsibly disclosed all discovered vulnerabilities that lead to the disclosure of private data in the open-source ML programs we analyzed.
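For intuition, here is a minimal sketch of the two properties in our own notation (the paper's formal definitions may differ). Writing P for a program with a high (private) input h and a low (public) input l, and O for its observable output, classical noninterference requires

    \forall h_1, h_2, \ell : \; \mathcal{O}(P(h_1, \ell)) = \mathcal{O}(P(h_2, \ell))

i.e., varying the private input must not change anything an observer sees. An ML training program violates this by construction, since the trained model depends on the private training data. Nonreversibility instead asks, informally, that no adversary can reconstruct h from O(P(h, l)), permitting aggregate outputs while forbidding recoverable copies of the input.

The distinction between explicit and implicit flows can be made concrete with a small hypothetical Python sketch (the function names and structure are ours, not code from the paper's evaluated modules):

def fit_mean(secret_samples):
    """Benign: the output is an aggregate, related to but not a copy of the input."""
    return sum(secret_samples) / len(secret_samples)

def leak_explicit(secret_samples):
    """Explicit flow: private values are copied directly into the output."""
    model = {"mean": fit_mean(secret_samples)}
    model["debug"] = list(secret_samples)  # raw private data escapes with the result
    return model

def leak_implicit(secret_bit, public_log):
    """Implicit flow: branching on a secret encodes it in public state."""
    if secret_bit:                # control flow depends on private data
        public_log.append(1)      # an observer recovers secret_bit from the log
    else:
        public_log.append(0)
    return public_log

Both leaks would evade a check that merely forbids returning the secret variable itself; catching the implicit case requires tracking control-flow dependencies, which is why an analyzer in this setting must cover both flow types.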