Privacy-Preserving Randomized Controlled Trials: A Protocol for Industry Scale Deployment

Proceedings of the 2021 on Cloud Computing Security Workshop. Pub Date: 2021-01-12. DOI: 10.1145/3474123.3486764

Mahnush Movahedi, Benjamin M. Case, James Honaker, Andrew Knox, Li Li, Yiming Paul Li, Sanjay Saravanan, Shubho Sengupta, Erik Taubeneck


Citations: 8

Abstract

Randomized Controlled Trials (RCTs), when feasible, give the strongest and most trustworthy empirical measures of causal effects. They are the gold standard in many clinical, social, and behavioral fields of study. However, the most important settings often involve the most sensitive data and therefore raise privacy concerns. In this paper, we outline a way to deploy an end-to-end privacy-preserving protocol for learning causal effects from RCTs. We focus on the difficult and important case where one party determines which treatment an individual receives and another party measures outcomes on individuals; the parties do not want to leak any of their information to each other, but still want to collectively learn a true causal effect in the world. Moreover, we show how such a protocol can be scaled to 500 million rows of data and more than a billion gates. We also offer an open source deployment of this protocol. We accomplish this with a three-stage solution that interconnects and blends three privacy technologies (private set intersection, multiparty computation, and differential privacy) to address the core points of privacy leakage: at the join, at the point of computation, and at the release, respectively. The first stage uses the Private-ID protocol [8] to create a private encrypted join of the users. The second stage uses the encrypted join to run multiple instances of a general purpose MPC over a sharded database to aggregate statistics about each experimental group while discarding individuals who took an action before they received treatment. The third stage adds distributed and calibrated Differential Privacy (DP) noise, within the final MPC computations, to the released aggregate statistical estimates of causal effects and their uncertainty measures, providing formal two-sided privacy guarantees. We also evaluate the performance of multiple open source general purpose MPC libraries for this task. We additionally demonstrate how we have used this protocol to create a working ads effectiveness measurement product capable of measuring hundreds of millions of individuals per experiment.
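The three stages can be illustrated with a small plaintext simulation. This is only a sketch under stated assumptions, not the protocol from the paper: a dictionary lookup stands in for the Private-ID join, two-party additive secret sharing stands in for the general purpose MPC, and Laplace noise is sampled in one place rather than generated in a distributed way inside the MPC. The record types, field names, `MODULUS`, and the `sensitivity` and `epsilon` parameters are all hypothetical.

```python
import random
from dataclasses import dataclass

MODULUS = 2**61 - 1  # hypothetical modulus for additive secret shares


@dataclass
class Exposure:
    """Row held by the party that assigns treatment (hypothetical schema)."""
    user_id: str
    treated: bool
    exposure_time: int


@dataclass
class Outcome:
    """Row held by the party that measures outcomes (hypothetical schema)."""
    user_id: str
    event_time: int
    value: int  # assumed non-negative


def share(x, rng):
    """Split x into two additive shares that sum to x modulo MODULUS."""
    first = rng.randrange(MODULUS)
    return first, (x - first) % MODULUS


def group_totals(exposures, outcomes, rng):
    """Return {treated_flag: (count, total)} for the two experimental groups.

    Stage 1 stand-in: a plaintext dictionary join on user_id.
    Stage 2 stand-in: each outcome value is secret-shared, shares are summed
    separately, and only the per-group total is reconstructed. Counts are
    kept in the clear here purely to keep the sketch short.
    """
    outcome_by_user = {o.user_id: o for o in outcomes}
    counts = {True: 0, False: 0}
    sums_a = {True: 0, False: 0}
    sums_b = {True: 0, False: 0}
    for e in exposures:
        o = outcome_by_user.get(e.user_id)
        if o is not None and o.event_time < e.exposure_time:
            continue  # discard individuals who acted before treatment
        value = o.value if o is not None else 0
        a, b = share(value, rng)
        counts[e.treated] += 1
        sums_a[e.treated] = (sums_a[e.treated] + a) % MODULUS
        sums_b[e.treated] = (sums_b[e.treated] + b) % MODULUS
    return {g: (counts[g], (sums_a[g] + sums_b[g]) % MODULUS)
            for g in (True, False)}


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_total(total, sensitivity, epsilon, rng):
    """Stage 3 stand-in: add Laplace noise scaled to sensitivity / epsilon."""
    return total + laplace_noise(sensitivity / epsilon, rng)
```

A difference in group means computed from the noisy totals would then serve as the released effect estimate. In a real deployment neither party would see the other's rows or the un-noised totals, which is the property this plaintext sketch does not provide.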


