Privacy-Preserved Data Sharing for Evidence-Based Policy Decisions: A Demonstration Project Using Human Services Administrative Records for Evidence-Building Activities

N. Hart, David Archer, Erin Dalton
Published in: Information Privacy Law eJournal
DOI: 10.2139/ssrn.3808054
Publication date: 2019-03-28
Citations: 2

Abstract

Emerging privacy-preserving technologies and approaches hold considerable promise for improving data privacy and confidentiality in the 21st century. At the same time, more information is becoming accessible to support evidence-based policymaking.

In 2017, the U.S. Commission on Evidence-Based Policymaking unanimously recommended that further attention be given to the deployment of privacy-preserving data-sharing applications. If these types of applications can be tested and scaled in the near-term, they could vastly improve insights about important policy problems by using disparate datasets. At the same time, the approaches could promote substantial gains in privacy for the American public.

There are numerous ways to engage in privacy-preserving data sharing. This paper focuses primarily on secure computation, which allows data to be analyzed securely while guaranteeing that the underlying private information is never exposed. Three key issues motivated the launch of a domestic secure computation demonstration project using real government-collected data:

--Using new privacy-preserving approaches addresses pressing needs in society. Widely accepted approaches to managing privacy risks, such as preventing the identification of individuals or organizations in public datasets, will become less effective over time. While many practices are currently used to keep government-collected data confidential, they often fail to incorporate modern developments in computer science, mathematics, and statistics in a timely way. New approaches can enable researchers to combine datasets to improve the capability for insight, without being impeded by traditional concerns about bringing large, identifiable datasets together. In fact, if these approaches succeed, traditional methods of combining data for analysis may become largely unnecessary.

--Emerging technical applications make it possible to deploy certain privacy-preserving approaches in targeted settings. These procedures increasingly enable larger-scale testing of privacy-preserving approaches across a variety of policy domains, governmental jurisdictions, and agency settings, demonstrating the privacy guarantees that accompany data access and use.

--Widespread adoption and use by public administrators will only follow meaningful and successful demonstration projects. Secure computation approaches, for example, are complex and can be difficult to understand for those unfamiliar with their potential. Implementing new privacy-preserving approaches will require thoughtful attention to public policy implications, public opinion, legal restrictions, and other administrative limitations that vary by agency and governmental entity.

This project used real-world government data to illustrate the applicability of secure computation compared to the classic data infrastructure available to some local governments. The project took place in a domestic, non-intelligence setting to increase the salience of potential lessons for public agencies.

Data obtained under a confidentiality agreement from Allegheny County’s Department of Human Services in Pennsylvania were analyzed to generate basic insights using privacy-preserving platforms. The analysis required merging more than 2 million records from five datasets owned by multiple government agencies in Allegheny County. Specifically, the demonstration relied on individual-level records about services to the homeless, mental health services, causes and incidences of mortality, family interventions, and incarceration to analyze four key questions about the proportion of: (1) people serving a sentence in jail who received publicly-funded mental health services; (2) parents involved in child welfare cases who received publicly-funded mental health services; (3) people serving a sentence in jail who received homelessness services; and (4) suicide victims who previously received publicly-funded mental health services. To BPC’s knowledge, this demonstration is the first of its kind completed in the human services field.
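Each of the four questions above reduces to the same shape of computation: join two agencies' datasets on an individual identifier, then take the share of one population that also appears in the other. A minimal sketch of that query in plain, non-private form, using entirely synthetic identifiers (not the real county data), might look like:

```python
# Hypothetical, synthetic person IDs; the real analysis joined millions of
# Allegheny County records, and did so over encrypted inputs.
jail_population = {"p01", "p02", "p03", "p04", "p05"}
mental_health_clients = {"p02", "p04", "p05", "p09"}

# Question 1: what proportion of people serving a jail sentence
# received publicly funded mental health services?
overlap = jail_population & mental_health_clients
proportion = len(overlap) / len(jail_population)
print(f"{proportion:.0%} of the jail population received services")
```

In the demonstration, this same join-and-count is executed over encrypted data, so no analyst or server ever sees the identifiers in the clear.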

To demonstrate and characterize applicability of privacy-preserving computation for these analyses, the project team performed them on two distinct privacy-preserving platforms. The first platform, called Jana and developed as part of the Brandeis program for the Defense Advanced Research Projects Agency, achieves secure computation entirely in software. Jana uses a combination of encryption techniques to protect data while at rest and in transit, and uses secure multiparty computation to protect data during computation. Specifically, Jana uses multiple servers to perform computation on cryptographic secret shares of data, while assuring that those servers never see the data in decrypted form.
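The core idea behind Jana's approach, computing on additive secret shares so that no single server ever holds the plaintext, can be illustrated with a toy sketch. This is an illustration of the general secret-sharing technique only, not Jana's actual protocol or API:

```python
import secrets

MOD = 2**61 - 1  # large modulus for additive sharing

def share(value, n_parties=3):
    """Split a value into n additive shares that sum to value mod MOD.
    Any subset of fewer than n shares reveals nothing about the value."""
    shares = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    return sum(shares) % MOD

# Each record contributes a 0/1 flag: did this person receive services?
flags = [1, 0, 1, 1, 0]
# Transpose so each party holds one share of every flag; no party sees a flag.
per_party = list(zip(*(share(f) for f in flags)))
# Each party locally sums its own shares; only combining the partial sums
# reveals the aggregate count, never any individual record.
partial_sums = [sum(p) % MOD for p in per_party]
count = reconstruct(partial_sums)
assert count == sum(flags)
```

The design point this sketch makes concrete: addition commutes with sharing, so servers can compute aggregate statistics on shares alone, which is why counts and proportions like those in this demonstration are a natural fit for secure multiparty computation.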

The second platform, called FIDES and developed as part of the IMPACT program for the U.S. Department of Homeland Security, achieves secure computation via a hardware-enabled cryptographic enclave. Specifically, FIDES uses an Intel Corporation processor and the Intel Software Guard Extensions to compute in an area of the processor that is restricted from access by other code running on the computer, including the computer’s own operating system. No part of the processor or software, aside from that hardware-secured enclave, ever sees the data in decrypted form.

These two privacy-preserving computation platforms take similar approaches: data arrive at the computation platform already encrypted, analysis is performed in ways that reveal nothing about the underlying data, and results are securely provided to users. The goal of these experiments was to compare the two approaches with a classic data analysis setting. Successful completion of the demonstration with human services data yielded the following insights:

--The experiments produced valid, reliable results. Both platforms generated valid results consistent with traditional data analysis approaches. This outcome suggests that the queries using these privacy-preserving approaches are not subject to diminished quality that would affect the validity or reliability of statistical conclusions. Therefore, multiparty computation models satisfy the demonstration’s core criteria for enabling data use and privacy preservation.

--The efficiency of the experiments presents a trade-off for policymakers. Different modes of operationalizing the privacy-preserving technologies offer trade-offs for answer timeliness. Analyses with nearly 200,000 records using the software-based approach required nearly three hours to complete, whereas the same queries in the hardware-enabled environment returned results in one-tenth of a second. These times have substantial implications for applications in government operations with rapid decision-making architectures.

These findings suggest that these approaches offer considerable promise for public policy in achieving improved data analysis and tangible privacy protections at the same time. However, effort is still needed to further develop privacy-preserving technologies to make their deployment more time efficient prior to widespread use in government agencies. The scope and scale of such deployments will likely have either substantial cost implications or substantial delays in response times for computation, depending on the desired trade-off for the privacy-preserving approach. In addition to developing technical precision for privacy guarantees, further development of the technologies must also include learning about approaches for deploying the protections within complex organizational or governmental infrastructures and legal frameworks that may not explicitly encourage such activities.

This demonstration project offers a compelling example of how the technologies can be deployed—which can advance consideration of the approach within domestic, non-intelligence agencies at all levels of government.