Journal of Privacy and Confidentiality: Latest Publications
Differentially Private Guarantees for Analytics and Machine Learning on Graphs: A Survey of Results
Q2 Mathematics | Pub Date: 2024-02-11 | DOI: 10.29012/jpc.820
Tamara T. Mueller, Dmitrii Usynin, Johannes C. Paetzold, R. Braren, D. Rueckert, Georgios Kaissis
We study the applications of differential privacy (DP) in the context of graph-structured data and discuss the formulations of DP applicable to the publication of graphs and their associated statistics as well as machine learning on graph-based data, including graph neural networks (GNNs). Interpreting DP guarantees in the context of graph-structured data can be challenging, as individual data points are interconnected (often non-linearly or sparsely). This connectivity complicates the computation of individual privacy loss in differentially private learning. The problem is exacerbated by an absence of a single, well-established formulation of DP in graph settings. This issue extends to the domain of GNNs, rendering private machine learning on graph-structured data a challenging task. A lack of prior systematisation work motivated us to study graph-based learning from a privacy perspective. In this work, we systematise different formulations of DP on graphs, discuss challenges and promising applications, including the GNN domain. We compare and separate works into graph analytics tasks and graph learning tasks with GNNs. We conclude our work with a discussion of open questions and potential directions for further research in this area.
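The survey's subject matter can be made concrete with a small example of one common formulation, edge-level DP: adding or removing a single edge changes a graph's edge count by at most one, so the count has global sensitivity 1 and can be released with Laplace noise of scale 1/ε. A minimal sketch (function names are illustrative, not from the paper):

```python
import random

def laplace_noise(scale: float) -> float:
    # The difference of two iid Exp(1) draws is Laplace(0, 1)-distributed.
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def private_edge_count(edges: list[tuple[int, int]], epsilon: float) -> float:
    # Under edge-level DP, one edge changes the count by at most 1,
    # so Laplace noise with scale 1/epsilon suffices.
    return len(edges) + laplace_noise(1.0 / epsilon)

random.seed(0)
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (1, 3)]
noisy = private_edge_count(edges, epsilon=1.0)
```

As ε grows, the noise shrinks and the released count approaches the true value of 5; node-level DP, where a neighboring graph differs in a whole node and all its incident edges, would require far more noise.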
Citations: 1
Beyond Legal Frameworks and Security Controls For Accessing Confidential Survey Data: Engaging Data Users in Data Protection
Q2 Mathematics | Pub Date: 2023-12-06 | DOI: 10.29012/jpc.845
Amy Pienta, J. Jang, Margaret Levenstein
With a growing demand for data reuse and open data within the scientific ecosystem, protecting the confidentiality and privacy of survey data is increasingly important. It requires more than legal procedures and technological controls; it requires social and behavioral intervention. In this research note, we delineate the disclosure risks of various types of survey data (i.e., longitudinal data, social network data, sensitive information and biomarkers, and geographic data), the current motivations for data reuse, and the challenges to data protection. Despite rigorous efforts to protect data, threats to the confidentiality of microdata remain. Unintentional data breaches, protocol violations, and the misuse of data are observed even in well-established restricted data access systems, which indicates that these systems all rely heavily on trust. Creating and maintaining that trust is critical to secure data access. We suggest four ways of building trust: User-Centered Design Practices; Promoting Trust for Protecting Confidential Data; General Training in Research Ethics; and Specific Training in Data Security Protocols, illustrated with the example of a new project, 'Researcher Passport', by the Inter-university Consortium for Political and Social Research. Continuous user-focused improvements in restricted data access systems are necessary so that we promote a culture of trust among the research and data user community, train users both in the general topic of responsible research and in the specific requirements of these systems, and offer systematic and holistic solutions.
Citations: 0
Protecting Sensitive Data Early in the Research Data Lifecycle
Q2 Mathematics | Pub Date: 2023-12-06 | DOI: 10.29012/jpc.846
Sebastian Karcher, Sefa Secen, Nicholas Weber
How do researchers in fieldwork-intensive disciplines protect sensitive data in the field, how do they assess their own practices, and how do they arrive at them? This article reports the results of a qualitative study with 36 semi-structured interviews with qualitative and multi-method researchers in political science and humanitarian aid/migration studies. We find that researchers frequently feel ill-prepared to handle the management of sensitive data in the field and find that formal institutions provide little support. Instead, they use a patchwork of sources to devise strategies for protecting their informants and their data. We argue that this carries substantial risks for the security of the data as well as their potential for later sharing and re-use. We conclude with some suggestions for effectively supporting data management in fieldwork-intensive research without unduly adding to the burden on researchers conducting it.
Citations: 0
Restricted data management: the current practice and the future
Q2 Mathematics | Pub Date: 2023-12-06 | DOI: 10.29012/jpc.844
J. Jang, Amy Pienta, Margaret Levenstein, Joe Saul
Many restricted data managing organizations across the world have adapted the Five Safes framework (safe data, safe projects, safe people, safe settings, safe output) for their management of restricted and confidential data. While the Five Safes have been well integrated throughout the data life cycle, organizations observe several unintended challenges regarding data being FAIR (Findable, Accessible, Interoperable, Reusable). In this study, we review current practice in restricted data management and discuss challenges and future directions, focusing in particular on data use agreements, disclosure risk review, and training. In the future, restricted data managing organizations may need to proactively consider reducing inequalities in access to scientific development, preventing unethical use of the restricted and confidential data they manage, and managing various types of data.
Citations: 0
"I need a better description": An Investigation Into User Expectations For Differential Privacy
Q2 Mathematics | Pub Date: 2023-08-31 | DOI: 10.29012/jpc.813
Rachel Cummings, Gabriel Kaptchuk, Elissa Redmiles
Despite recent widespread deployment of differential privacy, relatively little is known about what users think of differential privacy. In this work, we seek to explore users' privacy expectations related to differential privacy. Specifically, we investigate (1) whether users care about the protections afforded by differential privacy, and (2) whether they are therefore more willing to share their data with differentially private systems. Further, we attempt to understand (3) users' privacy expectations of the differentially private systems they may encounter in practice and (4) their willingness to share data in such systems. To answer these questions, we use a series of rigorously conducted surveys (n=2424). We find that users care about the kinds of information leaks against which differential privacy protects and are more willing to share their private information when the risks of these leaks are less likely to happen. Additionally, we find that the ways in which differential privacy is described in-the-wild haphazardly set users' privacy expectations, which can be misleading depending on the deployment. We synthesize our results into a framework for understanding a user's willingness to share information with differentially private systems, which takes into account the interaction between the user's prior privacy concerns and how differential privacy is described.
Citations: 0
Synthesizing Familial Linkages for Privacy in Microdata
Q2 Mathematics | Pub Date: 2023-08-31 | DOI: 10.29012/jpc.767
Gary Benedetto, Evan Totty
As the Census Bureau strives to modernize its disclosure avoidance efforts in all of its outputs, synthetic data has become a successful way to provide external researchers a chance to conduct a wide variety of analyses on microdata while still satisfying the legal objective of protecting privacy of survey respondents. Some of the most useful variables for researchers are some of the trickiest to model: relationships between records. These can be family relationships, household relationships, or employer-employee relationships to name a few. This paper describes a method to match synthetic records together in a way that mimics the covariation between related records in the underlying, protected data.
Citations: 1
CONSISTENT SPECTRAL CLUSTERING OF NETWORK BLOCK MODELS UNDER LOCAL DIFFERENTIAL PRIVACY.
Q2 Mathematics | Pub Date: 2022-11-02 | DOI: 10.29012/jpc.811
Jonathan Hehir, Aleksandra Slavković, Xiaoyue Niu

The stochastic block model (SBM) and degree-corrected block model (DCBM) are network models often selected as the fundamental setting in which to analyze the theoretical properties of community detection methods. We consider the problem of spectral clustering of SBM and DCBM networks under a local form of edge differential privacy. Using a randomized response privacy mechanism called the edge-flip mechanism, we develop theoretical guarantees for differentially private community detection, demonstrating conditions under which this strong privacy guarantee can be upheld while achieving spectral clustering convergence rates that match the known rates without privacy. We prove the strongest theoretical results are achievable for dense networks (those with node degree linear in the number of nodes), while weak consistency is achievable under mild sparsity (node degree greater than √n). We empirically demonstrate our results on a number of network examples.
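The edge-flip mechanism described above is randomized response applied to each entry of the adjacency matrix: every potential edge indicator is flipped independently with a probability below 1/2, so the released graph is noisy but still informative. A minimal sketch, assuming the common choice of flip probability p = 1/(1 + e^ε) (names are illustrative; the paper's exact construction may differ in details):

```python
import math
import random

def edge_flip(adj: list[list[int]], epsilon: float, rng: random.Random) -> list[list[int]]:
    # Randomized response on each edge indicator of an undirected graph:
    # flip with probability p = 1 / (1 + e^epsilon), which gives each
    # individual edge an epsilon-local-DP guarantee.
    p = 1.0 / (1.0 + math.exp(epsilon))
    n = len(adj)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            bit = adj[i][j] ^ int(rng.random() < p)
            out[i][j] = out[j][i] = bit  # keep the output symmetric
    return out

adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]
noisy_adj = edge_flip(adj, epsilon=1.0, rng=random.Random(0))
```

Because the flip probability is known, downstream estimators (such as the debiased spectral clustering analyzed in the paper) can correct for the expected noise.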

Citations: 3
ON THE PRIVACY AND UTILITY PROPERTIES OF TRIPLE MATRIX-MASKING.
Q2 Mathematics | Pub Date: 2020-06-01 | DOI: 10.29012/jpc.674
A Adam Ding, Guanhong Miao, Samuel S Wu

Privacy protection is an important requirement in many statistical studies. A recently proposed data collection method, triple matrix-masking, retains exact summary statistics without exposing the raw data at any point in the process. In this paper, we provide theoretical formulation and proofs showing that a modified version of the procedure is strong collection obfuscating: no party in the data collection process is able to gain knowledge of the individual level data, even with some partially masked data information in addition to the publicly published data. This provides a theoretical foundation for the usage of such a procedure to collect masked data that allows exact statistical inference for linear models, while preserving a well-defined notion of privacy protection for each individual participant in the study. This paper fits into a line of work tackling the problem of how to create useful synthetic data without having a trustworthy data aggregator. We achieve this by splitting the trust between two parties, the "masking service provider" and the "data collector."
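The paper's three-party protocol is more involved, but the core reason matrix masking can preserve linear-model inference is visible with a single orthogonal mask: since Qᵀ Q = I, the masked pair (QX, Qy) has the same normal equations as the raw data, so least squares recovers the identical estimate. A simplified, hypothetical sketch (not the full TM² protocol):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)

# Random orthogonal mask Q from the QR decomposition of a Gaussian matrix.
# Since (QX)^T (QX) = X^T X and (QX)^T (Qy) = X^T y, the masked data
# yields exactly the same OLS solution as the raw data.
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
beta_raw = np.linalg.lstsq(X, y, rcond=None)[0]
beta_masked = np.linalg.lstsq(Q @ X, Q @ y, rcond=None)[0]
```

Holding the key that generates Q separately from the masked data is what lets analysts compute on (QX, Qy) without ever seeing individual rows.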

Citations: 4
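The central claim above — that matrix masking retains exact summary statistics while hiding individual records — can be illustrated with a single orthogonal mask. This is a minimal sketch, not the full triple-mask protocol; the data and the mask are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative raw data: an intercept column plus two numeric covariates,
# so column sums and means are recoverable from the cross-product matrix.
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])

# A random orthogonal mask A (the Q factor of a Gaussian matrix).
A, _ = np.linalg.qr(rng.normal(size=(n, n)))
X_masked = A @ X  # released data: rows no longer correspond to individuals

# The cross-product matrix X'X -- the summary statistic needed for
# linear-model inference -- is preserved exactly, since A'A = I:
print(np.allclose(X_masked.T @ X_masked, X.T @ X))  # True
# Individual rows, however, are scrambled:
print(np.allclose(X_masked[0], X[0]))               # False
```

Because A'A = I, every statistic computable from X'X is identical before and after masking, even though no row of the released matrix matches any participant's record.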
A New Data Collection Technique for Preserving Privacy.
Q2 Mathematics Pub Date : 2016-01-01 Epub Date: 2018-02-02 DOI: 10.29012/jpc.v7i3.408
Samuel S Wu, Shigang Chen, Deborah Burr, Long Zhang

A major obstacle that hinders medical and social research is the lack of reliable data due to people's reluctance to reveal private information to strangers. Fortunately, statistical inference always targets a well-defined population rather than a particular individual subject and, in many current applications, data can be collected using a web-based system or other mobile devices. These two characteristics enable us to develop a data collection method, called triple matrix-masking (TM²), which offers strong privacy protection with an immediate matrix transformation so that even the researchers cannot see the data, and then further uses matrix transformations to guarantee that the data will still be analyzable by standard statistical methods. The entities involved in the proposed process are a masking service provider, who receives the initially masked data and then applies another mask, and the data collectors, who partially decrypt the now doubly masked data and then apply a third mask before releasing the data to the public. A critical feature of the method is that the keys used to generate the matrices are held separately. This ensures that nobody sees the actual data, but because of the specially designed transformations, statistical inference on parameters of interest can be conducted with the same results as if the original data were used. Hence the TM² method hides sensitive data with no efficiency loss for statistical inference of binary and normal data, which improves over Warner's randomized response technique. In addition, we add several features to the proposed procedure: an error-checking mechanism is built into the data collection process to make sure that the masked data used for analysis are an appropriate transformation of the original data, and a partial masking technique is introduced to grant data users access to non-sensitive personal information while sensitive information remains hidden.

Citations: 11