A Randomized Response Model for Privacy-Preserving Data Dissemination

2012 IEEE Second International Conference on Healthcare Informatics, Imaging and Systems Biology. Pub Date: 2012-09-27. DOI: 10.1109/HISB.2012.63

Xiaoqian Jiang, Shuang Wang, Zhanglong Ji, L. Ohno-Machado, Li Xiong


Citations: 1

Abstract

Public dissemination of medical data encourages meaningful research and quality improvement. However, there is serious concern that improper disclosure may put sensitive personal information at risk. To maintain the research benefits while customizing privacy protection, we propose a novel and practical randomized response model (k-shuffle) and a statistical information recovery procedure. The former mixes the distribution of patient records with samples drawn from k-1 pre-determined distributions to ensure differential privacy. The latter allows data receivers to recover statistical properties (e.g., the mean and variance) of sub-populations of interest with accuracy proportional to the size of the sub-population. That is, our algorithm provides stronger privacy protection to smaller groups and offers high data usability to studies targeting larger populations. Most importantly, under the differential privacy guarantee, the data receiver cannot reconstruct the record-to-identity mapping for any individual. In summary, our approach offers a scalable privacy-preserving data dissemination mechanism that can be applied in both centralized and distributed fashion, which makes it possible for perturbed data to be outsourced (in the cloud) with mitigated privacy risks. Our experimental results demonstrate the performance of our model in terms of privacy protection, information loss, and classification accuracy on both synthetic and real-world datasets.
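The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea it describes: releasing a mixture of true records and draws from k-1 known distributions, then recovering the mean and variance by inverting the mixture moments. Everything specific here is an assumption made for illustration and is not taken from the paper: the retention probability `p_true`, the uniform choice among decoys, the Gaussian decoys, and the function names `k_shuffle` and `recover_mean_variance`. The sketch makes no claim to satisfy differential privacy.

```python
import random

def k_shuffle(records, decoy_samplers, p_true, rng):
    """Release each true value with probability p_true (assumed parameter);
    otherwise release a draw from one of the k-1 decoy distributions,
    chosen uniformly at random (assumed rule)."""
    released = []
    for x in records:
        if rng.random() < p_true:
            released.append(x)
        else:
            released.append(rng.choice(decoy_samplers)(rng))
    return released

def recover_mean_variance(released, decoy_moments, p_true):
    """Invert the mixture moments.
    decoy_moments holds one (mean, variance) pair per decoy distribution.
    For the released value Y:  E[Y]   = p*m1 + (1-p)*d1
                               E[Y^2] = p*m2 + (1-p)*d2
    where d1, d2 are the average first and second moments of the decoys."""
    n = len(released)
    k1 = len(decoy_moments)
    d1 = sum(m for m, _ in decoy_moments) / k1
    d2 = sum(v + m * m for m, v in decoy_moments) / k1
    obs1 = sum(released) / n
    obs2 = sum(y * y for y in released) / n
    m1 = (obs1 - (1 - p_true) * d1) / p_true
    m2 = (obs2 - (1 - p_true) * d2) / p_true
    return m1, m2 - m1 * m1

# Synthetic demonstration (made-up numbers, not data from the paper).
rng = random.Random(0)
true_records = [rng.gauss(120.0, 10.0) for _ in range(100_000)]
decoys = [(90.0, 5.0), (170.0, 15.0)]  # (mean, std) of k-1 = 2 decoys
samplers = [lambda r, m=m, s=s: r.gauss(m, s) for m, s in decoys]
p_true = 0.5

released = k_shuffle(true_records, samplers, p_true, rng)
est_mean, est_var = recover_mean_variance(
    released, [(m, s * s) for m, s in decoys], p_true
)
print(f"estimated mean={est_mean:.2f}, estimated variance={est_var:.2f}")
```

Because the estimates are sample moments divided by `p_true`, their error shrinks as the number of released records grows, which is consistent with the abstract's remark that recovery is more accurate for larger sub-populations and weaker for small ones.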
