A Randomized Response Model for Privacy-Preserving Data Dissemination

Xiaoqian Jiang, Shuang Wang, Zhanglong Ji, L. Ohno-Machado, Li Xiong
DOI: 10.1109/HISB.2012.63 (https://doi.org/10.1109/HISB.2012.63)
Published in: 2012 IEEE Second International Conference on Healthcare Informatics, Imaging and Systems Biology
Publication date: 2012-09-27
Citations: 1

Abstract

Public dissemination of medical data encourages meaningful research and quality improvement. However, there is serious concern that improper disclosure may put sensitive personal information at risk. To maintain the research benefits and customize the privacy protection, we propose a novel and practical randomized response model (k-shuffle) and a statistical information recovery procedure. The former mixes the distribution of patient records with samples drawn from k-1 pre-determined distributions to ensure differential privacy. The latter allows data receivers to recover statistical properties (e.g., the mean and variance) of sub-populations of interest with accuracy proportional to the size of the sub-population. That is, our algorithm provides stronger privacy protection to smaller groups and offers high data usability to studies targeting larger populations. Most importantly, under the differential privacy guarantee, a data receiver cannot reconstruct the record-to-identity mapping for any individual. In summary, our approach offers a scalable privacy-preserving data dissemination mechanism that can be applied in both centralized and distributed fashion, which makes it possible for perturbed data to be outsourced (in the cloud) with mitigated privacy risks. Our experimental results demonstrate the performance of our model in terms of privacy protection, information loss, and classification accuracy on both synthetic and real-world datasets.
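The abstract does not spell out the mechanics of k-shuffle, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes each record is released truthfully with probability `p_true` and is otherwise replaced by a draw from one of k-1 known Gaussian "decoy" distributions chosen uniformly; the names `k_shuffle`, `recover_mean_variance`, `p_true`, and `decoys` are hypothetical. Because the receiver knows the decoy distributions and the mixing weights, the first two moments of the released data can be inverted to estimate the mean and variance of the true records. The sketch makes no claim about the differential privacy guarantee discussed in the paper.

```python
import random
import statistics

def k_shuffle(records, decoys, p_true, rng):
    """Assumed mixing step: keep each record with probability p_true,
    otherwise replace it with a draw from one of the k-1 decoy
    Gaussians, given as (mean, std) pairs and chosen uniformly."""
    released = []
    for x in records:
        if rng.random() < p_true:
            released.append(x)
        else:
            mu, sigma = rng.choice(decoys)
            released.append(rng.gauss(mu, sigma))
    return released

def recover_mean_variance(released, decoys, p_true):
    """Invert the mixture's first and second moments to estimate the
    mean and variance of the true records."""
    q = (1.0 - p_true) / len(decoys)  # weight of each decoy component
    m1 = statistics.fmean(released)
    m2 = statistics.fmean([y * y for y in released])
    decoy_m1 = sum(mu for mu, _ in decoys)
    decoy_m2 = sum(sigma ** 2 + mu ** 2 for mu, sigma in decoys)
    mean = (m1 - q * decoy_m1) / p_true
    second_moment = (m2 - q * decoy_m2) / p_true
    return mean, second_moment - mean ** 2

# Synthetic example (all values are illustrative, not from the paper).
rng = random.Random(0)
records = [rng.gauss(120.0, 15.0) for _ in range(100_000)]
decoys = [(80.0, 10.0), (160.0, 20.0)]  # k = 3: one true source, two decoys
released = k_shuffle(records, decoys, p_true=0.6, rng=rng)
est_mean, est_var = recover_mean_variance(released, decoys, p_true=0.6)
print(f"estimated mean={est_mean:.2f}, variance={est_var:.2f}")
```

In this sketch the estimation error shrinks as the number of released records grows, which is consistent with the abstract's point that larger sub-populations can be studied more accurately than smaller ones.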