Xiaoqian Jiang, Shuang Wang, Zhanglong Ji, L. Ohno-Machado, Li Xiong
2012 IEEE Second International Conference on Healthcare Informatics, Imaging and Systems Biology. Published 2012-09-27. DOI: 10.1109/HISB.2012.63 (https://doi.org/10.1109/HISB.2012.63)
A Randomized Response Model for Privacy-Preserving Data Dissemination
Public dissemination of medical data encourages meaningful research and quality improvement. However, improper disclosure may put sensitive personal information at risk. To preserve the research benefits while allowing privacy protection to be customized, we propose a novel and practical randomized response model (k-shuffle) and a statistical information recovery procedure. The former mixes the distribution of patient records with samples drawn from k-1 pre-determined distributions to ensure differential privacy. The latter allows data receivers to recover statistical properties (e.g., the mean and variance) of sub-populations of interest with accuracy proportional to the size of the sub-population. That is, our algorithm provides stronger privacy protection to smaller groups and offers high data usability to studies targeting larger populations. Most importantly, under the differential privacy guarantee, data receivers cannot reconstruct the record-to-identity mapping for any individual. In summary, our approach offers a scalable privacy-preserving data dissemination mechanism that can be applied in both centralized and distributed fashion, which makes it possible to outsource perturbed data (in the cloud) with mitigated privacy risks. Our experimental results demonstrate the performance of our model in terms of privacy protection, information loss, and classification accuracy on both synthetic and real-world datasets.
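The mixing and recovery idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's k-shuffle algorithm, whose details are not given here. It assumes one-dimensional records, uniform mixing weights of 1/k, and decoy distributions whose mean and variance are public. The names `perturb`, `recover_mean_var`, and `decoys` are hypothetical, and the sketch makes no claim about the resulting differential privacy level.

```python
import random

# Each decoy is a tuple (sampler, mean, variance). The decoy moments are
# assumed to be public knowledge (an assumption of this sketch).

def perturb(records, decoys, rng):
    """Release the true value or a draw from one of the k-1 decoys,
    each chosen with probability 1/k (uniform weights are an assumption)."""
    k = len(decoys) + 1
    released = []
    for x in records:
        j = rng.randrange(k)  # j == 0 keeps the true record
        released.append(x if j == 0 else decoys[j - 1][0](rng))
    return released

def recover_mean_var(released, decoys):
    """Estimate the mean and variance of the true records by inverting the
    mixture moments: E[Y] = (mu + sum_j mu_j) / k, and likewise for E[Y^2]."""
    k = len(decoys) + 1
    n = len(released)
    m1 = sum(released) / n                  # first moment of released data
    m2 = sum(y * y for y in released) / n   # second moment of released data
    decoy_m1 = sum(mu for _, mu, _ in decoys)
    decoy_m2 = sum(var + mu * mu for _, mu, var in decoys)
    mean = k * m1 - decoy_m1
    second = k * m2 - decoy_m2
    return mean, second - mean * mean

if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical Gaussian decoys, so k = 3.
    decoys = [
        (lambda r: r.gauss(0.0, 1.0), 0.0, 1.0),
        (lambda r: r.gauss(10.0, 2.0), 10.0, 4.0),
    ]
    records = [rng.gauss(5.0, 1.5) for _ in range(100_000)]
    released = perturb(records, decoys, rng)
    print(recover_mean_var(released, decoys))
```

In this sketch the estimator multiplies the sample moments by k, so its noise grows with k and shrinks as the number of released records grows. That is in line with the abstract's statement that larger sub-populations are recovered more accurately than smaller ones.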