Identifying Personal DNA Methylation Profiles by Genotype Inference

2017 IEEE Symposium on Security and Privacy (SP), Pub Date: 2017-05-22, DOI: 10.1109/SP.2017.21

M. Backes, Pascal Berrang, M. Bieg, R. Eils, C. Herrmann, Mathias Humbert, I. Lehmann

Pages: 957-976

Citations: 24

Abstract

Since the first whole-genome sequencing, the biomedical research community has made significant steps towards a more precise, predictive and personalized medicine. Genomic data is nowadays widely considered privacy-sensitive; it is consequently protected by strict regulations and released only after careful consideration. Various additional types of biomedical data, however, are not shielded by any dedicated legal means and are consequently disseminated much less thoughtfully. This holds true in particular for DNA methylation data, one of the most important and best-understood epigenetic elements influencing human health. In this paper, we show that, contrary to the belief that such data is less sensitive, releasing one's DNA methylation data causes privacy issues akin to releasing one's actual genome. We show that even a small subset of methylation regions influenced by genomic variants is sufficient to infer parts of someone's genome, and to further map this DNA methylation profile to the corresponding genome. Notably, we show that such re-identification is possible with 97.5% accuracy, relying on a dataset of more than 2500 genomes, and that we can reject all wrongly matched genomes using an appropriate statistical test. We provide means for countering this threat by proposing a novel cryptographic scheme for privately classifying tumors that enables a privacy-respecting medical diagnosis in a common clinical setting. The scheme relies on a combination of random forests and homomorphic encryption, and it is proven secure in the honest-but-curious model. We evaluate this scheme on real DNA methylation data, and show that we can keep the computational overhead to acceptable values for our application scenario.
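The re-identification idea in the abstract (infer genotypes from methylation levels, then match the inferred genotypes against a set of genomes) can be illustrated with a toy simulation. This is a minimal sketch under loud assumptions and not the authors' method: the threshold rule in `infer_genotype`, the simulated data in `simulate`, the agreement score, and every parameter value are hypothetical and chosen only for illustration. In particular, the paper's statistical test for rejecting wrong matches is replaced here by a plain agreement cutoff.

```python
import random


def infer_genotype(meth_level, low=0.3, high=0.7):
    """Guess a genotype (0, 1, or 2) from a methylation level in [0, 1].

    Hypothetical threshold rule; the cutoffs are illustrative assumptions.
    """
    if meth_level < low:
        return 0
    if meth_level < high:
        return 1
    return 2


def match_profile(meth_profile, genomes):
    """Return (index, agreement) of the genome that best matches the profile.

    Agreement is the fraction of sites where the inferred genotype equals
    the genome's genotype.
    """
    inferred = [infer_genotype(m) for m in meth_profile]
    best_idx, best_score = -1, -1.0
    for idx, genome in enumerate(genomes):
        score = sum(g == i for g, i in zip(genome, inferred)) / len(inferred)
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx, best_score


def simulate(n_genomes=50, n_sites=200, noise=0.1, target=7, seed=0):
    """Build random genomes and a noisy methylation profile for one of them.

    Assumption: methylation is centered at 0.1 / 0.5 / 0.9 for genotype
    0 / 1 / 2, with Gaussian noise, clamped to [0, 1].
    """
    rng = random.Random(seed)
    genomes = [[rng.choice([0, 1, 2]) for _ in range(n_sites)]
               for _ in range(n_genomes)]
    centers = {0: 0.1, 1: 0.5, 2: 0.9}
    profile = [min(1.0, max(0.0, centers[g] + rng.gauss(0, noise)))
               for g in genomes[target]]
    return genomes, profile, target


if __name__ == "__main__":
    genomes, profile, target = simulate()
    idx, score = match_profile(profile, genomes)
    # Toy stand-in for a rejection test: accept only high-agreement matches.
    accepted = score >= 0.8
    print(f"matched genome {idx} (true {target}), "
          f"agreement {score:.2f}, accepted={accepted}")
```

In this toy setting an unrelated genome agrees with the inferred genotypes at roughly one third of the sites by chance, while the true genome agrees at most sites, which is why a simple cutoff separates them. Real methylation data is far noisier and only some regions are influenced by genomic variants, so the sketch conveys the intuition only.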

