Identifying Personal DNA Methylation Profiles by Genotype Inference

M. Backes, Pascal Berrang, M. Bieg, R. Eils, C. Herrmann, Mathias Humbert, I. Lehmann
{"title":"Identifying Personal DNA Methylation Profiles by Genotype Inference","authors":"M. Backes, Pascal Berrang, M. Bieg, R. Eils, C. Herrmann, Mathias Humbert, I. Lehmann","doi":"10.1109/SP.2017.21","DOIUrl":null,"url":null,"abstract":"Since the first whole-genome sequencing, the biomedical research community has made significant steps towards a more precise, predictive and personalized medicine. Genomic data is nowadays widely considered privacy-sensitive and consequently protected by strict regulations and released only after careful consideration. Various additional types of biomedical data, however, are not shielded by any dedicated legal means and consequently disseminated much less thoughtfully. This in particular holds true for DNA methylation data as one of the most important and well-understood epigenetic element influencing human health. In this paper, we show that, in contrast to the aforementioned belief, releasing one's DNA methylation data causes privacy issues akin to releasing one's actual genome. We show that already a small subset of methylation regions influenced by genomic variants are sufficient to infer parts of someone's genome, and to further map this DNA methylation profile to the corresponding genome. Notably, we show that such re-identification is possible with 97.5% accuracy, relying on a dataset of more than 2500 genomes, and that we can reject all wrongly matched genomes using an appropriate statistical test. We provide means for countering this threat by proposing a novel cryptographic scheme for privately classifying tumors that enables a privacy-respecting medical diagnosis in a common clinical setting. The scheme relies on a combination of random forests and homomorphic encryption, and it is proven secure in the honest-but-curious model. We evaluate this scheme on real DNA methylation data, and show that we can keep the computational overhead to acceptable values for our application scenario.","PeriodicalId":6502,"journal":{"name":"2017 IEEE Symposium on Security and Privacy (SP)","volume":"137 1","pages":"957-976"},"PeriodicalIF":0.0000,"publicationDate":"2017-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE Symposium on Security and Privacy (SP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SP.2017.21","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 24

Abstract

Since the first whole-genome sequencing, the biomedical research community has made significant steps towards a more precise, predictive and personalized medicine. Genomic data is nowadays widely considered privacy-sensitive and consequently protected by strict regulations and released only after careful consideration. Various additional types of biomedical data, however, are not shielded by any dedicated legal means and consequently disseminated much less thoughtfully. This in particular holds true for DNA methylation data as one of the most important and well-understood epigenetic element influencing human health. In this paper, we show that, in contrast to the aforementioned belief, releasing one's DNA methylation data causes privacy issues akin to releasing one's actual genome. We show that already a small subset of methylation regions influenced by genomic variants are sufficient to infer parts of someone's genome, and to further map this DNA methylation profile to the corresponding genome. Notably, we show that such re-identification is possible with 97.5% accuracy, relying on a dataset of more than 2500 genomes, and that we can reject all wrongly matched genomes using an appropriate statistical test. We provide means for countering this threat by proposing a novel cryptographic scheme for privately classifying tumors that enables a privacy-respecting medical diagnosis in a common clinical setting. The scheme relies on a combination of random forests and homomorphic encryption, and it is proven secure in the honest-but-curious model. We evaluate this scheme on real DNA methylation data, and show that we can keep the computational overhead to acceptable values for our application scenario.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
通过基因型推断确定个人DNA甲基化谱
自从第一次全基因组测序以来,生物医学研究界已经朝着更精确、可预测和个性化的医学迈出了重要的一步。如今,基因组数据被广泛认为是隐私敏感的,因此受到严格法规的保护,只有在仔细考虑后才会发布。然而,各种其他类型的生物医学数据没有受到任何专门法律手段的保护,因此传播起来就不那么周到了。这尤其适用于DNA甲基化数据,因为它是影响人类健康的最重要和众所周知的表观遗传因素之一。在本文中,我们表明,与上述观点相反,释放一个人的DNA甲基化数据会导致类似于释放一个人的实际基因组的隐私问题。我们表明,受基因组变异影响的一小部分甲基化区域已经足以推断某人基因组的一部分,并进一步将这种DNA甲基化谱映射到相应的基因组。值得注意的是,我们表明这种重新识别的准确率为97.5%,依赖于超过2500个基因组的数据集,并且我们可以使用适当的统计检验拒绝所有错误匹配的基因组。我们提出了一种新的加密方案,用于私下对肿瘤进行分类,从而在常见的临床环境中进行尊重隐私的医疗诊断,从而为应对这种威胁提供了手段。该方案依赖于随机森林和同态加密的组合,并且在诚实但好奇的模型中被证明是安全的。我们在真实的DNA甲基化数据上对该方案进行了评估,并表明我们可以将计算开销保持在我们的应用场景可以接受的值。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
To Catch a Ratter: Monitoring the Behavior of Amateur DarkComet RAT Operators in the Wild Under the Shadow of Sunshine: Understanding and Detecting Bulletproof Hosting on Legitimate Service Provider Networks Stack Overflow Considered Harmful? The Impact of Copy&Paste on Android Application Security SoK: Science, Security and the Elusive Goal of Security as a Scientific Pursuit An Experimental Security Analysis of an Industrial Robot Controller
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1