Valid inference for machine learning-assisted genome-wide association studies

IF 31.7 | Zone 1 (Biology) | JCR Q1, GENETICS & HEREDITY | Nature Genetics | Pub Date: 2024-09-30 | DOI: 10.1038/s41588-024-01934-0
Jiacheng Miao, Yixuan Wu, Zhongxuan Sun, Xinran Miao, Tianyuan Lu, Jiwei Zhao, Qiongshi Lu
Nature Genetics, volume 56, issue 11, pages 2361-2369. Publication type: Journal Article. Article page: https://www.nature.com/articles/s41588-024-01934-0
Citation count: 0

Abstract

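Machine learning (ML) has become increasingly popular in almost all scientific disciplines, including human genetics. Owing to challenges in sample collection and precise phenotyping, ML-assisted genome-wide association studies (GWAS), which use sophisticated ML techniques to impute phenotypes and then perform GWAS on the imputed outcomes, have become increasingly common in complex trait genetics research. However, the validity of ML-assisted GWAS associations has not been carefully evaluated. Here, we report pervasive risks of false-positive associations in ML-assisted GWAS and introduce Post-Prediction GWAS (POP-GWAS), a statistical framework that redesigns GWAS on ML-imputed outcomes. POP-GWAS ensures valid and powerful statistical inference irrespective of imputation quality and choice of algorithm, and requires only GWAS summary statistics as input. We used POP-GWAS to perform a GWAS of bone mineral density derived from dual-energy X-ray absorptiometry imaging at 14 skeletal sites, identifying 89 new loci and revealing skeletal site-specific genetic architecture. Our framework offers a robust analytic solution for future ML-assisted GWAS.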

Valid inference for machine learning-assisted genome-wide association studies
Machine learning (ML) has become increasingly popular in almost all scientific disciplines, including human genetics. Owing to challenges related to sample collection and precise phenotyping, ML-assisted genome-wide association study (GWAS), which uses sophisticated ML techniques to impute phenotypes and then performs GWAS on the imputed outcomes, have become increasingly common in complex trait genetics research. However, the validity of ML-assisted GWAS associations has not been carefully evaluated. Here, we report pervasive risks for false-positive associations in ML-assisted GWAS and introduce Post-Prediction GWAS (POP-GWAS), a statistical framework that redesigns GWAS on ML-imputed outcomes. POP-GWAS ensures valid and powerful statistical inference irrespective of imputation quality and choice of algorithm, requiring only GWAS summary statistics as input. We employed POP-GWAS to perform a GWAS of bone mineral density derived from dual-energy X-ray absorptiometry imaging at 14 skeletal sites, identifying 89 new loci and revealing skeletal site-specific genetic architecture. Our framework offers a robust analytic solution for future ML-assisted GWAS. Post-prediction genome-wide association study (POP-GWAS) is a statistical framework that uses summary statistics from labeled samples with both observed and imputed phenotypes to debias single-nucleotide polymorphism effect size estimates for unlabeled samples with imputed phenotypes only, leading to valid and powerful inference.
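The summary sentence above describes combining three sets of summary statistics: the effect on the observed phenotype in labeled samples, the effect on the imputed phenotype in the same labeled samples, and the effect on the imputed phenotype in unlabeled samples. The abstract does not give the estimator itself, so the following is only a minimal sketch of one way such a debiasing step could look for a single SNP, written in the style of prediction-powered inference. The function name `debias_effect`, the weight `omega`, and the correlation `r` between the two labeled-sample estimates are illustrative assumptions, not the authors' published formulas.

```python
import math


def debias_effect(beta_y_lab, se_y_lab,
                  beta_yhat_lab, se_yhat_lab,
                  beta_yhat_unlab, se_yhat_unlab,
                  r, omega=None):
    """Combine three per-SNP summary statistics (hypothetical sketch).

    beta_y_lab      : effect on the observed phenotype, labeled samples
    beta_yhat_lab   : effect on the imputed phenotype, labeled samples
    beta_yhat_unlab : effect on the imputed phenotype, unlabeled samples
    r               : assumed correlation between the two labeled-sample
                      estimates (they come from the same individuals)
    omega           : weight on the imputed-phenotype correction; if None,
                      the variance-minimizing value is used
    """
    var_y = se_y_lab ** 2
    var_hat_lab = se_yhat_lab ** 2
    var_hat_unlab = se_yhat_unlab ** 2
    # Covariance of the two estimates that share the labeled samples.
    cov = r * se_y_lab * se_yhat_lab

    if omega is None:
        # Minimizes var_y + omega^2 (var_hat_lab + var_hat_unlab) - 2 omega cov.
        omega = cov / (var_hat_lab + var_hat_unlab)

    # The correction term has expectation zero when labeled and unlabeled
    # samples come from the same population, whatever the imputation quality.
    beta = beta_y_lab + omega * (beta_yhat_unlab - beta_yhat_lab)
    var = var_y + omega ** 2 * (var_hat_lab + var_hat_unlab) - 2 * omega * cov
    se = math.sqrt(var)

    # Two-sided p-value from a normal approximation.
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, p


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    beta, se, p = debias_effect(0.05, 0.10, 0.04, 0.10, 0.045, 0.02, r=0.8)
    print(f"beta={beta:.4f} se={se:.4f} p={p:.4g}")
```

Under these assumptions, an uninformative imputation (`r` near 0) drives `omega` to 0 and the result falls back to the labeled-sample estimate, while a well-correlated imputation shrinks the standard error below that of the labeled-sample GWAS alone.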
Source journal: Nature Genetics (Biology: Genetics)
CiteScore: 43.00
Self-citation rate: 2.60%
Articles published: 241
Review time: 3 months
Journal description: Nature Genetics publishes the very highest quality research in genetics. It encompasses genetic and functional genomic studies on human and plant traits and on other model organisms. Current emphasis is on the genetic basis for common and complex diseases and on the functional mechanism, architecture and evolution of gene networks, studied by experimental perturbation. Integrative genetic topics comprise, but are not limited to:
- Genes in the pathology of human disease
- Molecular analysis of simple and complex genetic traits
- Cancer genetics
- Agricultural genomics
- Developmental genetics
- Regulatory variation in gene expression
- Strategies and technologies for extracting function from genomic data
- Pharmacological genomics
- Genome evolution