Small-cohort GWAS discovery with AI over massive functional genomics knowledge graph.

Kexin Huang, Tony Zeng, Soner Koc, Alexandra Pettet, Jingtian Zhou, Mika Jain, Dongbo Sun, Camilo Ruiz, Hongyu Ren, Laurence Howe, Tom G Richardson, Adrian Cortes, Katie Aiello, Kim Branson, Andreas Pfenning, Jesse M Engreitz, Martin Jinye Zhang, Jure Leskovec
{"title":"Small-cohort GWAS discovery with AI over massive functional genomics knowledge graph.","authors":"Kexin Huang, Tony Zeng, Soner Koc, Alexandra Pettet, Jingtian Zhou, Mika Jain, Dongbo Sun, Camilo Ruiz, Hongyu Ren, Laurence Howe, Tom G Richardson, Adrian Cortes, Katie Aiello, Kim Branson, Andreas Pfenning, Jesse M Engreitz, Martin Jinye Zhang, Jure Leskovec","doi":"10.1101/2024.12.03.24318375","DOIUrl":null,"url":null,"abstract":"<p><p>Genome-wide association studies (GWASs) have identified tens of thousands of disease associated variants and provided critical insights into developing effective treatments. However, limited sample sizes have hindered the discovery of variants for uncommon and rare diseases. Here, we introduce KGWAS, a novel geometric deep learning method that leverages a massive functional knowledge graph across variants and genes to improve detection power in small-cohort GWASs significantly. KGWAS assesses the strength of a variant's association to disease based on the aggregate GWAS evidence across molecular elements interacting with the variant within the knowledge graph. Comprehensive simulations and replication experiments showed that, for small sample sizes ( <i>N</i> =1-10K), KGWAS identified up to 100% more statistically significant associations than state-of-the-art GWAS methods and achieved the same statistical power with up to 2.67× fewer samples. We applied KGWAS to 554 uncommon UK Biobank diseases ( <i>N</i> <sub>case</sub> <5K) and identified 183 more associations (46.9% improvement) than the original GWAS, where the gain further increases to 79.8% for 141 rare diseases (N <sub>case</sub> <300). 
The KGWAS-only discoveries are supported by abundant functional evidence, such as rs2155219 (on 11q13) associated with ulcerative colitis potentially via regulating <i>LRRC32</i> expression in CD4+ regulatory T cells, and rs7312765 (on 12q12) associated with the rare disease myasthenia gravis potentially via regulating <i>PPHLN1</i> expression in neuron-related cell types. Furthermore, KGWAS consistently improves downstream analyses such as identifying disease-specific network links for interpreting GWAS variants, identifying disease-associated genes, and identifying disease-relevant cell populations. Overall, KGWAS is a flexible and powerful AI model that integrates growing functional genomics data to discover novel variants, genes, cells, and networks, especially valuable for small cohort diseases.</p>","PeriodicalId":94281,"journal":{"name":"medRxiv : the preprint server for health sciences","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11643201/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"medRxiv : the preprint server for health sciences","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1101/2024.12.03.24318375","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Genome-wide association studies (GWASs) have identified tens of thousands of disease-associated variants and provided critical insights into developing effective treatments. However, limited sample sizes have hindered the discovery of variants for uncommon and rare diseases. Here, we introduce KGWAS, a novel geometric deep learning method that leverages a massive functional knowledge graph across variants and genes to significantly improve detection power in small-cohort GWASs. KGWAS assesses the strength of a variant's association with disease based on the aggregate GWAS evidence across molecular elements interacting with the variant within the knowledge graph. Comprehensive simulations and replication experiments showed that, for small sample sizes (N = 1-10K), KGWAS identified up to 100% more statistically significant associations than state-of-the-art GWAS methods and achieved the same statistical power with up to 2.67× fewer samples. We applied KGWAS to 554 uncommon UK Biobank diseases (N_case < 5K) and identified 183 more associations (46.9% improvement) than the original GWAS; the gain increases further to 79.8% for 141 rare diseases (N_case < 300). The KGWAS-only discoveries are supported by abundant functional evidence, such as rs2155219 (on 11q13) associated with ulcerative colitis potentially via regulating LRRC32 expression in CD4+ regulatory T cells, and rs7312765 (on 12q12) associated with the rare disease myasthenia gravis potentially via regulating PPHLN1 expression in neuron-related cell types. Furthermore, KGWAS consistently improves downstream analyses such as identifying disease-specific network links for interpreting GWAS variants, identifying disease-associated genes, and identifying disease-relevant cell populations. Overall, KGWAS is a flexible and powerful AI model that integrates growing functional genomics data to discover novel variants, genes, cells, and networks, and is especially valuable for small-cohort diseases.
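The idea of scoring a variant by the aggregate GWAS evidence of the elements it interacts with can be illustrated with a toy sketch. This is not the KGWAS model: the abstract describes a geometric deep learning method, whereas the code below substitutes a fixed average over graph neighbors purely for illustration. The graph, the z-scores, the function names, and the `self_weight` parameter are all hypothetical.

```python
from statistics import mean

# Hypothetical toy graph: variant -> genes it is linked to (all names invented).
VARIANT_TO_GENES = {
    "v1": ["geneA"],
    "v2": ["geneA"],
    "v3": ["geneA", "geneB"],
    "v4": ["geneB"],
}

# Hypothetical marginal GWAS z-scores for each variant.
Z_SCORES = {"v1": 2.0, "v2": 4.0, "v3": 5.0, "v4": 1.0}


def graph_neighbors(variant, variant_to_genes):
    """Return the other variants that share at least one gene with `variant`."""
    genes = set(variant_to_genes.get(variant, []))
    return sorted(
        other
        for other, other_genes in variant_to_genes.items()
        if other != variant and genes & set(other_genes)
    )


def aggregated_evidence(variant, variant_to_genes, z_scores, self_weight=0.5):
    """Blend a variant's own |z| with the mean |z| of its graph neighbors.

    A fixed weighted average stands in for the learned aggregation of a
    graph neural network; it is an assumption made for this sketch only.
    """
    own = abs(z_scores[variant])
    neighbors = graph_neighbors(variant, variant_to_genes)
    if not neighbors:
        return own  # no graph context: fall back to the variant's own evidence
    neighbor_mean = mean(abs(z_scores[n]) for n in neighbors)
    return self_weight * own + (1.0 - self_weight) * neighbor_mean


if __name__ == "__main__":
    for v in sorted(Z_SCORES):
        score = aggregated_evidence(v, VARIANT_TO_GENES, Z_SCORES)
        print(v, round(score, 3))
```

In this toy setting, a variant with weak marginal evidence (such as `v1`) gets a higher score when the variants sharing its gene carry strong signal, which is the intuition behind borrowing strength from the graph in a small cohort.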
