medRxiv - Genetic and Genomic Medicine: Latest Papers (Page 8)

Genetic association and machine learning improves discovery and prediction of type 1 diabetes

medRxiv - Genetic and Genomic Medicine

Pub Date : 2024-08-02 DOI: 10.1101/2024.07.31.24311310

Carolyn McGrail, Timothy J Sears, Parul Kudtarkar, Hannah Carter, Kyle J Gaulton

Type 1 diabetes (T1D) has a large genetic component, and expanded genetic studies of T1D can lead to novel biological and therapeutic discovery and improved risk prediction. In this study, we performed genetic association and fine-mapping analyses in 817,718 European ancestry samples genome-wide and 29,746 samples at the MHC locus, which identified 165 independent risk signals for T1D of which 19 were novel. We used risk variants to train a machine learning model (named T1GRS) to predict T1D, which highly differentiated T1D from non-disease and type 2 diabetes (T2D) in Europeans as well as African Americans at or beyond the level of current standards. We identified extensive non-linear interactions between risk loci in T1GRS, for example between HLA-DQB1*57 and INS, coding and non-coding HLA alleles, and DEXI, INS and other beta cell loci, that provided mechanistic insight and improved risk prediction. T1D individuals formed distinct clusters based on genetic features from T1GRS which had significant differences in age of onset, HbA1c, and renal disease severity. Finally, we provided T1GRS in formats to enhance accessibility of risk prediction to any user and computing environment. Overall, the improved genetic discovery and prediction of T1D will have wide clinical, therapeutic, and research applications.



Citations: 0

A Deep Ensemble Encoder Network Method for Improved Polygenic Risk Score Prediction

medRxiv - Genetic and Genomic Medicine

Pub Date : 2024-08-02 DOI: 10.1101/2024.07.31.24311311

Okan B Ozdemir, Ruining Chen, Ruowang Li

Genome-wide association studies (GWAS) of various heritable human traits and diseases have identified numerous associated single nucleotide polymorphisms (SNPs), most of which have small or modest effects. Polygenic risk scores (PRS) aim to better estimate individuals' genetic predisposition by aggregating the effects of multiple SNPs from GWAS. However, current PRS is designed to capture only simple linear genetic effects across the genome, limiting their ability to fully account for the complex polygenic architecture. To address this, we propose DeepEnsembleEncodeNet (DEEN), a new method that ensembles autoencoders and fully connected neural networks (FCNNs) to better identify and model linear and non-linear SNP effects across different genomic regions, improving its ability to predict disease risks. To demonstrate DEEN's performance, we optimized the model across binary and continuous traits from the UK Biobank (UKBB). Model evaluation on the held-out UKBB testing dataset, as well as the independent All of Us (AoU) dataset, showed improved prediction and risk stratification, consistently outperforming other methods.
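For contrast with the non-linear modeling that DEEN targets, the conventional linear PRS mentioned in this abstract can be sketched as a weighted sum of risk-allele dosages. This is a minimal illustration and not the authors' code; the function name `linear_prs` and the toy weights and dosages are assumptions made up for the example.

```python
def linear_prs(dosages, weights):
    """Conventional linear PRS: sum of (GWAS effect weight * risk-allele dosage).

    dosages: risk-allele counts per SNP for one individual (0, 1, or 2).
    weights: per-SNP effect sizes, e.g. log odds ratios from a GWAS.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))


# Hypothetical three-SNP example (values are illustrative only).
toy_weights = [0.5, -0.25, 1.0]
toy_dosages = [2, 1, 0]
toy_score = linear_prs(toy_dosages, toy_weights)  # 0.5*2 - 0.25*1 + 1.0*0 = 0.75
```

Because each SNP contributes independently and additively here, a score of this form cannot represent interactions between SNPs, which is the limitation the abstract describes.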


Citations: 0

Machine learning-based proteogenomic data modeling identifies circulating plasma biomarkers for early detection of lung cancer

medRxiv - Genetic and Genomic Medicine

Pub Date : 2024-08-01 DOI: 10.1101/2024.07.30.24311241

Marcela A Johnson, Liping Hou, Bevan Emma Huang, Assieh Saadatpour, Abolfazl Doostparast Torshizi

Identifying genetic variants associated with lung cancer (LC) risk and their impact on plasma protein levels is crucial for understanding LC predisposition. The discovery of risk biomarkers can enhance early LC screening protocols and improve prognostic interventions. In this study, we performed a genome-wide association analysis using the UK Biobank and FinnGen. We identified genetic variants associated with LC and protein levels leveraging the UK Biobank Pharma Proteomics Project. The dysregulated proteins were then analyzed in pre-symptomatic LC cases compared to healthy controls followed by training machine learning models to predict future LC diagnosis. We achieved median AUCs ranging from 0.79 to 0.88 (0-4 years before diagnosis/YBD), 0.73 to 0.83 (5-9YBD), and 0.78 to 0.84 (0-9YBD) based on 5-fold cross-validation. Conducting survival analysis using the 5-9YBD cohort, we identified eight proteins, including CALCB, PLAUR/uPAR, and CD74 whose higher levels were associated with worse overall survival. We also identified potential plasma biomarkers, including previously reported candidates such as CEACAM5, CXCL17, GDF15, and WFDC2, which have shown associations with future LC diagnosis. These proteins are enriched in various pathways, including cytokine signaling, interleukin regulation, neutrophil degranulation, and lung fibrosis. In conclusion, this study generates novel insights into our understanding of the genome-proteome dynamics in LC. Furthermore, our findings present a promising panel of non-invasive plasma biomarkers that hold potential to support early LC screening initiatives and enhance future diagnostic interventions.
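The AUC figures in this abstract are medians over cross-validation folds. As a rough sketch of what that summary means, the code below computes a ROC AUC as the probability that a randomly chosen case scores higher than a randomly chosen control, then takes the median over per-fold values. The function name `roc_auc` and every number are made up for illustration; none of them come from the study.

```python
import statistics


def roc_auc(case_scores, control_scores):
    """ROC AUC as the fraction of case/control pairs ranked correctly.

    Ties between a case and a control count as half a correct pair.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy predicted risks for one fold (illustrative values only).
toy_cases = [0.9, 0.8, 0.4]
toy_controls = [0.7, 0.3, 0.2, 0.1]
toy_auc = roc_auc(toy_cases, toy_controls)  # 11 of 12 pairs ranked correctly

# Hypothetical AUCs from five folds, summarized by their median.
toy_fold_aucs = [0.70, 0.65, 0.72, 0.68, 0.74]
toy_median_auc = statistics.median(toy_fold_aucs)
```

Reporting the median rather than a single value reduces the influence of one unusually easy or hard fold on the summary.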



Citations: 0

Genetic associations with human longevity are enriched for oncogenic genes.

medRxiv - Genetic and Genomic Medicine

Pub Date : 2024-08-01 DOI: 10.1101/2024.07.30.24311226

Junyoung Park, Andrés Peña-Tauber, Lia Talozzi, Michael D. Greicius, Yann Le Guen

Human lifespan is shaped by both genetic and environmental exposures and their interaction. To enable precision health, it is essential to understand how genetic variants contribute to earlier death or prolonged survival. In this study, we tested the association of common genetic variants and the burden of rare non-synonymous variants in a survival analysis, using age-at-death (N = 35,551, median [min, max] = 72.4 [40.9, 85.2]), and last-known-age (N = 358,282, median [min, max] = 71.9 [52.6, 88.7]), in European ancestry participants of the UK Biobank. The associations we identified seemed predominantly driven by cancer, likely due to the age range of the cohort. Common variant analysis highlighted three longevity-associated loci: APOE, ZSCAN23, and MUC5B. We identified six genes whose burden of loss-of-function variants is significantly associated with reduced lifespan: TET2, ATM, BRCA2, CKMT1B, BRCA1 and ASXL1. Additionally, in eight genes, the burden of pathogenic missense variants was associated with reduced lifespan: DNMT3A, SF3B1, CHL1, TET2, PTEN, SOX21, TP53 and SRSF2. Most of these genes have previously been linked to oncogenic-related pathways, and some are known to harbor somatic variants that predispose to clonal hematopoiesis. A direction-agnostic (SKAT-O) approach additionally identified significant associations with C1orf52, TERT, IDH2, and RLIM, highlighting a link between telomerase function and longevity as well as identifying additional oncogenic genes. Our results emphasize the importance of understanding genetic factors driving the most prevalent causes of mortality at a population level, highlighting the potential of early genetic testing to identify germline and somatic variants increasing one's susceptibility to cancer and/or early death.



Citations: 0

Integrating genomic variants and developmental milestones to predict cognitive and adaptive outcomes in autistic children

medRxiv - Genetic and Genomic Medicine

Pub Date : 2024-08-01 DOI: 10.1101/2024.07.31.24311250

Vincent-Raphael Bourque, Zoe Schmilovich, Guillaume Huguet, Jade England, Adeniran Okewole, Cecile Poulain, Thomas Renne, Martineau Jean-Louis, Zohra Saci, Xinhe Zhang, Thomas Rolland, Aurelie Labbe, Jacob Vorstman, Guy Rouleau, Simon Baron-Cohen, Laurent Mottron, Richard A.I. Bethlehem, Varun Warrier, Sebastien Jacquemont

Although the first signs of autism are often observed as early as 18-36 months of age, there is a broad uncertainty regarding future development, and clinicians lack predictive tools to identify those who will later be diagnosed with co-occurring intellectual disability (ID). Here, we developed predictive models of ID in autistic children (n=5,633 from three cohorts), integrating different classes of genetic variants alongside developmental milestones. The integrated model yielded an AUC ROC=0.65, with this predictive performance cross-validated and generalised across cohorts. Positive predictive values reached up to 55%, accurately identifying 10% of ID cases. The ability to stratify the probabilities of ID using genetic variants was up to twofold greater in individuals with delayed milestones compared to those with typical development. These findings underscore the potential of models in neurodevelopmental medicine that integrate genomics and clinical observations to predict outcomes and target interventions.
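The abstract pairs a positive predictive value with the share of ID cases identified (sensitivity). A small sketch may help show how a classifier can have a fairly high PPV while flagging only a small fraction of cases. The function name `ppv_and_sensitivity` and the confusion-matrix counts below are hypothetical and are not taken from the study's cohorts.

```python
def ppv_and_sensitivity(true_pos, false_pos, false_neg):
    """Return (PPV, sensitivity) from confusion-matrix counts.

    PPV = TP / (TP + FP): of those flagged, the fraction who truly have ID.
    Sensitivity = TP / (TP + FN): of all ID cases, the fraction flagged.
    """
    flagged = true_pos + false_pos
    cases = true_pos + false_neg
    if flagged == 0 or cases == 0:
        raise ValueError("need at least one flagged individual and one case")
    return true_pos / flagged, true_pos / cases


# Hypothetical counts at a strict probability threshold (illustrative only):
# 20 children flagged, 11 of whom have ID, out of 110 ID cases overall.
toy_ppv, toy_sensitivity = ppv_and_sensitivity(true_pos=11, false_pos=9, false_neg=99)
```

With a strict threshold, few children are flagged, so most flagged children are true cases while most cases go unflagged; loosening the threshold typically trades PPV for sensitivity.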



Citations: 0

Social Determinants of Health and Lifestyle Risk Factors Modulate Genetic Susceptibility for Women's Health Outcomes

medRxiv - Genetic and Genomic Medicine

Pub Date : 2024-07-31 DOI: 10.1101/2024.07.29.24311189

Lindsay A Guare, Jagyashila Das, Lannawill Caruth, Shefali Setia Verma

Women's health conditions are influenced by both genetic and environmental factors. Understanding these factors individually and their interactions is crucial for implementing preventative, personalized medicine. However, since genetics and environmental exposures, particularly social determinants of health (SDoH), are correlated with race and ancestry, risk models without careful consideration of these measures can exacerbate health disparities. We focused on seven women's health disorders in the All of Us Research Program: breast cancer, cervical cancer, endometriosis, ovarian cancer, preeclampsia, uterine cancer, and uterine fibroids. We computed polygenic risk scores (PRSs) from publicly available weights and tested the effect of the PRSs on their respective phenotypes as well as any effects of genetic risk on age at diagnosis. We next tested the effects of environmental risk factors (BMI, lifestyle measures, and SDoH) on age at diagnosis. Finally, we examined the impact of environmental exposures in modulating genetic risk by stratified logistic regressions for different tertiles of the environment variables, comparing the effect size of the PRS. Of the twelve sets of weights for the seven conditions, nine were significantly and positively associated with their respective phenotypes. None of the PRSs was associated with a different age at diagnosis in the time-to-event analyses. The highest environmental risk group tended to be diagnosed earlier than the low and medium-risk groups. For example, cases of breast cancer, ovarian cancer, uterine cancer, and uterine fibroids in the highest BMI tertile were diagnosed significantly earlier than those in the low and medium BMI groups. PRS regression coefficients were often the largest in the highest environment risk groups, showing increased susceptibility to genetic risk.
This study's strengths include the diversity of the All of Us study cohort, the consideration of SDoH themes, and the examination of key risk factors and their interrelationships. These elements collectively underscore the importance of integrating genetic and environmental data to develop more precise risk models, enhance personalized medicine, and ultimately reduce health disparities.
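The tertile stratification step described in this abstract can be sketched as follows. This only shows how individuals might be split into low, medium, and high groups of an environmental variable; the per-stratum logistic regression of phenotype on PRS is omitted and would normally be fit with a statistics library. The function name `tertile_labels`, the cut-point convention (the default of Python's `statistics.quantiles`), and the BMI values are assumptions for illustration, not details from the study.

```python
import statistics


def tertile_labels(values):
    """Label each value "low", "medium", or "high" by tertile of the sample."""
    # quantiles(..., n=3) returns the two cut points separating three groups.
    low_cut, high_cut = statistics.quantiles(values, n=3)
    labels = []
    for value in values:
        if value <= low_cut:
            labels.append("low")
        elif value <= high_cut:
            labels.append("medium")
        else:
            labels.append("high")
    return labels


# Hypothetical BMI values for nine participants (illustrative only).
toy_bmi = [18, 21, 23, 25, 27, 29, 31, 34, 38]
toy_labels = tertile_labels(toy_bmi)

# Group participant indices by stratum; a separate PRS model would then be
# fit within each group and the PRS coefficients compared across groups.
toy_strata = {"low": [], "medium": [], "high": []}
for index, label in enumerate(toy_labels):
    toy_strata[label].append(index)
```

Comparing the PRS coefficient across strata is one simple way to look for gene-environment modulation; a formal test would usually add an interaction term instead.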



Citations: 0

Haplotype Analysis Reveals Pleiotropic Disease Associations in the HLA Region

medRxiv - Genetic and Genomic Medicine

Pub Date : 2024-07-31 DOI: 10.1101/2024.07.29.24311183

Courtney Jean Smith, Satu Strausz, FinnGen, Jeffrey P Spence, Hanna M Ollila, Jonathan K Pritchard

The human leukocyte antigen (HLA) region plays an important role in human health through involvement in immune cell recognition and maturation. While genetic variation in the HLA region is associated with many diseases, the pleiotropic patterns of these associations have not been systematically investigated. Here, we developed a haplotype approach to investigate disease associations phenome-wide for 412,181 Finnish individuals and 2,459 traits. Across the 1,035 diseases with a GWAS association, we found a 17-fold average per-SNP enrichment of hits in the HLA region. Altogether, we identified 7,649 HLA associations across 647 traits, including 1,750 associations uncovered by haplotype analysis. We find some haplotypes show trade-offs between diseases, while others consistently increase risk across traits, indicating a complex pleiotropic landscape involving a range of diseases. This study highlights the extensive impact of HLA variation on disease risk, and underscores the importance of classical and non-classical genes, as well as non-coding variation.
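One plausible reading of "per-SNP enrichment" is the ratio of the hit rate per SNP inside the HLA region to the hit rate per SNP elsewhere in the genome. The sketch below implements that reading for a single disease; the abstract does not give the exact definition, so the formula, the function name `per_snp_enrichment`, and the counts are all assumptions for illustration.

```python
def per_snp_enrichment(hits_in, snps_in, hits_out, snps_out):
    """Ratio of hits per SNP inside a region to hits per SNP outside it.

    Computed as (hits_in / snps_in) / (hits_out / snps_out), rearranged to a
    single division to avoid intermediate rounding.
    """
    if snps_in <= 0 or snps_out <= 0 or hits_out <= 0:
        raise ValueError("SNP counts and outside hits must be positive")
    return (hits_in * snps_out) / (snps_in * hits_out)


# Hypothetical counts for one disease (illustrative only):
# 40 hits among 1,000 SNPs in the region vs 500 hits among 100,000 elsewhere.
toy_enrichment = per_snp_enrichment(hits_in=40, snps_in=1_000,
                                    hits_out=500, snps_out=100_000)  # 8.0
```

An average across diseases, as reported in the abstract, would then be the mean of such per-disease ratios under this assumed definition.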


Citations: 0

Disentangling shared genetic etiologies for kidney function and cardiovascular diseases

medRxiv - Genetic and Genomic Medicine

Pub Date : 2024-07-27 DOI: 10.1101/2024.07.26.24310191

Jun Qiao, Kaixin Yao, Yujuan Yuan, Xichen Yang, Le Zhou, Yinqi Long, Miaoran Chen, Wenjia Xie, Yixuan Yang, Yangpo Cao, Siim Pauklin, Jinguo Xu, Yining Yang, Yuliang Feng

Cardiovascular diseases (CVDs) are the leading cause of death worldwide, with chronic kidney disease (CKD) identified as a significant risk factor. CKD is primarily monitored through the estimated glomerular filtration rate (eGFR), calculated using the CKD-EPI equation. Although epidemiological and clinical studies have consistently demonstrated strong associations between eGFR and CVDs, the genetic underpinnings of this relationship remain elusive. Recent genome-wide association studies (GWAS) have highlighted the polygenic nature of these conditions and identified several risk loci correlating with their cross-phenotypes. Nonetheless, the extent and pattern of their pleiotropic effects have yet to be fully elucidated. We analyzed the most comprehensive GWAS summary statistics, involving around 7.5 million individuals, to investigate the shared genetic architectures and the underlying mechanisms between eGFR and CVDs, focusing on single nucleotide polymorphisms (SNPs), genes, biological pathways, and proteins exhibiting pleiotropic effects. Our study identified 508 distinct genomic locations associated with pleiotropic effects across multiple traits, involving 379 unique genes, notably L3MBTL3 (6q23.1), MMP24 (20q11.22), and ABO (9q34.2). Additionally, pathways such as stem cell population maintenance and the glutathione metabolism pathway were pivotal in mediating the relationships between these traits. From the perspective of vertical pleiotropy, our findings suggest a causal relationship between eGFR and conditions such as atrial fibrillation and venous thromboembolism. These insights significantly enhance our understanding of the genetic links between eGFR and CVDs, potentially guiding the development of novel therapeutic strategies and improving the clinical management of these conditions.



Citations: 0

LDAK-KVIK performs fast and powerful mixed-model association analysis of quantitative and binary phenotypes

medRxiv - Genetic and Genomic Medicine

Pub Date : 2024-07-26 DOI: 10.1101/2024.07.25.24311005

Jasper Hof, Doug Speed

Mixed-model association analysis (MMAA) is the preferred tool for performing a genome-wide association study, because it enables robust control of type 1 error and increased statistical power to detect trait-associated loci. However, existing MMAA tools often suffer from long runtimes and high memory requirements. We present LDAK-KVIK, a novel MMAA tool for analyzing quantitative and binary phenotypes. Using simulated phenotypes, we show that LDAK-KVIK produces well-calibrated test statistics, both for homogeneous and heterogeneous datasets. LDAK-KVIK is computationally efficient, requiring less than 20 CPU hours and 8 GB of memory to analyse genome-wide data for 350k individuals. These demands are similar to those of REGENIE, one of the most efficient existing MMAA tools, and up to 30 times less than those of BOLT-LMM, currently the most powerful MMAA tool. When applied to real phenotypes, LDAK-KVIK has the highest power of all tools considered. For example, across 40 quantitative phenotypes from the UK Biobank (average sample size 349k), LDAK-KVIK finds 16% more significant loci than classical linear regression, whereas BOLT-LMM and REGENIE find 15% and 11% more, respectively. LDAK-KVIK can also perform gene-based tests; across the 40 quantitative UK Biobank phenotypes, LDAK-KVIK finds 18% more significant genes than the leading existing tool.
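For orientation, the "classical linear regression" baseline that the abstract compares against can be sketched as a one-marker-at-a-time test. The snippet below is only an illustrative sketch, not LDAK-KVIK or any tool named above: the function name `single_snp_test`, the toy genotypes and phenotypes, and the use of a normal approximation for the p-value are all assumptions made here for brevity.

```python
import math


def single_snp_test(genotypes, phenotypes):
    """Regress a quantitative phenotype on one SNP's allele counts (0/1/2).

    Returns (beta, standard_error, p_value). The p-value uses a normal
    approximation to the t distribution, which is an assumption that is
    reasonable only for large samples.
    """
    n = len(genotypes)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need at least 3 matched genotype/phenotype pairs")

    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))

    beta = sxy / sxx                      # ordinary least-squares slope
    intercept = mean_y - beta * mean_g
    rss = sum((y - intercept - beta * g) ** 2
              for g, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)   # standard error of the slope
    if se == 0:
        return beta, 0.0, 0.0             # perfect fit: no residual variance

    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approx.
    return beta, se, p_value


# Toy data (hypothetical): six individuals, allele counts and a trait value.
toy_genotypes = [0, 0, 1, 1, 2, 2]
toy_phenotypes = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
beta, se, p = single_snp_test(toy_genotypes, toy_phenotypes)
print(f"beta={beta:.3f} se={se:.3f} p={p:.3g}")
```

A mixed-model tool differs from this baseline by additionally modelling genome-wide relatedness and polygenic background, which is where the gains in power and calibration described in the abstract would come from.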

Citations: 0

Population Performance and Individual Agreement of Coronary Artery Disease Polygenic Risk Scores

medRxiv - Genetic and Genomic Medicine

Pub Date : 2024-07-26 DOI: 10.1101/2024.07.25.24310931

Sarah A Abramowitz, Kristin Boulier, Karl Keat, Katherine Cardone, Manu Shivakumar, John M. DePaolo, Renae M. Judy, Penn Medicine BioBank, Dokyoon Kim, Daniel J Rader, Marylyn D Ritchie, Benjamin F Voight, Bogdan Pasaniuc, Michael Levin, Scott M. Damrauer

Importance: Polygenic risk scores (PRSs) for coronary artery disease (CAD) are a growing clinical and commercial reality. Whether existing scores provide similar individual-level assessments of disease liability is a critical consideration for clinical implementation that remains uncharacterized. Objective: Characterize the reliability of CAD PRSs that perform equivalently at the population level at predicting individual-level risk. Design: Cross-sectional study. Setting: All of Us Research Program (AOU), Penn Medicine Biobank (PMBB), and UCLA ATLAS Precision Health Biobank. Participants: Volunteers of diverse genetic backgrounds enrolled in AOU, PMBB, and UCLA with available electronic health record and genotyping data. Exposures: Polygenic risk for CAD from previously published PRSs and new PRSs developed separately from the testing cohorts. Main Outcomes and Measures: Sets of CAD PRSs that perform population prediction equivalently were identified by comparing calibration and discrimination (Brier score and AUROC) of generalized linear models of prevalent CAD using Bayesian analysis of variance. Among equivalently performing scores, individual-level agreement between risk estimates was tested with intraclass correlation (ICC) and Light's Kappa, measures of inter-rater reliability. Results: 50 PRSs were calculated for 171,095 AOU participants. When included in a model of prevalent CAD, 48 scores had practically equivalent Brier scores and AUROCs (region of practical equivalence = 0.02). Across these scores, 84% of participants had at least one score in both the top and bottom risk quintile. Continuous agreement of individual risk predictions from the 48 scores was poor, with an ICC of 0.351 (95% CI: 0.349, 0.352). Agreement between two statistically equivalent scores was moderate, with an ICC of 0.649 (95% CI: 0.646, 0.652). Light's Kappa, used to evaluate consistency of assignment to high-risk thresholds, did not exceed 0.56 (interpreted as 'fair') across statistically and practically equivalent scores. Repeating the analysis among 41,193 PMBB and 50,748 UCLA participants yielded different sets of statistically and practically equivalent scores which also lacked strong individual agreement. Conclusions and Relevance: Across three diverse biobanks, CAD PRSs that performed equivalently at the population level produced unreliable individual risk estimates. Approaches to clinical implementation of CAD PRSs must consider the potential for discordant individual risk estimates from otherwise indistinguishable scores.
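To make the agreement measure concrete, Light's Kappa is commonly defined as the mean of Cohen's kappa over every pair of raters; here each PRS would play the role of a "rater" assigning participants to a high-risk or not-high-risk category. The sketch below follows that common definition and is not the study's analysis code: the function names, the three toy scores, and the eight toy participants are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(labels_a)
    # Observed proportion of participants on which the two scores agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each score's marginal label rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both scores assign one identical label to everyone
    return (observed - expected) / (1.0 - expected)


def lights_kappa(ratings):
    """Light's kappa: mean Cohen's kappa over all pairs of raters."""
    pairs = list(combinations(ratings, 2))
    if not pairs:
        raise ValueError("need at least two raters")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Toy data (hypothetical): 1 = flagged high risk, 0 = not flagged,
# for the same eight participants under three different scores.
score_a = [1, 1, 0, 0, 0, 0, 0, 0]
score_b = [1, 0, 1, 0, 0, 0, 0, 0]
score_c = [1, 1, 0, 0, 0, 0, 0, 0]
print(round(lights_kappa([score_a, score_b, score_c]), 3))
```

In this toy case two scores agree perfectly while the third flags a different second participant, so the pairwise kappas are mixed and their average lands well below 1 even though every score flags the same fraction of people.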

Citations: 0