Trait Imputation Enhances Nonlinear Genetic Prediction for Some Traits.

Genetics (IF 5.1; Zone 3, Biology) · Pub Date: 2024-09-10 · DOI: 10.1093/genetics/iyae148

Ruoyu He, Jinwen Fu, Jingchen Ren, Wei Pan



Abstract

The expansive collection of genetic and phenotypic data in biobanks offers an unprecedented opportunity for biomedical research. However, frequently missing phenotypes are a significant barrier to fully leveraging this potential. In our target application, we have, on the one hand, only a small complete dataset with both genotypes and phenotypes for building a genetic prediction model, commonly called a polygenic (risk) score (PGS or PRS); on the other hand, we have a large dataset of genotypes (e.g., from a biobank) without the phenotype of interest. Our goal is to use the large genotype-only dataset together with a separate GWAS summary dataset for the phenotype to impute the missing phenotypes; the imputed phenotypes then serve as individual-level data, alongside the small complete dataset, for building a nonlinear model as the PGS. Specifically, we fit nonlinear models to 7 imputed and observed phenotypes from the UK Biobank. For each trait, we then trained an ensemble model to integrate these models, which yielded higher R² values in prediction than using only the small complete (observed) dataset. Additionally, for 2 of the 7 traits, the nonlinear model trained with the imputed traits had a higher R² than using the imputed traits directly as the PGS; for the remaining 5 traits, no improvement was found. These findings demonstrate the potential of leveraging existing genetic data and accounting for nonlinear genetic relationships to improve prediction accuracy for some traits.
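The imputation step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the phenotypes are imputed, so this example assumes the simplest option: a linear score that sums each person's allele dosages weighted by GWAS summary effect sizes. It also assumes R² is computed as the squared Pearson correlation. The function names `impute_trait` and `r_squared`, and all of the data, are hypothetical and simulated for illustration; they are not taken from the paper.

```python
import random


def impute_trait(genotypes, effect_sizes):
    """Impute a trait per person as sum(dosage * effect size) over SNPs.

    Assumption: a linear score built from GWAS summary effect sizes;
    the paper's actual imputation procedure may differ.
    """
    return [sum(g * b for g, b in zip(row, effect_sizes)) for row in genotypes]


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


# Simulated toy data (not UK Biobank data).
rng = random.Random(0)
n_people, n_snps = 200, 20
effect_sizes = [rng.gauss(0, 1) for _ in range(n_snps)]
genotypes = [[rng.choice([0, 1, 2]) for _ in range(n_snps)] for _ in range(n_people)]

# Imputed trait for the large genotype-only dataset.
imputed = impute_trait(genotypes, effect_sizes)

# Simulated "observed" trait: the linear signal plus random noise.
observed = [y + rng.gauss(0, 1) for y in imputed]

r2 = r_squared(observed, imputed)
print(f"R^2 of imputed trait used directly as the PGS: {r2:.3f}")
```

In the workflow the abstract describes, imputed values like these would then be pooled with the small observed dataset as training labels for a nonlinear model; that step is omitted here because the abstract does not name the model class.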




Source journal: Genetics (Biology, Genetics). CiteScore: 6.20. Self-citation rate: 6.10%. Articles published: 177.

Journal description: GENETICS is published by the Genetics Society of America, a scholarly society that seeks to deepen our understanding of the living world by advancing our understanding of genetics. Since 1916, GENETICS has published high-quality, original research presenting novel findings bearing on genetics and genomics. The journal publishes empirical studies of organisms ranging from microbes to humans, as well as theoretical work. While it has an illustrious history, GENETICS has changed along with the communities it serves: it is not your mentor's journal. The editors make decisions quickly, in around 30 days, without sacrificing the excellence and scholarship for which the journal has long been known. GENETICS is a peer-reviewed, peer-edited journal, with an international reach and increasing visibility and impact. All editorial decisions are made through collaboration of at least two editors who are practicing scientists. GENETICS is constantly innovating: expanded types of content include Reviews, Commentary (current issues of interest to geneticists), Perspectives (historical), Primers (to introduce primary literature into the classroom), Toolbox Reviews, plus YeastBook, FlyBook, and WormBook (coming spring 2016). For particularly time-sensitive results, we publish Communications. As part of our mission to serve our communities, we've published thematic collections, including Genomic Selection, Multiparental Populations, Mouse Collaborative Cross, and the Genetics of Sex.
