Fast Analysis of Biobank-Size Data and Meta-Analysis using the BGLR R-package.

Journal: G3: Genes|Genomes|Genetics (IF 2.1; Zone 3, Biology; JCR Q3, Genetics & Heredity). Pub Date: 2024-12-09. DOI: 10.1093/g3journal/jkae288
Paulino Pérez-Rodríguez, Gustavo de Los Campos, Hao Wu, Ana I Vazquez, Kyle Jones

Abstract

Analyzing human genomic data from biobanks and large-scale genetic evaluations often requires fitting models with a sample size exceeding the number of DNA markers used (n > p). For instance, developing Polygenic Scores (PGS) for humans and genomic prediction for genetic evaluations of agricultural species may require fitting models involving a few thousand SNPs using data with hundreds of thousands of samples. In such cases, computations based on sufficient statistics are more efficient than those based on individual genotype-phenotype data. Additionally, software that admits sufficient statistics as inputs can be used to analyze data from multiple sources jointly without the need to share individual genotype-phenotype data. Therefore, we developed functionality within the BGLR R-package that generates posterior samples for Bayesian shrinkage and variable selection models from sufficient statistics. In this article, we present an overview of the new methods incorporated in the BGLR R-package, demonstrate the use of the new software through simple examples, and provide several computational benchmarks. We also present a real-data example using data from the UK-Biobank, All of Us, and the HCHS/SOL cohort. The example demonstrates how a joint analysis of multiple cohorts can be implemented without sharing individual genotype-phenotype data, and how a combined analysis can improve the prediction accuracy of PGS for Hispanics, a group severely underrepresented in GWAS data.
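
The two claims about sufficient statistics can be illustrated with a small sketch. It is written in plain Python rather than R, does not use BGLR or its API, and fits a ridge-regression point estimate instead of drawing posterior samples, so it only illustrates the idea and is not the paper's method. All function names (`sufficient_stats`, `combine`, `ridge_from_stats`, `simulate`), the toy effect sizes, and the penalty `lam` are assumptions made for this example. The sketch shows that X'X and X'y from two cohorts add up to the statistics of the stacked data, and that the effects can then be estimated from those p-by-p and p-by-1 summaries alone.

```python
import random


def sufficient_stats(X, y):
    """Return (X'X, X'y, n) for one cohort; X is a list of genotype rows."""
    p = len(X[0])
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    return xtx, xty, len(y)


def combine(stats_a, stats_b):
    """Pool two cohorts by adding their sufficient statistics element-wise."""
    (xtx_a, xty_a, n_a), (xtx_b, xty_b, n_b) = stats_a, stats_b
    xtx = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(xtx_a, xtx_b)]
    xty = [a + b for a, b in zip(xty_a, xty_b)]
    return xtx, xty, n_a + n_b


def ridge_from_stats(xtx, xty, lam, sweeps=200):
    """Solve (X'X + lam*I) b = X'y by Gauss-Seidel sweeps, one effect at a time.

    Only the p-by-p and p-by-1 summaries are touched, so the cost of a sweep
    does not depend on the number of samples n.
    """
    p = len(xty)
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Right-hand side for effect j with all other effects held fixed.
            rhs = xty[j] - sum(xtx[j][k] * b[k] for k in range(p) if k != j)
            b[j] = rhs / (xtx[j][j] + lam)
    return b


def simulate(n, beta, rng):
    """Toy cohort: genotypes coded 0/1/2, phenotype = X beta + Gaussian noise."""
    X = [[float(rng.choice((0, 1, 2))) for _ in beta] for _ in range(n)]
    y = [sum(x * b for x, b in zip(row, beta)) + rng.gauss(0.0, 1.0) for row in X]
    return X, y


rng = random.Random(1)
beta = [0.5, -0.3, 0.0, 0.2]       # hypothetical true effects
X1, y1 = simulate(300, beta, rng)  # cohort 1
X2, y2 = simulate(500, beta, rng)  # cohort 2

# Each cohort shares only its summaries; individual rows never leave the site.
pooled = combine(sufficient_stats(X1, y1), sufficient_stats(X2, y2))
# Reference: the same statistics computed from the stacked individual data.
stacked = sufficient_stats(X1 + X2, y1 + y2)

b_pooled = ridge_from_stats(pooled[0], pooled[1], lam=1.0)
b_stacked = ridge_from_stats(stacked[0], stacked[1], lam=1.0)
print("max difference in estimates:",
      max(abs(a - b) for a, b in zip(b_pooled, b_stacked)))
```

The estimates from the pooled summaries agree with those from the stacked data up to floating-point rounding, which is why a joint analysis needs no exchange of individual records. The one-effect-at-a-time update was chosen because it needs only rows of X'X and entries of X'y; whether BGLR organizes its sampler the same way is not stated in the abstract.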

Source journal
G3: Genes|Genomes|Genetics (Genetics & Heredity)
CiteScore: 5.10
Self-citation rate: 3.80%
Articles published: 305
Review time: 3-8 weeks
About the journal: G3: Genes, Genomes, Genetics provides a forum for the publication of high-quality foundational research, particularly research that generates useful genetic and genomic information such as genome maps, single gene studies, genome-wide association and QTL studies, as well as genome reports, mutant screens, and advances in methods and technology. The Editorial Board of G3 believes that rapid dissemination of these data is the necessary foundation for analysis that leads to mechanistic insights. G3, published by the Genetics Society of America, meets the critical and growing need of the genetics community for rapid review and publication of important results in all areas of genetics. G3 offers the opportunity to publish the puzzling finding or to present unpublished results that may not have been submitted for review and publication due to a perceived lack of a potential high-impact finding. G3 has earned the DOAJ Seal, which is a mark of certification for open access journals, awarded by DOAJ to journals that achieve a high level of openness, adhere to Best Practice and high publishing standards.