Population stratification correction using Bayesian shrinkage priors for genetic association studies

Impact factor: 1.0 · Zone 4 (Biology) · JCR Q4 (GENETICS & HEREDITY) · Annals of Human Genetics · Pub Date: 2023-09-28 · DOI: 10.1111/ahg.12527
Zilu Liu, Asuman S. Turkmen, Shili Lin
Volume 87, issue 6, pages 302-315. Full text: https://onlinelibrary.wiley.com/doi/10.1111/ahg.12527

Abstract

Introduction

Population stratification (PS) is a major source of confounding in population-based genetic association studies of quantitative traits. Principal component regression (PCR) and linear mixed model (LMM) are two commonly used approaches to account for PS in association studies. Previous studies have shown that LMM can be interpreted as including all principal components (PCs) as random-effect covariates. However, including all PCs in LMM may dilute the influence of relevant PCs in some scenarios, while including only a few preselected PCs in PCR may fail to fully capture the genetic diversity.
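
As a rough illustration of the PCR approach described above, the following sketch computes the top PCs of a genotype matrix and tests a single variant by ordinary least squares with those PCs as fixed-effect covariates. The function names (`top_pcs`, `pcr_variant_test`) and the choice of `k` are hypothetical and are not taken from the paper; this is a minimal NumPy sketch, not the authors' software.

```python
import numpy as np

def top_pcs(genotypes, k):
    """Return the top-k PC scores of a samples-by-variants genotype matrix."""
    # Center each variant (column) so the PCs capture variation around the mean.
    centered = genotypes - genotypes.mean(axis=0)
    # SVD of the centered matrix; columns of u * s are the PC scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]

def pcr_variant_test(y, g, pcs):
    """Regress trait y on an intercept, one variant g, and the given PCs.

    Returns the estimated variant effect and its t-statistic.
    """
    n = len(y)
    design = np.column_stack([np.ones(n), g, pcs])
    coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    dof = n - design.shape[1]
    sigma2 = resid @ resid / dof                      # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)   # OLS covariance of coef
    return coef[1], coef[1] / np.sqrt(cov[1, 1])
```

With only a few preselected PCs, any confounding carried by the omitted PCs is left unadjusted, which is the limitation the abstract attributes to PCR.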

Materials and methods

To address these shortcomings, we introduce Bayestrat—a method to detect associated variants with PS correction under the Bayesian LASSO framework. To adjust for PS, Bayestrat accommodates a large number of PCs and utilizes appropriate shrinkage priors to shrink the effects of nonassociated PCs.
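
The idea of shrinking PC effects under a Bayesian LASSO prior can be sketched with a standard Gibbs sampler of the Park and Casella type. This is an illustration under stated assumptions, not the Bayestrat implementation: the abstract does not specify the sampler, the prior on the tested variant, or how the penalty is chosen. Here the variant gets a diffuse normal prior (essentially unshrunk), the PCs get Laplace priors, and the penalty `lam` is held fixed; the function name `bayesian_lasso_gibbs` and all parameter names are hypothetical.

```python
import numpy as np

def bayesian_lasso_gibbs(y, g, pcs, lam=1.0, n_iter=2000, burn_in=500, seed=0):
    """Gibbs sampler for y ~ g + PCs with Laplace shrinkage on the PC effects.

    Returns posterior draws of the coefficients, variant effect in column 0.
    """
    rng = np.random.default_rng(seed)
    # Center trait and covariates so the intercept drops out of the model.
    y = y - y.mean()
    x = np.column_stack([g, pcs])
    x = x - x.mean(axis=0)
    n, p = x.shape
    xtx, xty = x.T @ x, x.T @ y

    tau2 = np.ones(p)
    tau2[0] = 1e6              # assumption: diffuse prior, variant left unshrunk
    sigma2 = float(np.var(y))
    draws = np.empty((n_iter - burn_in, p))

    for it in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, sigma2 * A^{-1}), with A = X'X + diag(1/tau2)
        a_inv = np.linalg.inv(xtx + np.diag(1.0 / tau2))
        a_inv = (a_inv + a_inv.T) / 2.0          # enforce symmetry numerically
        chol = np.linalg.cholesky(a_inv)
        beta = a_inv @ xty + np.sqrt(sigma2) * (chol @ rng.standard_normal(p))

        # sigma2 | rest ~ Inverse-Gamma(shape, scale)
        resid = y - x @ beta
        shape = (n - 1 + p) / 2.0
        scale = (resid @ resid + beta @ (beta / tau2)) / 2.0
        sigma2 = scale / rng.gamma(shape)

        # 1/tau_j^2 | rest ~ Inverse-Gaussian(mean, lam^2), PC effects only.
        # beta^2 is floored and the draw is clipped as numerical guards.
        mean = lam * np.sqrt(sigma2 / np.maximum(beta[1:] ** 2, 1e-8))
        inv_tau2 = rng.wald(mean, lam ** 2)
        tau2[1:] = 1.0 / np.maximum(inv_tau2, 1e-10)

        if it >= burn_in:
            draws[it - burn_in] = beta
    return draws
```

Under this setup, PCs unrelated to the trait have posterior effects concentrated near zero, while a PC that truly confounds the association keeps a sizeable effect; a posterior summary of column 0 (for example a credible interval) could then be used to judge the variant.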

Results

Simulation results show that Bayestrat consistently controls type I error rates and achieves higher power compared to its non-shrinkage counterparts, especially when the number of PCs included in the model is large. As a demonstration of the utility of Bayestrat, we apply it to the Multi-Ethnic Study of Atherosclerosis (MESA). Variants and genes associated with serum triglyceride or HDL cholesterol are identified in our analyses.

Discussion

The automatic, self-selecting nature of Bayestrat makes it particularly well suited to complex PS scenarios, in which it is unknown a priori which PCs are potential confounders and a large number of PCs may need to be considered to fully account for PS.
