A Fast and Accurate Method for Genome-wide Scale Phenome-wide G × E Analysis and Its Application to UK Biobank.

IF 8.1 | Zone 1 (Biology) | Q1 GENETICS & HEREDITY | American Journal of Human Genetics | Pub Date: 2019-12-05 | Epub Date: 2019-11-14 | DOI: 10.1016/j.ajhg.2019.10.008

Wenjian Bi, Zhangchen Zhao, Rounak Dey, Lars G Fritsche, Bhramar Mukherjee, Seunggeun Lee

Pages: 1182-1192. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6904814/pdf/


Abstract

The etiology of most complex diseases involves genetic variants, environmental factors, and gene-environment interaction (G × E) effects. Compared with marginal genetic association studies, G × E analysis requires more samples and detailed measurements of environmental exposures, which limits the possible discoveries. Large-scale population-based biobanks with detailed phenotypic and environmental information, such as UK Biobank, can be ideal resources for identifying G × E effects. However, because of the large computation cost and the presence of case-control imbalance, existing methods often fail. Here we propose a scalable and accurate method, SPAGE (SaddlePoint Approximation implementation of G × E analysis), that is applicable to genome-wide scale phenome-wide G × E studies. To reduce computation cost, SPAGE fits a genotype-independent logistic model only once across the genome-wide analysis, and it uses a saddlepoint approximation (SPA) to calibrate the test statistics when analyzing phenotypes with unbalanced case-control ratios. Simulation studies show that SPAGE is 33-79 times faster than the Wald test and 72-439 times faster than Firth's test, and that SPAGE can control type I error rates at the genome-wide significance level even when case-control ratios are extremely unbalanced. Through the analysis of UK Biobank data from 344,341 white British European-ancestry samples, we show that SPAGE can efficiently analyze large samples while controlling for unbalanced case-control ratios.
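The calibration idea in the abstract can be illustrated with a minimal sketch. This is not the authors' SPAGE code: it is a generic saddlepoint approximation (Barndorff-Nielsen form) for the upper tail of a score statistic S = Σ wᵢ(Yᵢ − μᵢ), assuming the fitted case probabilities μᵢ come from a null logistic model that was fit once beforehand, and that the weights wᵢ stand in for a covariate-adjusted G × E term. The function names (`spa_upper_tail`, `normal_sf`), the `normal_cutoff` parameter, and the toy numbers are all hypothetical.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def normal_sf(z):
    """Upper-tail probability of a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def spa_upper_tail(weights, mu, s, normal_cutoff=0.1):
    """Approximate P(S >= s) for S = sum_i w_i * (Y_i - mu_i),
    with independent Y_i ~ Bernoulli(mu_i) and 0 < mu_i < 1."""
    terms = [(w, m, math.log(m / (1.0 - m))) for w, m in zip(weights, mu)]

    def cgf(t):  # cumulant generating function K(t) of S
        return sum(math.log(1.0 - m) + _softplus(l + w * t) - t * w * m
                   for w, m, l in terms)

    def cgf_d1(t):  # K'(t), strictly increasing in t
        return sum(w * (_sigmoid(l + w * t) - m) for w, m, l in terms)

    def cgf_d2(t):  # K''(t), the variance of S under exponential tilting
        total = 0.0
        for w, m, l in terms:
            p = _sigmoid(l + w * t)
            total += w * w * p * (1.0 - p)
        return total

    sd0 = math.sqrt(cgf_d2(0.0))
    if sd0 == 0.0:
        raise ValueError("all weights are zero")
    # Close to the mean the SPA formula is numerically unstable (0/0),
    # so fall back to the normal approximation there.
    if abs(s) < normal_cutoff * sd0:
        return normal_sf(s / sd0)

    # Bracket the saddlepoint t solving K'(t) = s, then bisect.
    lo, hi = -1.0, 1.0
    while cgf_d1(hi) < s:
        hi *= 2.0
        if hi > 1e6:
            raise ValueError("s is at or beyond the largest value S can take")
    while cgf_d1(lo) > s:
        lo *= 2.0
        if lo < -1e6:
            raise ValueError("s is at or beyond the smallest value S can take")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cgf_d1(mid) < s:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)

    w_stat = math.copysign(math.sqrt(2.0 * (t * s - cgf(t))), t)
    v_stat = t * math.sqrt(cgf_d2(t))
    return normal_sf(w_stat + math.log(v_stat / w_stat) / w_stat)


# Toy unbalanced example (hypothetical numbers): 100 samples with weight 1,
# each with a 1% fitted case probability, and 5 observed cases (1 expected).
weights = [1.0] * 100
mu = [0.01] * 100
s_obs = 4.0
p_spa = spa_upper_tail(weights, mu, s_obs)
p_normal = normal_sf(s_obs / math.sqrt(sum(w * w * m * (1.0 - m)
                                           for w, m in zip(weights, mu))))
print("SPA p-value:   ", p_spa)
print("normal p-value:", p_normal)
```

In this toy setting the normal approximation gives a far smaller tail probability than the saddlepoint approximation, which illustrates why uncalibrated tests can inflate type I error when cases are rare. Because the μᵢ are computed once and reused for every variant, only the weights and the observed statistic change per test.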

Source journal

American Journal of Human Genetics (Biology: Genetics)

CiteScore: 14.70

Self-citation rate: 4.10%

Articles published: 185

Review time: 1 month

Journal description: The American Journal of Human Genetics (AJHG) is a monthly journal published by Cell Press, chosen by The American Society of Human Genetics (ASHG) as its premier publication starting from January 2008. AJHG represents Cell Press's first society-owned journal, and both ASHG and Cell Press anticipate significant synergies between AJHG content and that of other Cell Press titles.