Statistical methods for cis-Mendelian randomization with two-sample summary-level data

IF 3.8 | Zone 4, Medicine | JCR Q3, Genetics & Heredity | Genetic Epidemiology | Pub Date: 2022-10-23 | DOI: 10.1002/gepi.22506

Apostolos Gkatzionis, Stephen Burgess, Paul J. Newcombe

Genetic Epidemiology 47(1): 3-25. Article page: https://onlinelibrary.wiley.com/doi/10.1002/gepi.22506

Citations: 23

Abstract




Mendelian randomization (MR) is the use of genetic variants to assess the existence of a causal relationship between a risk factor and an outcome of interest. Here, we focus on two-sample summary-data MR analyses with many correlated variants from a single gene region, particularly on cis-MR studies which use protein expression as a risk factor. Such studies must rely on a small, curated set of variants from the studied region; using all variants in the region requires inverting an ill-conditioned genetic correlation matrix and results in numerically unstable causal effect estimates. We review methods for variable selection and estimation in cis-MR with summary-level data, ranging from stepwise pruning and conditional analysis to principal components analysis, factor analysis, and Bayesian variable selection. In a simulation study, we show that the various methods have comparable performance in analyses with large sample sizes and strong genetic instruments. However, when weak instrument bias is suspected, factor analysis and Bayesian variable selection produce more reliable inferences than simple pruning approaches, which are often used in practice. We conclude by examining two case studies, assessing the effects of low-density lipoprotein-cholesterol and serum testosterone on coronary heart disease risk using variants in the HMGCR and SHBG gene regions, respectively.
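The numerical issue described above can be illustrated with a minimal sketch. It assumes the commonly used inverse-variance weighted (IVW) estimator for correlated variants, which weights by the inverse of Omega, where Omega[j, k] = se_y[j] * se_y[k] * rho[j, k], and one common principal-components formulation that projects the summary statistics onto the leading eigenvectors of a weighted correlation matrix. The function names, the 99% variance threshold, and the simulated inputs are illustrative assumptions, not the authors' code or data.

```python
import numpy as np


def _gls_ratio(bx, by, omega):
    """Generalized least squares of by on bx (no intercept) with covariance omega."""
    omega_inv_bx = np.linalg.solve(omega, bx)  # avoids forming an explicit inverse
    precision = bx @ omega_inv_bx
    estimate = (omega_inv_bx @ by) / precision
    return estimate, 1.0 / np.sqrt(precision)


def ivw_correlated(bx, by, se_y, rho):
    """IVW causal estimate using all variants and their correlation matrix rho."""
    omega = np.outer(se_y, se_y) * rho
    return _gls_ratio(bx, by, omega)


def ivw_pca(bx, by, se_y, rho, var_explained=0.99):
    """IVW on principal components of a weighted correlation matrix.

    The weighting |bx| / se_y and the 0.99 threshold are assumptions
    chosen for illustration.
    """
    omega = np.outer(se_y, se_y) * rho
    w = np.abs(bx) / se_y
    psi = np.outer(w, w) * rho
    eigvals, eigvecs = np.linalg.eigh(psi)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    cum_frac = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(cum_frac, var_explained)) + 1, len(bx))
    W = eigvecs[:, :k]
    estimate, se = _gls_ratio(W.T @ bx, W.T @ by, W.T @ omega @ W)
    return estimate, se, k


if __name__ == "__main__":
    # Simulated region: 20 variants with AR(1)-type correlation (hypothetical values).
    rng = np.random.default_rng(0)
    p, r, theta = 20, 0.95, 0.3
    idx = np.arange(p)
    rho = r ** np.abs(idx[:, None] - idx[None, :])
    bx = rng.normal(0.1, 0.03, size=p)
    se_y = np.full(p, 0.02)
    omega = np.outer(se_y, se_y) * rho
    by = rng.multivariate_normal(theta * bx, omega)

    print("condition number of rho:", np.linalg.cond(rho))
    print("all variants (estimate, se):", ivw_correlated(bx, by, se_y, rho))
    print("PCA (estimate, se, components):", ivw_pca(bx, by, se_y, rho))
```

As the correlation between neighbouring variants approaches 1, the condition number of rho grows and the all-variant estimate becomes sensitive to small errors in the correlation matrix, whereas the projected problem involves a much smaller, better-conditioned matrix.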


Source journal

Genetic Epidemiology (Medicine: Public, Environmental & Occupational Health)

CiteScore: 4.40

Self-citation rate: 9.50%

Review time: 6-12 weeks

About the journal: Genetic Epidemiology is a peer-reviewed journal for discussion of research on the genetic causes of the distribution of human traits in families and populations. Emphasis is placed on the relative contribution of genetic and environmental factors to human disease as revealed by genetic, epidemiological, and biologic investigations. Genetic Epidemiology primarily publishes papers in statistical genetics, a research field that is primarily concerned with development of statistical, bioinformatical, and computational models for analyzing genetic data. Incorporation of underlying biology and population genetics into conceptual models is favored. The Journal seeks original articles comprising either applied research or innovative statistical, mathematical, computational, or genomic methodologies that advance studies in genetic epidemiology. Other types of reports are encouraged, such as letters to the editor, topic reviews, and perspectives from other fields of research that will likely enrich the field of genetic epidemiology.

Latest articles from this journal

Shared and Distinct Genetic Factors Underlying Bile Acid Regulation and Intrahepatic Cholestasis of Pregnancy
Variant Prioritization by Pedigree-Based Haplotyping
Evaluating a Mendelian Risk Prediction Model That Aggregates Across Genes and Cancers
Issue Information
Methods for Prioritizing Causal Genes in Molecular Studies of Human Disease: The State of the Art