Robust methods to detect disease-genotype association in genetic association studies: calculate p-values using exact conditional enumeration instead of simulated permutations or asymptotic approximations.

IF 0.4 4区数学 Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY Statistical Applications in Genetics and Molecular Biology Pub Date : 2014-12-01 DOI:10.1515/sagmb-2013-0084

Mette Langaas, Øyvind Bakke

{"title":"Robust methods to detect disease-genotype association in genetic association studies: calculate p-values using exact conditional enumeration instead of simulated permutations or asymptotic approximations.","authors":"Mette Langaas, Øyvind Bakke","doi":"10.1515/sagmb-2013-0084","DOIUrl":null,"url":null,"abstract":"<p><p>In genetic association studies, detecting disease-genotype association is a primary goal. We study seven robust test statistics for such association when the underlying genetic model is unknown, for data on disease status (case or control) and genotype (three genotypes of a biallelic genetic marker). In such studies, p-values have predominantly been calculated by asymptotic approximations or by simulated permutations. We consider an exact method, conditional enumeration. When the number of simulated permutations tends to infinity, the permutation p-value approaches the conditional enumeration p-value, but calculating the latter is much more efficient than performing simulated permutations. We have studied case-control sample sizes with 500-5000 cases and 500-15,000 controls, and significance levels from 5 × 10(-8) to 0.05, thus our results are applicable to genetic association studies with only a few genetic markers under study, intermediate follow-up studies, and genome-wide association studies. Our main findings are: (i) If all monotone genetic models are of interest, the best performance in the situations under study is achieved for the robust test statistics based on the maximum over a range of Cochran-Armitage trend tests with different scores and for the constrained likelihood ratio test. (ii) For significance levels below 0.05, for the test statistics under study, asymptotic approximations may give a test size up to 20 times the nominal level, and should therefore be used with caution. (iii) Calculating p-values based on exact conditional enumeration is a powerful, valid and computationally feasible approach, and we advocate its use in genetic association studies.</p>","PeriodicalId":48980,"journal":{"name":"Statistical Applications in Genetics and Molecular Biology","volume":"13 6","pages":"675-92"},"PeriodicalIF":0.4000,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/sagmb-2013-0084","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistical Applications in Genetics and Molecular Biology","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1515/sagmb-2013-0084","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}

引用次数: 6

Abstract

In genetic association studies, detecting disease-genotype association is a primary goal. We study seven robust test statistics for such association when the underlying genetic model is unknown, for data on disease status (case or control) and genotype (three genotypes of a biallelic genetic marker). In such studies, p-values have predominantly been calculated by asymptotic approximations or by simulated permutations. We consider an exact method, conditional enumeration. When the number of simulated permutations tends to infinity, the permutation p-value approaches the conditional enumeration p-value, but calculating the latter is much more efficient than performing simulated permutations. We have studied case-control sample sizes with 500-5000 cases and 500-15,000 controls, and significance levels from 5 × 10(-8) to 0.05, thus our results are applicable to genetic association studies with only a few genetic markers under study, intermediate follow-up studies, and genome-wide association studies. Our main findings are: (i) If all monotone genetic models are of interest, the best performance in the situations under study is achieved for the robust test statistics based on the maximum over a range of Cochran-Armitage trend tests with different scores and for the constrained likelihood ratio test. (ii) For significance levels below 0.05, for the test statistics under study, asymptotic approximations may give a test size up to 20 times the nominal level, and should therefore be used with caution. (iii) Calculating p-values based on exact conditional enumeration is a powerful, valid and computationally feasible approach, and we advocate its use in genetic association studies.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

在遗传关联研究中检测疾病-基因型关联的鲁棒方法:使用精确条件枚举而不是模拟排列或渐近近似计算p值。

在遗传关联研究中，检测疾病-基因型关联是一个主要目标。当潜在的遗传模型未知时，我们研究了7个关于疾病状态(病例或对照)和基因型(双等位基因遗传标记的三种基因型)的相关检验统计数据。在这类研究中，p值主要是通过渐近逼近或模拟排列来计算的。我们考虑一种精确方法，条件枚举。当模拟排列的数量趋于无穷大时，排列的p值接近条件枚举的p值，但计算后者比执行模拟排列要有效得多。我们研究了500-5000例病例和500- 15000例对照的病例-对照样本量，显著性水平从5 × 10(-8)到0.05，因此我们的结果适用于仅研究少数遗传标记的遗传关联研究、中期随访研究和全基因组关联研究。我们的主要发现是:(i)如果所有的单调遗传模型都是感兴趣的，那么在研究的情况下，基于不同分数的Cochran-Armitage趋势检验范围内的最大值的稳健检验统计量和约束似然比检验达到了最佳性能。(ii)对于0.05以下的显著性水平，对于正在研究的检验统计量，渐近近似可能会给出高达名义水平20倍的检验大小，因此应谨慎使用。(iii)基于精确条件枚举计算p值是一种强大、有效和计算可行的方法，我们提倡在遗传关联研究中使用它。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Statistical Applications in Genetics and Molecular Biology BIOCHEMISTRY & MOLECULAR BIOLOGY-MATHEMATICAL & COMPUTATIONAL BIOLOGY

自引率

11.10%

发文量

期刊介绍： Statistical Applications in Genetics and Molecular Biology seeks to publish significant research on the application of statistical ideas to problems arising from computational biology. The focus of the papers should be on the relevant statistical issues but should contain a succinct description of the relevant biological problem being considered. The range of topics is wide and will include topics such as linkage mapping, association studies, gene finding and sequence alignment, protein structure prediction, design and analysis of microarray data, molecular evolution and phylogenetic trees, DNA topology, and data base search strategies. Both original research and review articles will be warmly received.