Robustness of the linear mixed effects model to error distribution assumptions and the consequences for genome-wide association studies.

IF 0.8 | Zone 4, Mathematics | Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY | Statistical Applications in Genetics and Molecular Biology | Pub Date: 2014-10-01 | DOI: 10.1515/sagmb-2013-0066
Nicole M Warrington, Kate Tilling, Laura D Howe, Lavinia Paternoster, Craig E Pennell, Yan Yan Wu, Laurent Briollais
Citations: 15

Abstract

Genome-wide association studies have been successful in uncovering novel genetic variants that are associated with disease status or cross-sectional phenotypic traits. Researchers are beginning to investigate how genes play a role in the development of a trait over time. Linear mixed effects models (LMM) are commonly used to model longitudinal data; however, it is unclear whether failure to meet the model's distributional assumptions will affect the conclusions when conducting a genome-wide association study. In an extensive simulation study, we compare coverage probabilities, bias, type 1 error rates and statistical power when the error of the LMM is either heteroscedastic or has a non-Gaussian distribution. We conclude that the model is robust to misspecification if the same function of age is included in the fixed and random effects. However, type 1 error of the genetic effect over time is inflated, regardless of the model misspecification, if the polynomial function for age in the fixed and random effects differs. In situations where the model will not converge with a high-order polynomial function in the random effects, a reduced function can be used, but a robust standard error needs to be calculated to avoid inflation of the type 1 error. As an illustration, an LMM was applied to longitudinal body mass index (BMI) data over childhood in the ALSPAC cohort; the results emphasised the need for the robust standard error to ensure correct inference of associations of longitudinal BMI with chromosome 16 single nucleotide polymorphisms.
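
The abstract does not say which robust standard error the authors computed, so the following is only a minimal sketch of the general idea, assuming a cluster-robust (sandwich) estimator applied to an ordinary least squares fit rather than to an LMM. It simulates longitudinal data with a random intercept, a random slope for age and heteroscedastic errors, fits a model that ignores the within-person correlation, and compares the naive and robust standard errors for a SNP-by-age effect that is truly zero. All names (`cluster_robust_se`, `snp`, `age`) and every simulation parameter are hypothetical choices for illustration, not values from the paper.

```python
import numpy as np

def cluster_robust_se(X, y, groups):
    """OLS fit with naive and cluster-robust (sandwich) standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    n, p = X.shape
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    # Naive SE: assumes independent errors with constant variance.
    sigma2 = resid @ resid / (n - p)
    naive_se = np.sqrt(np.diag(sigma2 * bread))
    # Sandwich "meat": sum over individuals of the outer product of X_i' e_i.
    meat = np.zeros((p, p))
    for g in np.unique(groups):
        idx = groups == g
        score = X[idx].T @ resid[idx]
        meat += np.outer(score, score)
    robust_se = np.sqrt(np.diag(bread @ meat @ bread))
    return beta, naive_se, robust_se

# Hypothetical simulation settings (not taken from the paper).
rng = np.random.default_rng(0)
n_subj, n_obs = 500, 6
ids = np.repeat(np.arange(n_subj), n_obs)
age = np.tile(np.linspace(0.0, 5.0, n_obs), n_subj)
snp = np.repeat(rng.binomial(2, 0.3, n_subj), n_obs)   # allele count, no true effect
b0 = np.repeat(rng.normal(0.0, 1.0, n_subj), n_obs)    # random intercept
b1 = np.repeat(rng.normal(0.0, 1.0, n_subj), n_obs)    # random slope for age
noise = rng.normal(0.0, 1.0, n_subj * n_obs) * (0.5 + 0.3 * age)  # heteroscedastic
y = 15.0 + 0.5 * age + b0 + b1 * age + noise

# Columns: intercept, age, SNP, SNP-by-age interaction.
X = np.column_stack([np.ones_like(age), age, snp, snp * age])
beta, naive_se, robust_se = cluster_robust_se(X, y, ids)
print("SNP x age estimate:", beta[3])
print("naive SE:", naive_se[3], "robust SE:", robust_se[3])
```

In this setup the naive standard error for the SNP-by-age term ignores the between-person variation in slopes, which is the kind of misspecification that would inflate type 1 error; the sandwich estimator does not rely on the assumed covariance structure being correct.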

Source journal
Statistical Applications in Genetics and Molecular Biology
Categories: BIOCHEMISTRY & MOLECULAR BIOLOGY; MATHEMATICAL & COMPUTATIONAL BIOLOGY
Self-citation rate: 11.10%
Articles published: 8
Journal description: Statistical Applications in Genetics and Molecular Biology seeks to publish significant research on the application of statistical ideas to problems arising from computational biology. The focus of the papers should be on the relevant statistical issues but should contain a succinct description of the relevant biological problem being considered. The range of topics is wide and will include topics such as linkage mapping, association studies, gene finding and sequence alignment, protein structure prediction, design and analysis of microarray data, molecular evolution and phylogenetic trees, DNA topology, and data base search strategies. Both original research and review articles will be warmly received.