Grid-based Gaussian process models for longitudinal genetic data

Communications for Statistical Applications and Methods (IF 0.6, Q4, Statistics & Probability). Pub Date: 2022-01-31. DOI: 10.29220/csam.2022.29.1.745

Wonil Chung

Citations: 1

Abstract

Although various statistical methods have been developed to map time-dependent genetic factors, most identified genetic variants can explain only a small portion of the estimated genetic variation in longitudinal traits. Gene-gene and gene-time/environment interactions are known to be important putative sources of the missing heritability. However, mapping epistatic gene-gene interactions is extremely difficult due to the very large parameter spaces for models containing such interactions. In this paper, we develop a Gaussian process (GP) based nonparametric Bayesian variable selection method for longitudinal data. It maps multiple genetic markers without restricting to pairwise interactions. Rather than modeling each main and interaction term explicitly, the GP model measures the importance of each marker, regardless of whether it is mostly due to a main effect or some interaction effect(s), via an unspecified function. To improve the flexibility of the GP model, we propose a novel grid-based method for the within-subject dependence structure. The proposed method can accurately approximate complex covariance structures. The dimension of the covariance matrix depends only on the number of fixed grid points although each subject may have different numbers of measurements at different time points. The deviance information criterion (DIC) and the Bayesian predictive information criterion (BPIC) are proposed for selecting an optimal number of grid points. To efficiently draw posterior samples, we combine a hybrid Monte Carlo method with a partially collapsed Gibbs (PCG) sampler. We apply the proposed GP model to a mouse dataset on age-related body weight.
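As a rough illustration of how a covariance defined on a fixed grid can serve subjects measured at different times, the sketch below builds each subject's covariance from one shared grid-level matrix. This is not the paper's construction, which the abstract does not spell out. The linear-interpolation mapping, the squared-exponential kernel, the added noise variance, and all function names are assumptions made only for this example.

```python
import math


def interpolation_weights(times, grid):
    """Return one row per time point holding linear-interpolation
    weights over the grid (assumed mapping, not from the paper)."""
    rows = []
    for t in times:
        row = [0.0] * len(grid)
        if t <= grid[0]:
            row[0] = 1.0            # clamp to the first grid point
        elif t >= grid[-1]:
            row[-1] = 1.0           # clamp to the last grid point
        else:
            for k in range(len(grid) - 1):
                if grid[k] <= t <= grid[k + 1]:
                    w = (t - grid[k]) / (grid[k + 1] - grid[k])
                    row[k], row[k + 1] = 1.0 - w, w
                    break
        rows.append(row)
    return rows


def squared_exponential(grid, variance=1.0, length_scale=1.0):
    """Grid-level covariance K; the kernel choice is an assumption."""
    return [[variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)
             for b in grid] for a in grid]


def subject_covariance(times, grid, grid_cov, noise=0.1):
    """Compute A K A^T + noise * I for one subject, where A holds the
    interpolation weights for that subject's measurement times."""
    A = interpolation_weights(times, grid)
    n, m = len(times), len(grid)
    AK = [[sum(A[i][k] * grid_cov[k][j] for k in range(m))
           for j in range(m)] for i in range(n)]
    cov = [[sum(AK[i][k] * A[j][k] for k in range(m))
            for j in range(n)] for i in range(n)]
    for i in range(n):
        cov[i][i] += noise
    return cov


grid = [0.0, 5.0, 10.0]                       # hypothetical grid points
K = squared_exponential(grid, length_scale=4.0)
cov_a = subject_covariance([1.0, 4.0, 9.0, 10.0], grid, K)  # 4 measurements
cov_b = subject_covariance([2.5, 7.5], grid, K)             # 2 measurements
print(len(K), len(cov_a), len(cov_b))
```

Under these assumptions the only covariance parameters live in the 3 x 3 matrix `K`, while the two subjects receive 4 x 4 and 2 x 2 covariance matrices derived from it. Comparing fits across several grid sizes is where a criterion such as DIC or BPIC would come in.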

Source journal

Communications for Statistical Applications and Methods (Statistics & Probability)

CiteScore

0.90

Self-citation rate

0.00%

Journal description: Communications for Statistical Applications and Methods (Commun. Stat. Appl. Methods, CSAM) is an official journal of the Korean Statistical Society and Korean International Statistical Society. It is an international, Open Access journal dedicated to publishing peer-reviewed, high-quality, and innovative statistical research. CSAM publishes articles on applied and methodological research in statistics and probability. It features rapid publication and broad coverage of statistical applications and methods. It welcomes papers on novel applications of statistical methodology in areas including medicine (pharmaceutical, biotechnology, medical device), business, management, economics, ecology, education, computing, engineering, operational research, biology, sociology, and earth science; papers from other areas are also considered.