Identification of biomarkers by integrating multiple omics together is important because complex diseases occur due to an intricate interplay of various genetic materials. Traditional single-omics association tests neither explore this crucial interomics dependence nor identify moderately weak signals due to the multiple-testing burden. Conversely, multiomics data integration imparts complementary information but suffers from an increased multiple-testing burden, data diversity inherent with different omics features, high-dimensionality, and so forth. Most of the available methods address subtype classification using dimension-reduction techniques to circumvent the sample size issue but interacting multiomics biomarker identification methods are unavailable. We propose a two-step model that first investigates phenotype-omics association using logistic regression. Then, selects disease-associated omics using sparse principal components which explores the interrelationship of multiple variables from two omics in a multivariate multiple regression framework. On the basis of this model, we developed a multiomics biomarker identification algorithm, interacting omics search (ioSearch), that jointly tests the effect of multiple omics with disease and between-omics associations by using pathway information that subsequently reduces the multiple-testing burden. Further, inference in terms of p values potentially makes it an easily interpretable biomarker identification tool. Extensive simulation demonstrates ioSearch as statistically powerful with a controlled Type-I error rate. Its application to publicly available breast cancer data sets identified relevant omics features in important pathways.
{"title":"ioSearch: An approach for identifying interacting multiomics biomarkers using a novel algorithm with application on breast cancer data sets","authors":"Sarmistha Das, Deo Kumar Srivastava","doi":"10.1002/gepi.22536","DOIUrl":"10.1002/gepi.22536","url":null,"abstract":"<p>Identification of biomarkers by integrating multiple omics together is important because complex diseases occur due to an intricate interplay of various genetic materials. Traditional single-omics association tests neither explore this crucial interomics dependence nor identify moderately weak signals due to the multiple-testing burden. Conversely, multiomics data integration imparts complementary information but suffers from an increased multiple-testing burden, data diversity inherent with different omics features, high-dimensionality, and so forth. Most of the available methods address subtype classification using dimension-reduction techniques to circumvent the sample size issue but interacting multiomics biomarker identification methods are unavailable. We propose a two-step model that first investigates phenotype-omics association using logistic regression. Then, selects disease-associated omics using sparse principal components which explores the interrelationship of multiple variables from two omics in a multivariate multiple regression framework. On the basis of this model, we developed a multiomics biomarker identification algorithm, interacting omics search (ioSearch), that jointly tests the effect of multiple omics with disease and between-omics associations by using pathway information that subsequently reduces the multiple-testing burden. Further, inference in terms of <i>p</i> values potentially makes it an easily interpretable biomarker identification tool. Extensive simulation demonstrates ioSearch as statistically powerful with a controlled Type-I error rate. Its application to publicly available breast cancer data sets identified relevant omics features in important pathways.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 8","pages":"600-616"},"PeriodicalIF":2.1,"publicationDate":"2023-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41108946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Siri N. Skodvin, Håkon K. Gjessing, Astanand Jugessur, Julia Romanowska, Christian M. Page, Elizabeth C. Corfield, Yunsung Lee, Siri E. Håberg, Miriam Gjerdevik
Infertility is a heterogeneous phenotype, and for many couples, the causes of fertility problems remain unknown. One understudied hypothesis is that allelic interactions between the genotypes of the two parents may influence the risk of infertility. Our aim was, therefore, to investigate how allelic interactions can be modeled using parental genotype data linked to 15,789 pregnancies selected from the Norwegian Mother, Father, and Child Cohort Study. The newborns in 1304 of these pregnancies were conceived using assisted reproductive technologies (ART), and the remainder were conceived naturally. Treating the use of ART as a proxy for infertility, different parameterizations were implemented in a genome-wide screen for interaction effects between maternal and paternal alleles at the same locus. Some of the models were more similar in the way they were parameterized, and some produced similar results when implemented on a genome-wide scale. The results showed near-significant interaction effects in genes relevant to the phenotype under study, such as Dynein axonemal heavy chain 17 (DNAH17) with a recognized role in male infertility. More generally, the interaction models presented here are readily adaptable to the study of other phenotypes in which maternal and paternal allelic interactions are likely to be involved.
{"title":"Statistical methods to detect mother–father genetic interaction effects on risk of infertility: A genome-wide approach","authors":"Siri N. Skodvin, Håkon K. Gjessing, Astanand Jugessur, Julia Romanowska, Christian M. Page, Elizabeth C. Corfield, Yunsung Lee, Siri E. Håberg, Miriam Gjerdevik","doi":"10.1002/gepi.22534","DOIUrl":"10.1002/gepi.22534","url":null,"abstract":"<p>Infertility is a heterogeneous phenotype, and for many couples, the causes of fertility problems remain unknown. One understudied hypothesis is that allelic interactions between the genotypes of the two parents may influence the risk of infertility. Our aim was, therefore, to investigate how allelic interactions can be modeled using parental genotype data linked to 15,789 pregnancies selected from the Norwegian Mother, Father, and Child Cohort Study. The newborns in 1304 of these pregnancies were conceived using assisted reproductive technologies (ART), and the remainder were conceived naturally. Treating the use of ART as a proxy for infertility, different parameterizations were implemented in a genome-wide screen for interaction effects between maternal and paternal alleles at the same locus. Some of the models were more similar in the way they were parameterized, and some produced similar results when implemented on a genome-wide scale. The results showed near-significant interaction effects in genes relevant to the phenotype under study, such as Dynein axonemal heavy chain 17 (<i>DNAH17</i>) with a recognized role in male infertility. More generally, the interaction models presented here are readily adaptable to the study of other phenotypes in which maternal and paternal allelic interactions are likely to be involved.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 7","pages":"503-519"},"PeriodicalIF":2.1,"publicationDate":"2023-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22534","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10084980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Siyi Chen, Zhaotong Lin, Xiaotong Shen, Ling Li, Wei Pan
We propose structural equation models (SEMs) as a general framework to infer causal networks for metabolites and other complex traits. Traditionally SEMs are used only for individual-level data under the assumption that all instrumental variables (IVs) are valid. To overcome these limitations, we propose both one- and two-sample approaches for causal network inference based on SEMs that can: (1) perform causal analysis and discover causal relationships among multiple traits; (2) account for the possible presence of some invalid IVs; (3) allow for data analysis using only genome-wide association studies (GWAS) summary statistics when individual-level data are not available; (4) consider the possibility of bidirectional relationships between traits. Our method employs a simple stepwise selection to identify invalid IVs, thus avoiding false positives while possibly increasing true discoveries based on two-stage least squares (2SLS). We use both real GWAS data and simulated data to demonstrate the superior performance of our method over the standard 2SLS/SEMs. For real data analysis, our proposed approach is applied to a human blood metabolite GWAS summary data set to uncover putative causal relationships among the metabolites; we also identify some metabolites (putative) causal to Alzheimer's disease (AD), which, along with the inferred causal metabolite network, suggest some possible pathways of metabolites involved in AD.
{"title":"Inference of causal metabolite networks in the presence of invalid instrumental variables with GWAS summary data","authors":"Siyi Chen, Zhaotong Lin, Xiaotong Shen, Ling Li, Wei Pan","doi":"10.1002/gepi.22535","DOIUrl":"10.1002/gepi.22535","url":null,"abstract":"<p>We propose structural equation models (SEMs) as a general framework to infer causal networks for metabolites and other complex traits. Traditionally SEMs are used only for individual-level data under the assumption that all instrumental variables (IVs) are valid. To overcome these limitations, we propose both one- and two-sample approaches for causal network inference based on SEMs that can: (1) perform causal analysis and discover causal relationships among multiple traits; (2) account for the possible presence of some invalid IVs; (3) allow for data analysis using only genome-wide association studies (GWAS) summary statistics when individual-level data are not available; (4) consider the possibility of bidirectional relationships between traits. Our method employs a simple stepwise selection to identify invalid IVs, thus avoiding false positives while possibly increasing true discoveries based on two-stage least squares (2SLS). We use both real GWAS data and simulated data to demonstrate the superior performance of our method over the standard 2SLS/SEMs. For real data analysis, our proposed approach is applied to a human blood metabolite GWAS summary data set to uncover putative causal relationships among the metabolites; we also identify some metabolites (putative) causal to Alzheimer's disease (AD), which, along with the inferred causal metabolite network, suggest some possible pathways of metabolites involved in AD.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 8","pages":"585-599"},"PeriodicalIF":2.1,"publicationDate":"2023-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22535","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10158155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gibran Hemani, Apostolos Gkatzionis, Kate Tilling, George Davey Smith
<p>In 2017 we presented the MR Steiger method, a sensitivity analysis in Mendelian randomization (MR) for inferring causal directions between variables (Hemani et al., <span>2017</span>). We discussed many of its potential limitations including that unmeasured confounding under certain extreme circumstances could lead to the wrong inferred causal direction. Lutz et al. (<span>2022</span>) propose an R package (UCRMS) for performing sensitivity analysis of the MR Steiger method, and use it in an illustration to suggest that the MR Steiger method has a ~90% chance of giving the wrong answer due to unmeasured confounding. In this note we will show that an error in their approach to sensitivity analysis leads to the wrong conclusion about the validity of the MR Steiger test. We provide a valid alternative which uses the observed data to investigate sensitivity to unmeasured confounding.</p><p>A sensitivity analysis aims to understand the degree to which a result can change due to uncertainties in the inputs (Saltelli, <span>2002</span>). In this case for the MR Steiger test, we need to ask how sensitive is the inference of the causal direction between X and Y to possible values of unmeasured confounders influencing X and Y. Importantly, there is relative certainty in many of the parameters of this system because they are easily observed, for example, the variances of X, Y and the instrumental variables (IVs), the estimated effect of the IVs on X and Y, and therefore the IV estimate of the effect of X on Y. Often the ordinary least squares (OLS) association between X and Y is also available either due to the analysis being performed using individual level data, or by sourcing the estimate from other published results. Therefore, an appropriate sensitivity analysis must explore the extent to which the inferred causal direction between X and Y can change due to unmeasured confounding, without causing these observed parameters to change.</p><p>Lutz et al.'s proposed method does not attempt to fix all observable parameters. In the simple example provided by Lutz et al. the variance of Y varies between 28 and 39, and the OLS estimate varies between 1 and −1 across the parameter values used for the sensitivity analysis. This arises because the residual variance—which is unobserved—is fixed in their approach. Instead the phenotypic variance—which is observed—should be fixed. If they were presenting a simulation of the general performance of MR Steiger under unmeasured confounding then it would not matter that the simulated parameters are not tied to those observed in a particular empirical analysis. However in a sensitivity analysis, allowing observed parameters to vary provides no value to the analyst. To say that unmeasured confounding could reverse the causal direction, provided that the variance of Y also changes drastically, is of little use to the researcher who has a data set with an observed variance of Y. If some quantities are observed (i.e. the re
2017年,我们提出了MR Steiger方法,这是一种孟德尔随机化(MR)的敏感性分析,用于推断变量之间的因果方向(Hemani et al., 2017)。我们讨论了它的许多潜在局限性,包括在某些极端情况下无法测量的混淆可能导致错误的推断因果方向。Lutz等人(2022)提出了一个R包(UCRMS),用于对MR Steiger方法进行敏感性分析,并在一个插图中使用它来表明MR Steiger方法有90%的机会由于未测量的混杂而给出错误的答案。在本笔记中,我们将表明,在他们的方法敏感性分析的错误导致错误的结论关于MR Steiger测试的有效性。我们提供了一种有效的替代方法,它使用观察到的数据来研究对未测量混杂的敏感性。敏感性分析旨在了解由于输入中的不确定性而导致结果变化的程度(Saltelli, 2002)。在MR Steiger检验的这种情况下,我们需要问X和Y之间因果方向的推断对影响X和Y的未测量混杂因素的可能值有多敏感。重要的是,该系统的许多参数具有相对确定性,因为它们很容易观察到,例如,X、Y和工具变量(IVs)的方差,IVs对X和Y的估计影响,因此,X对Y的影响的IV估计。通常,X和Y之间的普通最小二乘(OLS)关联也可以通过使用个人水平数据进行分析或通过从其他已发表的结果中获取估计而获得。因此,适当的敏感性分析必须探讨在不引起这些观测参数变化的情况下,推断出的X和Y之间的因果方向在多大程度上可能由于未测量的混杂而发生变化。Lutz等人提出的方法并不试图固定所有可观察的参数。在Lutz等人提供的简单示例中,Y的方差在28到39之间变化,用于敏感性分析的参数值的OLS估计值在1到−1之间变化。这是因为残差——未被观察到的——在他们的方法中是固定的。相反,观察到的表型差异应该是固定的。如果他们在未测量的混杂下对MR Steiger的一般性能进行模拟,那么模拟参数与在特定经验分析中观察到的参数无关紧要。然而,在敏感性分析中,允许观察到的参数变化对分析人员没有任何价值。假设Y的方差也急剧变化,那么说未测量的混杂可以逆转因果方向,对于拥有观测方差为Y的数据集的研究人员来说是没有多大用处的。如果观察到一些数量(即Y对X的回归系数、仪器在X中解释的方差、X和Y的方差以及IV效应估计都观察到),只允许β uy ${beta}_{{uy}}$和β ux ${beta}_{{ux}}$变化并通过改变残差方差进行补偿,snp结果r2 ${R}^{2}$在任何β y ${beta}_{{y}}$和下都不会改变β ux ${beta}_{{ux}}$ parameters(支持信息说明)。简要介绍Lutz等人。 在分析中,他们指出,对于β xy =1$ {beta}_{{xy}}=1$的因果效应,β ux =−5$ {beta}_{{ux}}=-5$和β y ${beta有特定的未测量的混杂参数}_{{y}}$的取值范围在0到11之间。使用这些参数,他们建议MR Steiger方法有~90%的机会返回错误的因果方向。但是如果β ux ${beta}_{{ux}}$和β uy ${beta}_{{uy}}$被允许使用相同的值范围(例如:−11至11),那么Steiger方法只会在36%的混淆情况下返回不正确的结果。如果β ux ${beta}_{{ux}}$和β uy ${beta}_{{uy}}$的范围被限制为- 1到1,那么错误的结果只会出现在0.02%的场景中。在我们2017年的论文中(支持信息:注3)我们分析了更广泛的场景范围,以全面评估不可测量的混杂因素通常可能引入问题的程度,并得出结论,在大多数实际情况下,rxy 2 <0.2$ {< mpaddxmlns ="http://www.w3.org/1998/Math/MathML">R</mpadded>}_{{xy}}^{2}lt 0.2$,未测量的混淆导致错误因果方向的机会非常小。如果分析人员有动机检查MR Steiger对未测量混杂的敏感性,则需要采用不同的方法,询问未测量混杂的哪些值支持对给定经验观察数量(X, Y和工具的方差,工具对X和Y的影响,以及X对Y的OLS估计)的推断因果方向。然后分析人员可以确定对其结论提出怀疑所需的混杂值是否合理。或者,可以确定可能的混杂参数空间的多少部分支持推断的因果方向。在补充说明中,我们提供了这个问题的分析解决方案。我们说明,在分析特定水平上,当混杂因素解释X和y中的大部分方差时,未测量的混杂因素逆转MR Steiger推断的因果方向的概率仅超过低概率。该方法包含在TwoSampleMR包中,并且速度很快,因为它使用封闭形式计算而不是UCRMS实现的随机模拟方法。
{"title":"Sensitivity analyses gain relevance by fixing parameters observable during the empirical analyses","authors":"Gibran Hemani, Apostolos Gkatzionis, Kate Tilling, George Davey Smith","doi":"10.1002/gepi.22530","DOIUrl":"10.1002/gepi.22530","url":null,"abstract":"<p>In 2017 we presented the MR Steiger method, a sensitivity analysis in Mendelian randomization (MR) for inferring causal directions between variables (Hemani et al., <span>2017</span>). We discussed many of its potential limitations including that unmeasured confounding under certain extreme circumstances could lead to the wrong inferred causal direction. Lutz et al. (<span>2022</span>) propose an R package (UCRMS) for performing sensitivity analysis of the MR Steiger method, and use it in an illustration to suggest that the MR Steiger method has a ~90% chance of giving the wrong answer due to unmeasured confounding. In this note we will show that an error in their approach to sensitivity analysis leads to the wrong conclusion about the validity of the MR Steiger test. We provide a valid alternative which uses the observed data to investigate sensitivity to unmeasured confounding.</p><p>A sensitivity analysis aims to understand the degree to which a result can change due to uncertainties in the inputs (Saltelli, <span>2002</span>). In this case for the MR Steiger test, we need to ask how sensitive is the inference of the causal direction between X and Y to possible values of unmeasured confounders influencing X and Y. Importantly, there is relative certainty in many of the parameters of this system because they are easily observed, for example, the variances of X, Y and the instrumental variables (IVs), the estimated effect of the IVs on X and Y, and therefore the IV estimate of the effect of X on Y. Often the ordinary least squares (OLS) association between X and Y is also available either due to the analysis being performed using individual level data, or by sourcing the estimate from other published results. Therefore, an appropriate sensitivity analysis must explore the extent to which the inferred causal direction between X and Y can change due to unmeasured confounding, without causing these observed parameters to change.</p><p>Lutz et al.'s proposed method does not attempt to fix all observable parameters. In the simple example provided by Lutz et al. the variance of Y varies between 28 and 39, and the OLS estimate varies between 1 and −1 across the parameter values used for the sensitivity analysis. This arises because the residual variance—which is unobserved—is fixed in their approach. Instead the phenotypic variance—which is observed—should be fixed. If they were presenting a simulation of the general performance of MR Steiger under unmeasured confounding then it would not matter that the simulated parameters are not tied to those observed in a particular empirical analysis. However in a sensitivity analysis, allowing observed parameters to vary provides no value to the analyst. To say that unmeasured confounding could reverse the causal direction, provided that the variance of Y also changes drastically, is of little use to the researcher who has a data set with an observed variance of Y. If some quantities are observed (i.e. the re","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 6","pages":"461-462"},"PeriodicalIF":2.1,"publicationDate":"2023-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22530","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10001501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Here we compare a recently proposed method and software package, regmed, with our own previously developed package, BayesNetty, designed to allow exploratory analysis of complex causal relationships between biological variables. We find that regmed generally has poorer recall but much better precision than BayesNetty. This is perhaps not too surprising as regmed is specifically designed for use with high-dimensional data. BayesNetty is found to be more sensitive to the resulting multiple testing problem encountered in these circumstances. However, as regmed is not designed to handle missing data, its performance is severely affected when missing data is present, whereas the performance of BayesNetty is only slightly affected. The performance of regmed can be rescued in this situation by first using BayesNetty to impute the missing data, and then applying regmed to the resulting “filled-in” data set.
{"title":"Comparison of regmed and BayesNetty for exploring causal models with many variables","authors":"Richard Howey, Heather J. Cordell","doi":"10.1002/gepi.22532","DOIUrl":"10.1002/gepi.22532","url":null,"abstract":"<p>Here we compare a recently proposed method and software package, <span>regmed</span>, with our own previously developed package, BayesNetty, designed to allow exploratory analysis of complex causal relationships between biological variables. We find that \u0000<span>regmed</span> generally has poorer recall but much better precision than BayesNetty. This is perhaps not too surprising as \u0000<span>regmed</span> is specifically designed for use with high-dimensional data. BayesNetty is found to be more sensitive to the resulting multiple testing problem encountered in these circumstances. However, as \u0000<span>regmed</span> is not designed to handle missing data, its performance is severely affected when missing data is present, whereas the performance of BayesNetty is only slightly affected. The performance of \u0000<span>regmed</span> can be rescued in this situation by first using BayesNetty to impute the missing data, and then applying \u0000<span>regmed</span> to the resulting “filled-in” data set.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 7","pages":"496-502"},"PeriodicalIF":2.1,"publicationDate":"2023-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22532","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9689871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Manyan Huang, Chen Lyu, Nianjun Liu, Wendy N. Nembhard, John S. Witte, Charlotte A. Hobbs, Ming Li, the National Birth Defects Prevention Study
The risk of congenital heart defects (CHDs) may be influenced by maternal genes, fetal genes, and their interactions. Existing methods commonly test the effects of maternal and fetal variants one-at-a-time and may have reduced statistical power to detect genetic variants with low minor allele frequencies. In this article, we propose a gene-based association test of interactions for maternal–fetal genotypes (GATI-MFG) using a case-mother and control-mother design. GATI-MFG can integrate the effects of multiple variants within a gene or genomic region and evaluate the joint effect of maternal and fetal genotypes while allowing for their interactions. In simulation studies, GATI-MFG had improved statistical power over alternative methods, such as the single-variant test and functional data analysis (FDA) under various disease scenarios. We further applied GATI-MFG to a two-phase genome-wide association study of CHDs for the testing of both common variants and rare variants using 947 CHD case mother–infant pairs and 1306 control mother–infant pairs from the National Birth Defects Prevention Study (NBDPS). After Bonferroni adjustment for 23,035 genes, two genes on chromosome 17, TMEM107 (p = 1.64e−06) and CTC1 (p = 2.0e−06), were identified for significant association with CHD in common variants analysis. Gene TMEM107 regulates ciliogenesis and ciliary protein composition and was found to be associated with heterotaxy. Gene CTC1 plays an essential role in protecting telomeres from degradation, which was suggested to be associated with cardiogenesis. Overall, GATI-MFG outperformed the single-variant test and FDA in the simulations, and the results of application to NBDPS samples are consistent with existing literature supporting the association of TMEM107 and CTC1 with CHDs.
{"title":"A gene-based association test of interactions for maternal–fetal genotypes identifies genes associated with nonsyndromic congenital heart defects","authors":"Manyan Huang, Chen Lyu, Nianjun Liu, Wendy N. Nembhard, John S. Witte, Charlotte A. Hobbs, Ming Li, the National Birth Defects Prevention Study","doi":"10.1002/gepi.22533","DOIUrl":"10.1002/gepi.22533","url":null,"abstract":"<p>The risk of congenital heart defects (CHDs) may be influenced by maternal genes, fetal genes, and their interactions. Existing methods commonly test the effects of maternal and fetal variants one-at-a-time and may have reduced statistical power to detect genetic variants with low minor allele frequencies. In this article, we propose a gene-based association test of interactions for maternal–fetal genotypes (GATI-MFG) using a case-mother and control-mother design. GATI-MFG can integrate the effects of multiple variants within a gene or genomic region and evaluate the joint effect of maternal and fetal genotypes while allowing for their interactions. In simulation studies, GATI-MFG had improved statistical power over alternative methods, such as the single-variant test and functional data analysis (FDA) under various disease scenarios. We further applied GATI-MFG to a two-phase genome-wide association study of CHDs for the testing of both common variants and rare variants using 947 CHD case mother–infant pairs and 1306 control mother–infant pairs from the National Birth Defects Prevention Study (NBDPS). After Bonferroni adjustment for 23,035 genes, two genes on chromosome 17, <i>TMEM107</i> (<i>p</i> = 1.64e−06) and <i>CTC1</i> (<i>p</i> = 2.0e−06), were identified for significant association with CHD in common variants analysis. Gene <i>TMEM107</i> regulates ciliogenesis and ciliary protein composition and was found to be associated with heterotaxy. Gene <i>CTC1</i> plays an essential role in protecting telomeres from degradation, which was suggested to be associated with cardiogenesis. Overall, GATI-MFG outperformed the single-variant test and FDA in the simulations, and the results of application to NBDPS samples are consistent with existing literature supporting the association of <i>TMEM107</i> and <i>CTC1</i> with CHDs.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 7","pages":"475-495"},"PeriodicalIF":2.1,"publicationDate":"2023-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22533","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9669966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Phenotypic variation in human is the results of genetic variation and environmental influences. Understanding the contribution of genetic and environmental components to phenotypic variation is of great interest. The variance explained by genome-wide single nucleotide polymorphisms (SNPs) typically represents a small proportion of the phenotypic variance for complex traits, which may be because the genome is only a part of the whole biological process to shape the phenotypes. In this study, we propose to partition the phenotypic variance of three anthropometric traits, using gene expression levels and environmental variables from GTEx data. We use the gene expression of four tissues that are deemed relevant for the anthropometric traits (two adipose tissues, skeletal muscle tissue and blood tissue). Additionally, we estimate the transcriptome–environment correlation that partly underlies the phenotypes of the anthropometric traits. We found that genetic factors play a significant role in determining body mass index (BMI), with the proportion of phenotypic variance explained by gene expression levels of visceral adipose tissue being 0.68 (SE = 0.06). However, we also observed that environmental factors such as age, sex, ancestry, smoking status, and drinking alcohol status have a small but significant impact (0.005, SE = 0.001). Interestingly, we found a significant negative correlation between the transcriptomic and environmental effects on BMI (transcriptome–environment correlation = −0.54, SE = 0.14), suggesting an antagonistic relationship. This implies that individuals with lower genetic profiles may be more susceptible to the effects of environmental factors on BMI, while those with higher genetic profiles may be less susceptible. We also show that the estimated transcriptomic variance varies across tissues, e.g., the gene expression levels of whole blood tissue and environmental variables explain a lower proportion of BMI phenotypic variance (0.16, SE = 0.05 and 0.04, SE = 0.004 respectively). We observed a significant positive correlation between transcriptomic and environmental effects (1.21, SE = 0.23) for this tissue. In conclusion, phenotypic variance partitioning can be done using gene expression and environmental data even with a small sample size (n = 838 from GTEx data), which can provide insights into how the transcriptomic and environmental effects contribute to the phenotypes of the anthropometric traits.
{"title":"Phenotypic variance partitioning by transcriptomic gene expression levels and environmental variables for anthropometric traits using GTEx data","authors":"Pastor Jullian Fabres, S. Hong Lee","doi":"10.1002/gepi.22531","DOIUrl":"10.1002/gepi.22531","url":null,"abstract":"<p>Phenotypic variation in human is the results of genetic variation and environmental influences. Understanding the contribution of genetic and environmental components to phenotypic variation is of great interest. The variance explained by genome-wide single nucleotide polymorphisms (SNPs) typically represents a small proportion of the phenotypic variance for complex traits, which may be because the genome is only a part of the whole biological process to shape the phenotypes. In this study, we propose to partition the phenotypic variance of three anthropometric traits, using gene expression levels and environmental variables from GTEx data. We use the gene expression of four tissues that are deemed relevant for the anthropometric traits (two adipose tissues, skeletal muscle tissue and blood tissue). Additionally, we estimate the transcriptome–environment correlation that partly underlies the phenotypes of the anthropometric traits. We found that genetic factors play a significant role in determining body mass index (BMI), with the proportion of phenotypic variance explained by gene expression levels of visceral adipose tissue being 0.68 (SE = 0.06). However, we also observed that environmental factors such as age, sex, ancestry, smoking status, and drinking alcohol status have a small but significant impact (0.005, SE = 0.001). Interestingly, we found a significant negative correlation between the transcriptomic and environmental effects on BMI (transcriptome–environment correlation = −0.54, SE = 0.14), suggesting an antagonistic relationship. This implies that individuals with lower genetic profiles may be more susceptible to the effects of environmental factors on BMI, while those with higher genetic profiles may be less susceptible. We also show that the estimated transcriptomic variance varies across tissues, e.g., the gene expression levels of whole blood tissue and environmental variables explain a lower proportion of BMI phenotypic variance (0.16, SE = 0.05 and 0.04, SE = 0.004 respectively). We observed a significant positive correlation between transcriptomic and environmental effects (1.21, SE = 0.23) for this tissue. In conclusion, phenotypic variance partitioning can be done using gene expression and environmental data even with a small sample size (<i>n</i> = 838 from GTEx data), which can provide insights into how the transcriptomic and environmental effects contribute to the phenotypes of the anthropometric traits.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 7","pages":"465-474"},"PeriodicalIF":2.1,"publicationDate":"2023-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22531","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9687294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Current software packages for the analysis and the simulations of rare variants are only available for binary and continuous traits. Ravages provides solutions in a single R package to perform rare variant association tests for multicategory, binary and continuous phenotypes, to simulate datasets under different scenarios and to compute statistical power. Association tests can be run in the whole genome thanks to C++ implementation of most of the functions, using either RAVA-FIRST, a recently developed strategy to filter and analyse genome-wide rare variants, or user-defined candidate regions. Ravages also includes a simulation module that generates genetic data for cases who can be stratified into several subgroups and for controls. Through comparisons with existing programmes, we show that Ravages complements existing tools and will be useful to study the genetic architecture of complex diseases. Ravages is available on the CRAN at https://cran.r-project.org/web/packages/Ravages/ and maintained on Github at https://github.com/genostats/Ravages.
{"title":"Ravages: An R package for the simulation and analysis of rare variants in multicategory phenotypes","authors":"Ozvan Bocher, Gaëlle Marenne, Emmanuelle Génin, Hervé Perdry","doi":"10.1002/gepi.22529","DOIUrl":"10.1002/gepi.22529","url":null,"abstract":"<p>Current software packages for the analysis and the simulations of rare variants are only available for binary and continuous traits. Ravages provides solutions in a single R package to perform rare variant association tests for multicategory, binary and continuous phenotypes, to simulate datasets under different scenarios and to compute statistical power. Association tests can be run in the whole genome thanks to C++ implementation of most of the functions, using either RAVA-FIRST, a recently developed strategy to filter and analyse genome-wide rare variants, or user-defined candidate regions. Ravages also includes a simulation module that generates genetic data for cases who can be stratified into several subgroups and for controls. Through comparisons with existing programmes, we show that Ravages complements existing tools and will be useful to study the genetic architecture of complex diseases. Ravages is available on the CRAN at https://cran.r-project.org/web/packages/Ravages/ and maintained on Github at https://github.com/genostats/Ravages.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 6","pages":"450-460"},"PeriodicalIF":2.1,"publicationDate":"2023-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22529","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10385156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This commentary briefly describes the process and steps that underlie the launching of the journal Genetic Epidemiology in 1984 and the International Genetic Epidemiology Society (IGES, to be pronounced as “I guess”) in 1992.
{"title":"A Brief History behind the journal Genetic Epidemiology and the International Genetic Epidemiology Society","authors":"Dabeeru C. Rao","doi":"10.1002/gepi.22528","DOIUrl":"10.1002/gepi.22528","url":null,"abstract":"<p>This commentary briefly describes the process and steps that underlie the launching of the journal Genetic Epidemiology in 1984 and the International Genetic Epidemiology Society (IGES, to be pronounced as “I guess”) in 1992.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 5","pages":"361-364"},"PeriodicalIF":2.1,"publicationDate":"2023-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22528","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9673429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shuqi Wang, Chi-Yang Chiu, Alexander F. Wilson, Joan E. Bailey-Wilson, Elvira Agron, Emily Y. Chew, Jaeil Ahn, Momiao Xiong, Ruzong Fan
In genetic studies, many phenotypes have multiple naturally ordered discrete values. The phenotypes can be correlated with each other. If multiple correlated ordinal traits are analyzed simultaneously, the power of analysis may increase significantly while the false positives can be controlled well. In this study, we propose bivariate functional ordinal linear regression (BFOLR) models using latent regressions with cumulative logit link or probit link to perform a gene-based analysis for bivariate ordinal traits and sequencing data. In the proposed BFOLR models, genetic variant data are viewed as stochastic functions of physical positions, and the genetic effects are treated as a function of physical positions. The BFOLR models take the correlation of the two ordinal traits into account via latent variables. The BFOLR models are built upon functional data analysis which can be revised to analyze the bivariate ordinal traits and high-dimension genetic data. The methods are flexible and can analyze three types of genetic data: (1) rare variants only, (2) common variants only, and (3) a combination of rare and common variants. Extensive simulation studies show that the likelihood ratio tests of the BFOLR models control type I errors well and have good power performance. The BFOLR models are applied to analyze Age-Related Eye Disease Study data, in which two genes, CFH and ARMS2, are found to strongly associate with eye drusen size, drusen area, age-related macular degeneration (AMD) categories, and AMD severity scale.
{"title":"Gene-level association analysis of bivariate ordinal traits with functional regressions","authors":"Shuqi Wang, Chi-Yang Chiu, Alexander F. Wilson, Joan E. Bailey-Wilson, Elvira Agron, Emily Y. Chew, Jaeil Ahn, Momiao Xiong, Ruzong Fan","doi":"10.1002/gepi.22524","DOIUrl":"10.1002/gepi.22524","url":null,"abstract":"<p>In genetic studies, many phenotypes have multiple naturally ordered discrete values. The phenotypes can be correlated with each other. If multiple correlated ordinal traits are analyzed simultaneously, the power of analysis may increase significantly while the false positives can be controlled well. In this study, we propose bivariate functional ordinal linear regression (BFOLR) models using latent regressions with cumulative logit link or probit link to perform a gene-based analysis for bivariate ordinal traits and sequencing data. In the proposed BFOLR models, genetic variant data are viewed as stochastic functions of physical positions, and the genetic effects are treated as a function of physical positions. The BFOLR models take the correlation of the two ordinal traits into account via latent variables. The BFOLR models are built upon functional data analysis which can be revised to analyze the bivariate ordinal traits and high-dimension genetic data. The methods are flexible and can analyze three types of genetic data: (1) rare variants only, (2) common variants only, and (3) a combination of rare and common variants. Extensive simulation studies show that the likelihood ratio tests of the BFOLR models control type I errors well and have good power performance. The BFOLR models are applied to analyze Age-Related Eye Disease Study data, in which two genes, CFH and ARMS2, are found to strongly associate with eye drusen size, drusen area, age-related macular degeneration (AMD) categories, and AMD severity scale.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 6","pages":"409-431"},"PeriodicalIF":2.1,"publicationDate":"2023-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10065139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}