
Genetic Epidemiology: Latest Articles

ioSearch: An approach for identifying interacting multiomics biomarkers using a novel algorithm with application on breast cancer data sets
IF 2.1 Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2023-10-05 DOI: 10.1002/gepi.22536
Sarmistha Das, Deo Kumar Srivastava

Identifying biomarkers by integrating multiple omics is important because complex diseases arise from an intricate interplay of various genetic materials. Traditional single-omics association tests neither explore this crucial interomics dependence nor identify moderately weak signals, owing to the multiple-testing burden. Conversely, multiomics data integration imparts complementary information but suffers from an increased multiple-testing burden, data diversity inherent in different omics features, high dimensionality, and so forth. Most of the available methods address subtype classification using dimension-reduction techniques to circumvent the sample size issue, but methods for identifying interacting multiomics biomarkers are unavailable. We propose a two-step model that first investigates phenotype-omics association using logistic regression. It then selects disease-associated omics using sparse principal components, which explore the interrelationship of multiple variables from two omics in a multivariate multiple regression framework. On the basis of this model, we developed a multiomics biomarker identification algorithm, interacting omics search (ioSearch), that jointly tests the effect of multiple omics with disease and between-omics associations by using pathway information, which in turn reduces the multiple-testing burden. Further, inference in terms of p values potentially makes it an easily interpretable biomarker identification tool. Extensive simulations show that ioSearch is statistically powerful with a controlled Type-I error rate. Its application to publicly available breast cancer data sets identified relevant omics features in important pathways.

Genetic Epidemiology, Volume 47, Issue 8, Pages 600-616.
Citations: 0
Statistical methods to detect mother–father genetic interaction effects on risk of infertility: A genome-wide approach
IF 2.1 Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2023-08-28 DOI: 10.1002/gepi.22534
Siri N. Skodvin, Håkon K. Gjessing, Astanand Jugessur, Julia Romanowska, Christian M. Page, Elizabeth C. Corfield, Yunsung Lee, Siri E. Håberg, Miriam Gjerdevik

Infertility is a heterogeneous phenotype, and for many couples, the causes of fertility problems remain unknown. One understudied hypothesis is that allelic interactions between the genotypes of the two parents may influence the risk of infertility. Our aim was, therefore, to investigate how allelic interactions can be modeled using parental genotype data linked to 15,789 pregnancies selected from the Norwegian Mother, Father, and Child Cohort Study. The newborns in 1304 of these pregnancies were conceived using assisted reproductive technologies (ART), and the remainder were conceived naturally. Treating the use of ART as a proxy for infertility, different parameterizations were implemented in a genome-wide screen for interaction effects between maternal and paternal alleles at the same locus. Some of the models were more similar in the way they were parameterized, and some produced similar results when implemented on a genome-wide scale. The results showed near-significant interaction effects in genes relevant to the phenotype under study, such as Dynein axonemal heavy chain 17 (DNAH17) with a recognized role in male infertility. More generally, the interaction models presented here are readily adaptable to the study of other phenotypes in which maternal and paternal allelic interactions are likely to be involved.
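One way to picture a maternal-paternal interaction model at a single locus is a logistic model with a product term. The sketch below is only an illustration: the multiplicative coding, the function name `art_probability`, and every coefficient value are assumptions made for this example, not the parameterizations evaluated in the study. Only the cohort counts (1304 ART pregnancies out of 15,789) come from the abstract.

```python
import math

def art_probability(g_mother, g_father, b0, b_m, b_f, b_mf):
    """P(ART) under a hypothetical logistic model with a product interaction.

    g_mother, g_father: minor-allele counts (0, 1 or 2) at the same locus.
    b_mf: coefficient on the maternal x paternal product term (assumed coding).
    """
    eta = b0 + b_m * g_mother + b_f * g_father + b_mf * g_mother * g_father
    return 1.0 / (1.0 + math.exp(-eta))

# Baseline log-odds chosen to match the ART share in the cohort (1304 of 15,789).
baseline = math.log(1304 / (15789 - 1304))

# Illustrative coefficients only: no main effects, a positive interaction.
for g_m in (0, 1, 2):
    row = [round(art_probability(g_m, g_f, baseline, 0.0, 0.0, 0.2), 4)
           for g_f in (0, 1, 2)]
    print(g_m, row)
```

With no main effects, the risk departs from the baseline only when both parents carry the allele, which is the pattern an interaction screen of this kind would be looking for.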

Genetic Epidemiology, Volume 47, Issue 7, Pages 503-519. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22534
Citations: 0
Inference of causal metabolite networks in the presence of invalid instrumental variables with GWAS summary data
IF 2.1 Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2023-08-13 DOI: 10.1002/gepi.22535
Siyi Chen, Zhaotong Lin, Xiaotong Shen, Ling Li, Wei Pan

We propose structural equation models (SEMs) as a general framework to infer causal networks for metabolites and other complex traits. Traditionally SEMs are used only for individual-level data under the assumption that all instrumental variables (IVs) are valid. To overcome these limitations, we propose both one- and two-sample approaches for causal network inference based on SEMs that can: (1) perform causal analysis and discover causal relationships among multiple traits; (2) account for the possible presence of some invalid IVs; (3) allow for data analysis using only genome-wide association studies (GWAS) summary statistics when individual-level data are not available; (4) consider the possibility of bidirectional relationships between traits. Our method employs a simple stepwise selection to identify invalid IVs, thus avoiding false positives while possibly increasing true discoveries based on two-stage least squares (2SLS). We use both real GWAS data and simulated data to demonstrate the superior performance of our method over the standard 2SLS/SEMs. For real data analysis, our proposed approach is applied to a human blood metabolite GWAS summary data set to uncover putative causal relationships among the metabolites; we also identify some metabolites (putative) causal to Alzheimer's disease (AD), which, along with the inferred causal metabolite network, suggest some possible pathways of metabolites involved in AD.
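As background for the 2SLS baseline mentioned above, the following toy simulation contrasts an ordinary least squares estimate with a single-instrument 2SLS (ratio) estimate when an unmeasured confounder is present. It is a minimal sketch under stated assumptions: one valid instrument, individual-level data, and simulation parameters chosen for illustration. It does not implement the authors' SEM approach, the stepwise selection of invalid IVs, or the summary-statistics version.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

random.seed(1)
n = 20_000
true_effect = 0.5  # causal effect of X on Y in this toy simulation

g = [random.gauss(0, 1) for _ in range(n)]  # valid instrument
u = [random.gauss(0, 1) for _ in range(n)]  # unmeasured confounder
x = [gi + ui + random.gauss(0, 1) for gi, ui in zip(g, u)]
y = [true_effect * xi + ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

ols_estimate = cov(x, y) / cov(x, x)  # biased upward by the confounder U
iv_estimate = cov(g, y) / cov(g, x)   # with one instrument, 2SLS reduces to this ratio

print(f"OLS: {ols_estimate:.3f}  IV: {iv_estimate:.3f}  truth: {true_effect}")
```

The ratio form is used here because, with a single instrument, it coincides with the 2SLS estimate and keeps the example free of matrix algebra.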

Genetic Epidemiology, Volume 47, Issue 8, Pages 585-599. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22535
Citations: 0
Sensitivity analyses gain relevance by fixing parameters observable during the empirical analyses
IF 2.1 Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2023-07-07 DOI: 10.1002/gepi.22530
Gibran Hemani, Apostolos Gkatzionis, Kate Tilling, George Davey Smith
In 2017 we presented the MR Steiger method, a sensitivity analysis in Mendelian randomization (MR) for inferring causal directions between variables (Hemani et al., 2017). We discussed many of its potential limitations, including that unmeasured confounding under certain extreme circumstances could lead to the wrong inferred causal direction. Lutz et al. (2022) propose an R package (UCRMS) for performing sensitivity analysis of the MR Steiger method, and use it in an illustration to suggest that the MR Steiger method has a ~90% chance of giving the wrong answer due to unmeasured confounding. In this note we will show that an error in their approach to sensitivity analysis leads to the wrong conclusion about the validity of the MR Steiger test. We provide a valid alternative which uses the observed data to investigate sensitivity to unmeasured confounding.

A sensitivity analysis aims to understand the degree to which a result can change due to uncertainties in the inputs (Saltelli, 2002). For the MR Steiger test, we need to ask how sensitive the inference of the causal direction between X and Y is to possible values of unmeasured confounders influencing X and Y. Importantly, there is relative certainty in many of the parameters of this system because they are easily observed, for example, the variances of X, Y and the instrumental variables (IVs), the estimated effect of the IVs on X and Y, and therefore the IV estimate of the effect of X on Y. Often the ordinary least squares (OLS) association between X and Y is also available, either because the analysis is performed using individual-level data or because the estimate can be sourced from other published results. Therefore, an appropriate sensitivity analysis must explore the extent to which the inferred causal direction between X and Y can change due to unmeasured confounding, without causing these observed parameters to change.

Lutz et al.'s proposed method does not attempt to fix all observable parameters. In the simple example provided by Lutz et al., the variance of Y varies between 28 and 39, and the OLS estimate varies between 1 and −1 across the parameter values used for the sensitivity analysis. This arises because the residual variance, which is unobserved, is fixed in their approach. Instead the phenotypic variance, which is observed, should be fixed. If they were presenting a simulation of the general performance of MR Steiger under unmeasured confounding, then it would not matter that the simulated parameters are not tied to those observed in a particular empirical analysis. However, in a sensitivity analysis, allowing observed parameters to vary provides no value to the analyst. To say that unmeasured confounding could reverse the causal direction, provided that the variance of Y also changes drastically, is of little use to the researcher who has a data set with an observed variance of Y. If some quantities are observed (i.e., the regression coefficient of Y on X, the variance in X explained by the instrument, the variances of X and Y, and the IV effect estimate), and only $\beta_{uy}$ and $\beta_{ux}$ are allowed to vary, with the residual variances changed to compensate, then the SNP-outcome $R^{2}$ does not change under any values of the $\beta_{uy}$ and $\beta_{ux}$ parameters (Supporting Information note).

In their analysis, Lutz et al. state that for a causal effect of $\beta_{xy}=1$ there are particular unmeasured confounding parameters, $\beta_{ux}=-5$ and $\beta_{uy}$ ranging from 0 to 11. Using these parameters, they suggest that the MR Steiger method has a ~90% chance of returning the wrong causal direction. But if $\beta_{ux}$ and $\beta_{uy}$ are allowed to take the same range of values (e.g., −11 to 11), then the Steiger method returns the incorrect result in only 36% of confounding scenarios. If the ranges of $\beta_{ux}$ and $\beta_{uy}$ are restricted to −1 to 1, the incorrect result arises in only 0.02% of scenarios. In our 2017 paper (Supporting Information: Note 3) we analyzed a broader range of scenarios to evaluate comprehensively the extent to which unmeasured confounding could introduce problems in general, and concluded that in most realistic cases, where $R_{xy}^{2} < 0.2$, the chance of unmeasured confounding leading to the wrong causal direction is very small.

If the analyst is motivated to examine the sensitivity of MR Steiger to unmeasured confounding, a different approach is needed, one that asks which values of unmeasured confounding support the inferred causal direction given the empirically observed quantities (the variances of X, Y and the instruments, the effects of the instruments on X and Y, and the OLS estimate of X on Y). The analyst can then determine whether the confounding values required to cast doubt on their conclusion are plausible. Alternatively, one can determine what fraction of the possible confounding parameter space supports the inferred causal direction. In the Supplementary Note we provide an analytical solution to this problem. We illustrate that the probability of unmeasured confounding reversing the causal direction inferred by MR Steiger rises above a low level only when the confounders explain most of the variance in X and Y. The method is included in the TwoSampleMR package and is fast because it uses closed-form calculations rather than the stochastic simulation approach implemented by UCRMS.
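MR Steiger infers direction by comparing how much variance the instrument explains in each of the two variables. The toy simulation below sketches only that core comparison for a single instrument. It is written under assumptions of my own (a standard-normal instrument, one confounder, and the effect sizes shown), it omits the formal significance test and the sensitivity analysis discussed in the note, and it is not the TwoSampleMR or UCRMS implementation.

```python
import random

def r_squared(a, b):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab * sab / (saa * sbb)

random.seed(7)
n = 20_000

g = [random.gauss(0, 1) for _ in range(n)]  # instrument acting on X
u = [random.gauss(0, 1) for _ in range(n)]  # unmeasured confounder of X and Y
x = [gi + ui + random.gauss(0, 1) for gi, ui in zip(g, u)]
y = [0.5 * xi + ui + random.gauss(0, 1) for xi, ui in zip(x, u)]  # true direction: X -> Y

r2_gx = r_squared(g, x)  # variance in X explained by the instrument
r2_gy = r_squared(g, y)  # variance in Y explained by the instrument

# Core comparison: the instrument should explain more variance in the
# variable it affects directly than in the variable downstream of it.
inferred = "X -> Y" if r2_gx > r2_gy else "Y -> X"
print(f"R2(G,X) = {r2_gx:.3f}, R2(G,Y) = {r2_gy:.3f}, inferred direction: {inferred}")
```

In this setup the confounder is moderate, so the comparison recovers the simulated direction; the note's argument concerns how strong confounding must be, with the observed quantities held fixed, before that comparison flips.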
Genetic Epidemiology, Volume 47, Issue 6, Pages 461-462. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22530
Citations: 0
Comparison of regmed and BayesNetty for exploring causal models with many variables
IF 2.1 Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2023-06-27 DOI: 10.1002/gepi.22532
Richard Howey, Heather J. Cordell

Here we compare a recently proposed method and software package, regmed, with our own previously developed package, BayesNetty, designed to allow exploratory analysis of complex causal relationships between biological variables. We find that regmed generally has poorer recall but much better precision than BayesNetty. This is perhaps not too surprising as regmed is specifically designed for use with high-dimensional data. BayesNetty is found to be more sensitive to the resulting multiple testing problem encountered in these circumstances. However, as regmed is not designed to handle missing data, its performance is severely affected when missing data is present, whereas the performance of BayesNetty is only slightly affected. The performance of regmed can be rescued in this situation by first using BayesNetty to impute the missing data, and then applying regmed to the resulting “filled-in” data set.
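The recall and precision trade-off described above can be made concrete with a small edge-recovery calculation. The graphs below are invented purely for illustration (they are not results from regmed or BayesNetty), and the helper name `precision_recall` is hypothetical.

```python
def precision_recall(true_edges, inferred_edges):
    """Precision and recall of an inferred directed edge set against the truth."""
    true_positives = len(true_edges & inferred_edges)
    precision = true_positives / len(inferred_edges) if inferred_edges else 0.0
    recall = true_positives / len(true_edges) if true_edges else 0.0
    return precision, recall

# Hypothetical true causal graph on four variables.
true_edges = {("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")}

# A conservative method: few edges, all of them correct.
sparse_edges = {("A", "B"), ("B", "C")}

# A liberal method: finds every true edge but adds two false ones.
dense_edges = true_edges | {("B", "D"), ("A", "C")}

print("conservative:", precision_recall(true_edges, sparse_edges))
print("liberal:     ", precision_recall(true_edges, dense_edges))
```

The conservative edge set mirrors the pattern reported for regmed (high precision, lower recall), while the liberal set shows the opposite trade-off.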

Genetic Epidemiology, Volume 47, Issue 7, Pages 496-502. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22532
Citations: 0
A gene-based association test of interactions for maternal–fetal genotypes identifies genes associated with nonsyndromic congenital heart defects
IF 2.1 Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2023-06-21 DOI: 10.1002/gepi.22533
Manyan Huang, Chen Lyu, Nianjun Liu, Wendy N. Nembhard, John S. Witte, Charlotte A. Hobbs, Ming Li, the National Birth Defects Prevention Study

The risk of congenital heart defects (CHDs) may be influenced by maternal genes, fetal genes, and their interactions. Existing methods commonly test the effects of maternal and fetal variants one-at-a-time and may have reduced statistical power to detect genetic variants with low minor allele frequencies. In this article, we propose a gene-based association test of interactions for maternal–fetal genotypes (GATI-MFG) using a case-mother and control-mother design. GATI-MFG can integrate the effects of multiple variants within a gene or genomic region and evaluate the joint effect of maternal and fetal genotypes while allowing for their interactions. In simulation studies, GATI-MFG had improved statistical power over alternative methods, such as the single-variant test and functional data analysis (FDA) under various disease scenarios. We further applied GATI-MFG to a two-phase genome-wide association study of CHDs for the testing of both common variants and rare variants using 947 CHD case mother–infant pairs and 1306 control mother–infant pairs from the National Birth Defects Prevention Study (NBDPS). After Bonferroni adjustment for 23,035 genes, two genes on chromosome 17, TMEM107 (p = 1.64e−06) and CTC1 (p = 2.0e−06), were identified for significant association with CHD in common variants analysis. Gene TMEM107 regulates ciliogenesis and ciliary protein composition and was found to be associated with heterotaxy. Gene CTC1 plays an essential role in protecting telomeres from degradation, which was suggested to be associated with cardiogenesis. Overall, GATI-MFG outperformed the single-variant test and FDA in the simulations, and the results of application to NBDPS samples are consistent with existing literature supporting the association of TMEM107 and CTC1 with CHDs.
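The Bonferroni adjustment mentioned above can be checked with a short calculation. The gene count and the two p values come from the abstract; the family-wise error rate of 0.05 is an assumption, since the abstract does not state the level used.

```python
alpha = 0.05        # assumed family-wise error rate (not stated in the abstract)
n_genes = 23_035    # number of genes tested, from the abstract

# Bonferroni: each gene is tested at alpha divided by the number of tests.
threshold = alpha / n_genes

reported_p = {"TMEM107": 1.64e-06, "CTC1": 2.0e-06}  # p values from the abstract
significant = {gene: p < threshold for gene, p in reported_p.items()}

print(f"per-gene threshold = {threshold:.3e}")
for gene, passed in significant.items():
    print(gene, "passes" if passed else "fails")
```

Under that assumed level, both reported p values fall below the per-gene threshold, which is consistent with the two genes being reported as significant.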

{"title":"A gene-based association test of interactions for maternal–fetal genotypes identifies genes associated with nonsyndromic congenital heart defects","authors":"Manyan Huang,&nbsp;Chen Lyu,&nbsp;Nianjun Liu,&nbsp;Wendy N. Nembhard,&nbsp;John S. Witte,&nbsp;Charlotte A. Hobbs,&nbsp;Ming Li,&nbsp;the National Birth Defects Prevention Study","doi":"10.1002/gepi.22533","DOIUrl":"10.1002/gepi.22533","url":null,"abstract":"<p>The risk of congenital heart defects (CHDs) may be influenced by maternal genes, fetal genes, and their interactions. Existing methods commonly test the effects of maternal and fetal variants one-at-a-time and may have reduced statistical power to detect genetic variants with low minor allele frequencies. In this article, we propose a gene-based association test of interactions for maternal–fetal genotypes (GATI-MFG) using a case-mother and control-mother design. GATI-MFG can integrate the effects of multiple variants within a gene or genomic region and evaluate the joint effect of maternal and fetal genotypes while allowing for their interactions. In simulation studies, GATI-MFG had improved statistical power over alternative methods, such as the single-variant test and functional data analysis (FDA) under various disease scenarios. We further applied GATI-MFG to a two-phase genome-wide association study of CHDs for the testing of both common variants and rare variants using 947 CHD case mother–infant pairs and 1306 control mother–infant pairs from the National Birth Defects Prevention Study (NBDPS). After Bonferroni adjustment for 23,035 genes, two genes on chromosome 17, <i>TMEM107</i> (<i>p</i> = 1.64e−06) and <i>CTC1</i> (<i>p</i> = 2.0e−06), were identified for significant association with CHD in common variants analysis. Gene <i>TMEM107</i> regulates ciliogenesis and ciliary protein composition and was found to be associated with heterotaxy. 
Gene <i>CTC1</i> plays an essential role in protecting telomeres from degradation, which was suggested to be associated with cardiogenesis. Overall, GATI-MFG outperformed the single-variant test and FDA in the simulations, and the results of application to NBDPS samples are consistent with existing literature supporting the association of <i>TMEM107</i> and <i>CTC1</i> with CHDs.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 7","pages":"475-495"},"PeriodicalIF":2.1,"publicationDate":"2023-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22533","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9669966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Phenotypic variance partitioning by transcriptomic gene expression levels and environmental variables for anthropometric traits using GTEx data
IF 2.1 Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2023-06-15 DOI: 10.1002/gepi.22531
Pastor Jullian Fabres, S. Hong Lee

Phenotypic variation in humans is the result of genetic variation and environmental influences. Understanding the contribution of genetic and environmental components to phenotypic variation is of great interest. The variance explained by genome-wide single nucleotide polymorphisms (SNPs) typically represents a small proportion of the phenotypic variance for complex traits, which may be because the genome is only one part of the whole biological process that shapes the phenotypes. In this study, we propose to partition the phenotypic variance of three anthropometric traits, using gene expression levels and environmental variables from GTEx data. We use the gene expression of four tissues that are deemed relevant for the anthropometric traits (two adipose tissues, skeletal muscle tissue, and blood tissue). Additionally, we estimate the transcriptome–environment correlation that partly underlies the phenotypes of the anthropometric traits. We found that genetic factors play a significant role in determining body mass index (BMI), with the proportion of phenotypic variance explained by gene expression levels of visceral adipose tissue being 0.68 (SE = 0.06). However, we also observed that environmental factors such as age, sex, ancestry, smoking status, and alcohol-drinking status have a small but significant impact (0.005, SE = 0.001). Interestingly, we found a significant negative correlation between the transcriptomic and environmental effects on BMI (transcriptome–environment correlation = −0.54, SE = 0.14), suggesting an antagonistic relationship. This implies that individuals with lower genetic profiles may be more susceptible to the effects of environmental factors on BMI, while those with higher genetic profiles may be less susceptible. We also show that the estimated transcriptomic variance varies across tissues, e.g., the gene expression levels of whole blood tissue and environmental variables explain a lower proportion of BMI phenotypic variance (0.16, SE = 0.05 and 0.04, SE = 0.004, respectively). We observed a significant positive correlation between transcriptomic and environmental effects (1.21, SE = 0.23) for this tissue. In conclusion, phenotypic variance partitioning can be done using gene expression and environmental data even with a small sample size (n = 838 from GTEx data), which can provide insights into how the transcriptomic and environmental effects contribute to the phenotypes of the anthropometric traits.
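
To make the role of a transcriptome–environment correlation concrete, here is a minimal Python sketch. It assumes a simple additive decomposition, y = t + e + residual, in which the transcriptomic effect t and the environmental effect e have covariance r·sqrt(var_t·var_e). This model and the names `total_variance`, `var_t`, `var_e`, `var_resid`, and `r` are illustrative assumptions, not the authors' estimation procedure, and the numbers passed in are made up rather than taken from the GTEx estimates.

```python
import math

def total_variance(var_t, var_e, var_resid, r):
    """Phenotypic variance under the assumed model y = t + e + residual.

    r is the correlation between the t and e effects, so
    Cov(t, e) = r * sqrt(var_t * var_e).
    """
    cov_te = r * math.sqrt(var_t * var_e)
    return var_t + var_e + 2.0 * cov_te + var_resid

# Hypothetical variance components (illustrative only).
independent = total_variance(4.0, 1.0, 2.0, r=0.0)
antagonistic = total_variance(4.0, 1.0, 2.0, r=-0.5)
print(independent, antagonistic)
```

Under this assumed model, a negative r makes the covariance term negative, so the two components partially offset each other and the total variance is smaller than in the uncorrelated case.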

Citations: 0
Ravages: An R package for the simulation and analysis of rare variants in multicategory phenotypes
IF 2.1 Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2023-05-09 DOI: 10.1002/gepi.22529
Ozvan Bocher, Gaëlle Marenne, Emmanuelle Génin, Hervé Perdry

Current software packages for the analysis and simulation of rare variants are only available for binary and continuous traits. Ravages provides solutions in a single R package to perform rare variant association tests for multicategory, binary and continuous phenotypes, to simulate datasets under different scenarios and to compute statistical power. Association tests can be run on the whole genome thanks to the C++ implementation of most of the functions, using either RAVA-FIRST, a recently developed strategy to filter and analyse genome-wide rare variants, or user-defined candidate regions. Ravages also includes a simulation module that generates genetic data for cases who can be stratified into several subgroups and for controls. Through comparisons with existing programmes, we show that Ravages complements existing tools and will be useful to study the genetic architecture of complex diseases. Ravages is available on CRAN at https://cran.r-project.org/web/packages/Ravages/ and maintained on GitHub at https://github.com/genostats/Ravages.
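
As a rough illustration of what a rare-variant summary across more than two phenotype groups can look like, the Python sketch below computes a simple burden score (the number of rare alleles an individual carries in a region) and averages it per group. This is a generic burden summary chosen for illustration. It is not the Ravages API or one of its test statistics, and the genotypes, group labels, and function names are all hypothetical.

```python
from collections import defaultdict

def burden_scores(genotypes):
    """Sum rare-allele counts (0, 1, or 2 per variant) for each individual."""
    return [sum(row) for row in genotypes]

def mean_burden_by_group(genotypes, groups):
    """Average burden score within each phenotype group."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for score, group in zip(burden_scores(genotypes), groups):
        totals[group] += score
        counts[group] += 1
    return {g: totals[g] / counts[g] for g in totals}

# Hypothetical toy data: 6 individuals x 4 rare variants in one region.
genotypes = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 2, 1, 0],
]
groups = ["control", "control", "subtype_A", "subtype_A", "subtype_B", "subtype_B"]

means = mean_burden_by_group(genotypes, groups)
print(means)
```

A real analysis would go on to test whether the group differences are larger than expected by chance, for example with a regression of group membership on the burden score. That step is left out here.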

Citations: 0
A Brief History behind the journal Genetic Epidemiology and the International Genetic Epidemiology Society
IF 2.1 Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2023-05-05 DOI: 10.1002/gepi.22528
Dabeeru C. Rao

This commentary briefly describes the process and steps that underlie the launching of the journal Genetic Epidemiology in 1984 and the International Genetic Epidemiology Society (IGES, to be pronounced as “I guess”) in 1992.

Citations: 0
Gene-level association analysis of bivariate ordinal traits with functional regressions
IF 2.1 Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2023-04-26 DOI: 10.1002/gepi.22524
Shuqi Wang, Chi-Yang Chiu, Alexander F. Wilson, Joan E. Bailey-Wilson, Elvira Agron, Emily Y. Chew, Jaeil Ahn, Momiao Xiong, Ruzong Fan

In genetic studies, many phenotypes have multiple naturally ordered discrete values. The phenotypes can be correlated with each other. If multiple correlated ordinal traits are analyzed simultaneously, the power of analysis may increase significantly while false positives remain well controlled. In this study, we propose bivariate functional ordinal linear regression (BFOLR) models using latent regressions with cumulative logit link or probit link to perform a gene-based analysis for bivariate ordinal traits and sequencing data. In the proposed BFOLR models, genetic variant data are viewed as stochastic functions of physical positions, and the genetic effects are treated as a function of physical positions. The BFOLR models take the correlation of the two ordinal traits into account via latent variables. The BFOLR models are built upon functional data analysis, which can be revised to analyze the bivariate ordinal traits and high-dimensional genetic data. The methods are flexible and can analyze three types of genetic data: (1) rare variants only, (2) common variants only, and (3) a combination of rare and common variants. Extensive simulation studies show that the likelihood ratio tests of the BFOLR models control type I errors well and have good power performance. The BFOLR models are applied to analyze Age-Related Eye Disease Study data, in which two genes, CFH and ARMS2, are found to strongly associate with eye drusen size, drusen area, age-related macular degeneration (AMD) categories, and AMD severity scale.
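
The cumulative logit link mentioned above can be sketched for a single ordinal trait as follows. The sketch assumes the common parameterization P(Y ≤ k) = logistic(θ_k − η) with increasing thresholds θ_k and a scalar linear predictor η. It does not cover the functional regression on variant positions or the bivariate latent-variable structure of BFOLR, and the function names, thresholds, and η values are hypothetical.

```python
import math

def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def category_probabilities(thresholds, eta):
    """Return P(Y = k) for k = 0..K under a cumulative logit model.

    Assumes P(Y <= k) = logistic(thresholds[k] - eta), with strictly
    increasing thresholds; the top category takes the remaining mass.
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    probs = [cumulative[0]]
    probs += [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]
    return probs

# Hypothetical thresholds for a four-category ordinal trait.
thresholds = [-1.0, 0.5, 2.0]
baseline = category_probabilities(thresholds, eta=0.0)
shifted = category_probabilities(thresholds, eta=1.0)
print([round(p, 3) for p in baseline])
print([round(p, 3) for p in shifted])
```

With this sign convention, a larger η moves probability mass toward the higher categories, so a positive genetic effect on η corresponds to a shift toward more severe ordinal grades.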

Citations: 1