Genetic Epidemiology: Latest Articles

Correction to “Abstracts”
IF 2.1 · CAS Tier 4 (Medicine) · Q3 GENETICS & HEREDITY · Pub Date: 2023-11-09 · DOI: 10.1002/gepi.22543

(2023), Abstracts. Genetic Epidemiology, 47: 520–581. https://doi.org/10.1002/gepi.22539

In the originally published Abstracts, there were authors missing for “Two-sample Mendelian Randomization Study of Circulating Metabolites and Prostate Cancer Risk in Hispanic Populations” (abstract 49). The correct authors and affiliations appear below and have been updated on the online version of the abstracts.

Harriett Fuller¹, Rebecca Rohde², Heather Highland², Jiayi Shen³, Bing Yu⁴, Eric Boerwinkle⁴, Megan Grove⁴, Kari E. North², David V. Conti³, Christopher A. Haiman³, Kristin Young², Burcu F. Darst¹

¹Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, Washington, USA

²Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA

³Department of Population and Public Health Sciences, Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, California, USA

⁴School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, USA

We apologize for this error.

Genetic Epidemiology, 47(8): 642.
Citations: 0
Haplotype reconstruction for genetically complex regions with ambiguous genotype calls: Illustration by the KIR gene region
IF 2.1 · CAS Tier 4 (Medicine) · Q3 GENETICS & HEREDITY · Pub Date: 2023-10-13 · DOI: 10.1002/gepi.22538
Lars L. J. van der Burg, Liesbeth C. de Wreede, Henning Baldauf, Jürgen Sauter, Johannes Schetelig, Hein Putter, Stefan Böhringer

Advances in DNA sequencing technologies have enabled genotyping of complex genetic regions exhibiting copy number variation and high allelic diversity, yet it is impossible to derive exact genotypes in all cases, often resulting in ambiguous genotype calls, that is, partially missing data. An example of such a region is the killer-cell immunoglobulin-like receptor (KIR) gene region. These genes are of special interest in the context of allogeneic hematopoietic stem cell transplantation. For such complex gene regions, current haplotype reconstruction methods are not feasible because they cannot cope with the complexity of the data. We present an expectation–maximization (EM) algorithm to estimate haplotype frequencies (HTFs) that deals with the missing data components and takes into account linkage disequilibrium (LD) between genes. To cope with the exponential increase in the number of haplotypes as genes are added, we add three components to a standard EM-algorithm implementation. First, reconstruction is performed iteratively, adding one gene at a time. Second, after each step, haplotypes with frequencies below a threshold are collapsed into a rare haplotype group. Third, the HTF of the rare haplotype group is profiled in subsequent iterations to improve estimates. A simulation study evaluates the effect of combining information from multiple genes on the estimates of these frequencies. We show that estimated HTFs are approximately unbiased. Our simulation study shows that the EM algorithm is able to combine information from multiple genes when LD is high, whereas increased ambiguity levels increase bias. Linear regression models based on this EM show that a large number of haplotypes can be problematic for unbiased effect size estimation and that models need to be sparse. In a real data analysis of KIR genotypes, we compare HTFs to those obtained in an independent study. 
Our new EM-algorithm-based method is the first to account for the full genetic architecture of complex gene regions, such as the KIR gene region. This algorithm can handle the numerous observed ambiguities, and allows for the collapsing of haplotypes to perform implicit dimension reduction. Combining information from multiple genes improves haplotype reconstruction.
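The core EM iteration and the rare-haplotype collapsing step can be sketched generically. This is a textbook haplotype-frequency EM under Hardy-Weinberg equilibrium, not the authors' implementation: representing each (possibly ambiguous) genotype call as a list of compatible unordered haplotype pairs, the function names, and the threshold value are our assumptions, and the gene-by-gene addition and profiling of the rare-group HTF are not shown.

```python
from collections import defaultdict


def em_haplotype_freqs(individuals, n_iter=100, tol=1e-10):
    """EM estimate of haplotype frequencies (HTFs) from ambiguous genotype calls.

    `individuals` is a list; each entry lists the unordered haplotype pairs
    (h1, h2) compatible with that person's genotype call. An unambiguous
    call has one pair; an ambiguous call has several (partially missing data).
    """
    haplotypes = sorted({h for pairs in individuals for pair in pairs for h in pair})
    freqs = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform starting values
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in individuals:
            # E-step: posterior weight of each compatible pair under Hardy-Weinberg
            weights = [freqs[a] * freqs[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by the 2N chromosomes
        n_chrom = 2 * len(individuals)
        new = {h: counts[h] / n_chrom for h in haplotypes}
        converged = max(abs(new[h] - freqs[h]) for h in haplotypes) < tol
        freqs = new
        if converged:
            break
    return freqs


def collapse_rare(freqs, threshold):
    """Pool haplotypes with frequency below `threshold` into a single 'rare' group."""
    out = {h: f for h, f in freqs.items() if f >= threshold}
    rare = sum(f for f in freqs.values() if f < threshold)
    if rare > 0:
        out["rare"] = rare
    return out
```

For two biallelic loci, a double heterozygote would be entered as `[("00", "11"), ("01", "10")]`; the EM resolves this ambiguity using the haplotype frequencies estimated from the unambiguous individuals.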

Genetic Epidemiology, 48(1): 3–26.
Citations: 0
Data-adaptive and pathway-based tests for association studies between somatic mutations and germline variations in human cancers
IF 2.1 · CAS Tier 4 (Medicine) · Q3 GENETICS & HEREDITY · Pub Date: 2023-10-11 · DOI: 10.1002/gepi.22537
Zhongyuan Chen, Han Liang, Peng Wei

Cancer is a disease driven by a combination of inherited genetic variants and somatic mutations. Recently available large-scale sequencing data of cancer genomes have provided an unprecedented opportunity to study the interactions between them. However, previous studies on this topic have been limited by simple, low-power tests such as Fisher's exact test. In this paper, we design data-adaptive and pathway-based tests based on the score statistic for association studies between somatic mutations and germline variations. Previous research has shown that two single-nucleotide polymorphism (SNP)-set-based association tests, the adaptive sum of powered score (aSPU) and data-adaptive pathway-based (aSPUpath) tests, increase power in genome-wide association studies (GWASs) of a single disease trait in a case–control study. We extend aSPU and aSPUpath to multiple traits, that is, somatic mutations of multiple genes in a cohort study, allowing extensive information aggregation at both the SNP and gene levels. The p-values from different parameters, assuming varying genetic architectures, are combined to yield data-adaptive tests for somatic mutations and germline variations. Extensive simulations show that, compared with some commonly used methods, our data-adaptive somatic mutation/germline variation tests can be applied to multiple germline SNPs/genes/pathways and generally have much higher statistical power while maintaining an appropriate type I error rate. The proposed tests are applied to a large-scale real-world International Cancer Genome Consortium whole genome sequencing data set of 2583 subjects, detecting more significant and biologically relevant associations than other existing methods at both the gene and pathway levels. 
Our study has systematically identified the associations between various germline variations and somatic mutations across different cancer types, which potentially provides valuable utility for cancer risk prediction, prognosis, and therapeutics.
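The single-trait aSPU test that this work extends can be sketched as follows. This is our simplified illustration, not the authors' multi-trait somatic/germline extension or aSPUpath: it assumes one trait `y` (binary or quantitative), a genotype matrix `X` with one row per subject, the score vector under an intercept-only null model, the commonly used grid of γ values 1 to 8 plus ∞, and permutation p-values.

```python
import math
import random


def score_vector(y, X):
    """Score U_j = sum_i (y_i - ybar) * x_ij for each SNP j under an intercept-only null."""
    ybar = sum(y) / len(y)
    return [sum((yi - ybar) * row[j] for yi, row in zip(y, X)) for j in range(len(X[0]))]


def spu(u, gamma):
    """Sum of powered score SPU(gamma) = sum_j U_j**gamma; gamma = inf gives max_j |U_j|."""
    if math.isinf(gamma):
        return max(abs(v) for v in u)
    return sum(v ** gamma for v in u)


def aspu(y, X, gammas=(1, 2, 3, 4, 5, 6, 7, 8, math.inf), n_perm=200, seed=1):
    """Permutation p-value for each SPU(gamma), then the minimum p adjusted with the same permutations."""
    rng = random.Random(seed)
    obs = [abs(spu(score_vector(y, X), g)) for g in gammas]
    y_perm = list(y)
    null = []
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # permuting y breaks any y-X association
        u = score_vector(y_perm, X)
        null.append([abs(spu(u, g)) for g in gammas])
    k_range = range(len(gammas))
    p_spu = [(1 + sum(t[k] >= obs[k] for t in null)) / (n_perm + 1) for k in k_range]
    # Minimum p-value of every permuted replicate, evaluated against the same null distribution
    null_min_p = [min(sum(t[k] >= s[k] for t in null) / n_perm for k in k_range) for s in null]
    p_aspu = (1 + sum(mp <= min(p_spu) for mp in null_min_p)) / (n_perm + 1)
    return dict(zip(gammas, p_spu)), p_aspu
```

Small γ values favor many weak, same-direction effects, while large γ (up to the max-based γ = ∞) favor a few strong signals; taking the minimum p-value across γ is what makes the test data-adaptive.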

Genetic Epidemiology, 47(8): 617–636.
Citations: 0
ioSearch: An approach for identifying interacting multiomics biomarkers using a novel algorithm with application on breast cancer data sets
IF 2.1 · CAS Tier 4 (Medicine) · Q3 GENETICS & HEREDITY · Pub Date: 2023-10-05 · DOI: 10.1002/gepi.22536
Sarmistha Das, Deo Kumar Srivastava

Identification of biomarkers by integrating multiple omics is important because complex diseases arise from an intricate interplay of various genetic materials. Traditional single-omics association tests neither explore this crucial interomics dependence nor identify moderately weak signals, owing to the multiple-testing burden. Conversely, multiomics data integration imparts complementary information but suffers from an increased multiple-testing burden, the data diversity inherent in different omics features, high dimensionality, and so forth. Most available methods address subtype classification, using dimension-reduction techniques to circumvent the sample size issue, but methods for identifying interacting multiomics biomarkers are unavailable. We propose a two-step model that first investigates phenotype–omics association using logistic regression and then selects disease-associated omics using sparse principal components, exploring the interrelationship of multiple variables from two omics in a multivariate multiple regression framework. On the basis of this model, we developed a multiomics biomarker identification algorithm, interacting omics search (ioSearch), that jointly tests the effect of multiple omics with disease and between-omics associations by using pathway information, which reduces the multiple-testing burden. Further, inference in terms of p values potentially makes it an easily interpretable biomarker identification tool. Extensive simulation demonstrates that ioSearch is statistically powerful with a controlled Type I error rate. Its application to publicly available breast cancer data sets identified relevant omics features in important pathways.
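A first-step marginal screen of phenotype–omics associations by logistic regression might look like the sketch below. It is our assumption of how such a screen could be coded (one feature at a time, a Newton-Raphson fit, a Wald p-value, and a fixed `alpha` cutoff), not the ioSearch implementation; the sparse-principal-component step and the pathway-based joint test are not shown.

```python
import math


def logistic_wald(x, y, max_iter=50, tol=1e-10):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson.

    Returns (b1, se_b1, two-sided Wald p-value). Assumes no complete
    separation; otherwise the estimate diverges.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        g0 = sum(yi - pi for yi, pi in zip(y, p))                # score for b0
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))  # score for b1
        a = sum(w)                                               # Fisher information [[a, b], [b, c]]
        b = sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        d0 = (c * g0 - b * g1) / det  # Newton step = inverse information times score
        d1 = (a * g1 - b * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    se = math.sqrt(a / det)  # (2, 2) element of the inverse information from the last iteration
    z = b1 / se
    return b1, se, math.erfc(abs(z) / math.sqrt(2))


def screen_features(features, y, alpha=0.05):
    """Keep the indices of features whose marginal Wald p-value is below alpha."""
    return [j for j, x in enumerate(features) if logistic_wald(x, y)[2] < alpha]
```

Screening each feature marginally is cheap but multiplies the number of tests, which is exactly the burden the pathway-based joint test described above is meant to reduce.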

Genetic Epidemiology, 47(8): 600–616.
Citations: 0
Statistical methods to detect mother–father genetic interaction effects on risk of infertility: A genome-wide approach
IF 2.1 · CAS Tier 4 (Medicine) · Q3 GENETICS & HEREDITY · Pub Date: 2023-08-28 · DOI: 10.1002/gepi.22534
Siri N. Skodvin, Håkon K. Gjessing, Astanand Jugessur, Julia Romanowska, Christian M. Page, Elizabeth C. Corfield, Yunsung Lee, Siri E. Håberg, Miriam Gjerdevik

Infertility is a heterogeneous phenotype, and for many couples, the causes of fertility problems remain unknown. One understudied hypothesis is that allelic interactions between the genotypes of the two parents may influence the risk of infertility. Our aim was, therefore, to investigate how allelic interactions can be modeled using parental genotype data linked to 15,789 pregnancies selected from the Norwegian Mother, Father, and Child Cohort Study. The newborns in 1304 of these pregnancies were conceived using assisted reproductive technologies (ART), and the remainder were conceived naturally. Treating the use of ART as a proxy for infertility, different parameterizations were implemented in a genome-wide screen for interaction effects between maternal and paternal alleles at the same locus. Some of the models were similar in the way they were parameterized, and some produced similar results when implemented on a genome-wide scale. The results showed near-significant interaction effects in genes relevant to the phenotype under study, such as dynein axonemal heavy chain 17 (DNAH17), which has a recognized role in male infertility. More generally, the interaction models presented here are readily adaptable to the study of other phenotypes in which maternal and paternal allelic interactions are likely to be involved.
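To make "different parameterizations" of a mother–father interaction concrete, the sketch below codes one biallelic locus in three ways and builds a covariate row for a logistic model of ART use on parental genotypes. The three codings and the function names are hypothetical illustrations chosen by us; they are not the models fitted in the paper.

```python
def interaction_codings(g_m, g_f):
    """Hypothetical codings of a mother-father allelic interaction at one biallelic locus.

    g_m, g_f: counts (0, 1 or 2) of the same allele in the mother and the father.
    """
    if g_m not in (0, 1, 2) or g_f not in (0, 1, 2):
        raise ValueError("genotypes must be allele counts 0, 1 or 2")
    return {
        "product": g_m * g_f,                       # multiplicative allele-dose interaction
        "both_carriers": int(g_m > 0 and g_f > 0),  # dominant-by-dominant coding
        "same_genotype": int(g_m == g_f),           # parental genotype concordance
    }


def design_row(g_m, g_f, coding):
    """Covariates [intercept, g_m, g_f, interaction] for a logistic model of ART use."""
    return [1, g_m, g_f, interaction_codings(g_m, g_f)[coding]]
```

Fitting the same outcome with each coding in turn and comparing the interaction coefficients mirrors, in miniature, the idea of running several parameterizations across a genome-wide screen.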

Genetic Epidemiology, 47(7): 503–519.
Citations: 0
Inference of causal metabolite networks in the presence of invalid instrumental variables with GWAS summary data
IF 2.1 · CAS Tier 4 (Medicine) · Q3 GENETICS & HEREDITY · Pub Date: 2023-08-13 · DOI: 10.1002/gepi.22535
Siyi Chen, Zhaotong Lin, Xiaotong Shen, Ling Li, Wei Pan

We propose structural equation models (SEMs) as a general framework to infer causal networks for metabolites and other complex traits. Traditionally, SEMs have been used only for individual-level data under the assumption that all instrumental variables (IVs) are valid. To overcome these limitations, we propose both one- and two-sample approaches for causal network inference based on SEMs that can: (1) perform causal analysis and discover causal relationships among multiple traits; (2) account for the possible presence of some invalid IVs; (3) allow for data analysis using only genome-wide association study (GWAS) summary statistics when individual-level data are not available; and (4) consider the possibility of bidirectional relationships between traits. Our method employs a simple stepwise selection to identify invalid IVs, thus avoiding false positives while possibly increasing true discoveries based on two-stage least squares (2SLS). We use both real GWAS data and simulated data to demonstrate the superior performance of our method over standard 2SLS/SEMs. For real data analysis, our proposed approach is applied to a human blood metabolite GWAS summary data set to uncover putative causal relationships among the metabolites; we also identify some metabolites that are putatively causal for Alzheimer's disease (AD), which, along with the inferred causal metabolite network, suggest some possible metabolite pathways involved in AD.
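For background on the 2SLS baseline the abstract compares against, here is a minimal single-instrument 2SLS estimator and an inverse-variance-weighted (IVW) estimator from per-SNP summary statistics. This is a textbook sketch under our own assumptions (linear models, every instrument valid, no stepwise removal of invalid IVs, no SEM network), not the authors' method.

```python
def _cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    return _cov(x, y) / _cov(x, x)


def two_stage_least_squares(z, x, y):
    """2SLS with one instrument z: regress x on z, then y on the fitted x."""
    a = ols_slope(z, x)                          # stage 1 slope
    c = sum(x) / len(x) - a * sum(z) / len(z)    # stage 1 intercept
    x_hat = [c + a * zi for zi in z]             # part of x explained by the instrument
    return ols_slope(x_hat, y)                   # stage 2 slope = causal estimate


def ivw_from_summary(beta_zx, beta_zy, se_zy):
    """IVW causal estimate from per-SNP GWAS summary statistics (all SNPs assumed valid)."""
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_zx, beta_zy, se_zy))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_zx, se_zy))
    return num / den
```

With an instrument `z = [0, 1, 0, 1]`, a confounder `u = [1, 1, -1, -1]` that is uncorrelated with `z`, `x = z + u` and `y = 2x + 3u` (so `x = [1, 2, -1, 0]`, `y = [5, 7, -5, -3]`), `two_stage_least_squares(z, x, y)` recovers the causal effect 2, while `ols_slope(x, y)` gives the confounded value 4.4. An invalid IV (one correlated with `u` or acting on `y` directly) would break this recovery, which is the problem the stepwise selection above targets.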

Genetic Epidemiology, 47(8): 585–599.
Citations: 0
Sensitivity analyses gain relevance by fixing parameters observable during the empirical analyses
IF 2.1 · CAS Tier 4 (Medicine) · Q3 GENETICS & HEREDITY · Pub Date: 2023-07-07 · DOI: 10.1002/gepi.22530
Gibran Hemani, Apostolos Gkatzionis, Kate Tilling, George Davey Smith
In 2017 we presented the MR Steiger method, a sensitivity analysis in Mendelian randomization (MR) for inferring causal directions between variables (Hemani et al., 2017). We discussed many of its potential limitations, including that unmeasured confounding under certain extreme circumstances could lead to the wrong inferred causal direction. Lutz et al. (2022) propose an R package (UCRMS) for performing sensitivity analysis of the MR Steiger method, and use it in an illustration to suggest that the MR Steiger method has a ~90% chance of giving the wrong answer due to unmeasured confounding. In this note we show that an error in their approach to sensitivity analysis leads to the wrong conclusion about the validity of the MR Steiger test. We provide a valid alternative that uses the observed data to investigate sensitivity to unmeasured confounding.

A sensitivity analysis aims to understand the degree to which a result can change due to uncertainties in the inputs (Saltelli, 2002). For the MR Steiger test, we need to ask how sensitive the inferred causal direction between X and Y is to possible values of unmeasured confounders influencing X and Y. Importantly, many of the parameters of this system are known with relative certainty because they are easily observed: for example, the variances of X, Y and the instrumental variables (IVs), the estimated effects of the IVs on X and Y, and therefore the IV estimate of the effect of X on Y. The ordinary least squares (OLS) association between X and Y is often also available, either because the analysis is performed using individual-level data or because the estimate can be sourced from other published results. Therefore, an appropriate sensitivity analysis must explore the extent to which the inferred causal direction between X and Y can change due to unmeasured confounding, without causing these observed parameters to change.

Lutz et al.'s proposed method does not attempt to fix all observable parameters. In the simple example provided by Lutz et al., the variance of Y varies between 28 and 39, and the OLS estimate varies between 1 and −1, across the parameter values used for the sensitivity analysis. This arises because the residual variance, which is unobserved, is fixed in their approach; instead, the phenotypic variance, which is observed, should be fixed. If they were presenting a simulation of the general performance of MR Steiger under unmeasured confounding, it would not matter that the simulated parameters are not tied to those observed in a particular empirical analysis. In a sensitivity analysis, however, allowing observed parameters to vary provides no value to the analyst. To say that unmeasured confounding could reverse the causal direction, provided that the variance of Y also changes drastically, is of little use to a researcher whose data set has an observed variance of Y. If some quantities are observed (i.e., the regression coefficient of Y on X, the variance in X explained by the instruments, the variances of X and Y, and the IV effect estimate), and only $\beta_{uy}$ and $\beta_{ux}$ are allowed to vary, with the change compensated by the residual variances, then the SNP–outcome $R^2$ does not change under any values of $\beta_{uy}$ and $\beta_{ux}$ (Supporting Information Note).

Briefly, in Lutz et al.'s analysis, for a causal effect of $\beta_{xy} = 1$, they specify particular unmeasured confounding parameters: $\beta_{ux} = -5$ and values of $\beta_{uy}$ ranging between 0 and 11. Using these parameters, they suggest that the MR Steiger method has a ~90% chance of returning the wrong causal direction. But if $\beta_{ux}$ and $\beta_{uy}$ are allowed to take the same range of values (e.g., −11 to 11), the Steiger method returns an incorrect result in only 36% of the confounding scenarios. If the range of $\beta_{ux}$ and $\beta_{uy}$ is restricted to −1 to 1, incorrect results arise in only 0.02% of scenarios. In our 2017 paper (Supporting Information: Note 3) we analysed a broader range of scenarios to evaluate comprehensively the extent to which unmeasured confounding could introduce problems in general, and concluded that in most realistic situations, where $R_{xy}^2 < 0.2$, the chance of unmeasured confounding leading to the wrong causal direction is very small.

If analysts are motivated to check the sensitivity of MR Steiger to unmeasured confounding, a different approach is needed: one that asks which values of unmeasured confounding support the inferred causal direction, given the empirically observed quantities (the variances of X, Y and the instruments, the effects of the instruments on X and Y, and the OLS estimate of X on Y). The analyst can then judge whether the confounding values required to cast doubt on their conclusions are plausible. Alternatively, one can determine how much of the space of possible confounding parameters supports the inferred causal direction. In the Supplementary Note we provide an analytical solution to this problem. We illustrate that, for a specific analysis, the probability that unmeasured confounding reverses the causal direction inferred by MR Steiger remains low unless the confounders explain a large proportion of the variance in X and Y. The method is included in the TwoSampleMR package and is fast because it uses closed-form calculations rather than the stochastic simulation approach implemented in UCRMS.
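To make the fixed-parameter argument concrete, here is a minimal Python sketch; it is not the TwoSampleMR implementation or the authors' analytical solution. It assumes a single instrument G, a single confounder U, and the linear model X = β_gx G + β_ux U + e_x, Y = β_xy X + β_uy U + e_y with G, U, e_x, e_y independent; all function names and parameter values are hypothetical. `fixed_phenotypic_variance` holds the observed var(X) and var(Y) fixed and lets the residual variances absorb changes in β_ux and β_uy, so the SNP–exposure and SNP–outcome R² values that the Steiger comparison relies on do not move. `fixed_residual_variance` instead holds the residual variances fixed, as in the approach criticised above, so var(Y) itself drifts with the confounder effects. `steiger_direction` compares the two correlations with a Fisher z test for independent samples, one common way to carry out that comparison.

```python
import math


def r2_from_effect(beta: float, var_g: float, var_trait: float) -> float:
    """Proportion of trait variance explained by an instrument G with effect beta."""
    return beta ** 2 * var_g / var_trait


def steiger_direction(r2_gx: float, r2_gy: float, n_x: int, n_y: int):
    """Fisher-z comparison of |cor(G, X)| and |cor(G, Y)| from independent samples.

    Returns the inferred direction, the z statistic and a two-sided p-value.
    """
    z_x = math.atanh(math.sqrt(r2_gx))
    z_y = math.atanh(math.sqrt(r2_gy))
    se = math.sqrt(1.0 / (n_x - 3) + 1.0 / (n_y - 3))
    z = (z_x - z_y) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    direction = "X -> Y" if r2_gx > r2_gy else "Y -> X"
    return direction, z, p


def fixed_phenotypic_variance(var_g, var_x, var_y, beta_gx, beta_xy,
                              beta_ux, beta_uy, var_u=1.0):
    """Hold the observed var(X) and var(Y) fixed; the residual variances absorb
    any change in the confounder effects beta_ux and beta_uy."""
    cov_xu = beta_ux * var_u  # cov(X, U) under the assumed model
    var_ex = var_x - beta_gx ** 2 * var_g - beta_ux ** 2 * var_u
    var_ey = var_y - (beta_xy ** 2 * var_x + beta_uy ** 2 * var_u
                      + 2 * beta_xy * beta_uy * cov_xu)
    feasible = var_ex >= 0 and var_ey >= 0  # residual variances must be non-negative
    r2_gx = r2_from_effect(beta_gx, var_g, var_x)
    r2_gy = r2_from_effect(beta_gx * beta_xy, var_g, var_y)  # beta_gy = beta_gx * beta_xy
    return feasible, r2_gx, r2_gy


def fixed_residual_variance(var_g, beta_gx, beta_xy, beta_ux, beta_uy,
                            var_ex, var_ey, var_u=1.0):
    """Hold the unobserved residual variances fixed instead; the implied
    var(X) and var(Y) then change with the confounder effects."""
    var_x = beta_gx ** 2 * var_g + beta_ux ** 2 * var_u + var_ex
    var_y = (beta_xy ** 2 * var_x + beta_uy ** 2 * var_u
             + 2 * beta_xy * beta_uy * beta_ux * var_u + var_ey)
    return var_x, var_y


if __name__ == "__main__":
    # Hypothetical observed values: var(G) = var(X) = var(Y) = 1, beta_gx = 0.3, beta_xy = 0.2.
    for b_ux in (-0.5, 0.0, 0.5):
        for b_uy in (-0.5, 0.0, 0.5):
            ok, r2x, r2y = fixed_phenotypic_variance(1.0, 1.0, 1.0, 0.3, 0.2, b_ux, b_uy)
            _, var_y = fixed_residual_variance(1.0, 0.3, 0.2, b_ux, b_uy, 0.5, 0.5)
            print(f"b_ux={b_ux:+.1f} b_uy={b_uy:+.1f} feasible={ok} "
                  f"R2_gx={r2x:.4f} R2_gy={r2y:.4f} | var(Y), residuals fixed={var_y:.3f}")
    print(steiger_direction(0.09, 0.0036, 10_000, 10_000))
```

In every feasible row the two R² columns are identical while the var(Y) column changes. If the OLS estimate is also observed, only (β_ux, β_uy) pairs consistent with it would be admissible; this sketch does not enforce that constraint.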
Genetic Epidemiology, 47(6): 461–462 (Journal Article)
Citations: 0
Comparison of regmed and BayesNetty for exploring causal models with many variables
IF 2.1 · Zone 4, Medicine · Q3 GENETICS & HEREDITY · Pub Date: 2023-06-27 · DOI: 10.1002/gepi.22532
Richard Howey, Heather J. Cordell

Here we compare a recently proposed method and software package, regmed, with our own previously developed package, BayesNetty, designed to allow exploratory analysis of complex causal relationships between biological variables. We find that regmed generally has poorer recall but much better precision than BayesNetty. This is perhaps not too surprising as regmed is specifically designed for use with high-dimensional data. BayesNetty is found to be more sensitive to the resulting multiple testing problem encountered in these circumstances. However, as regmed is not designed to handle missing data, its performance is severely affected when missing data is present, whereas the performance of BayesNetty is only slightly affected. The performance of regmed can be rescued in this situation by first using BayesNetty to impute the missing data, and then applying regmed to the resulting “filled-in” data set.
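The recall/precision contrast reported above can be made concrete with edge-level definitions. This is a hedged sketch: the edge sets are hypothetical, and the abstract does not say whether the comparison scores directed or undirected edges, so directed (parent, child) pairs are assumed here.

```python
def precision_recall(true_edges, inferred_edges):
    """Edge-level precision and recall for a recovered network.

    precision = true positives / number of inferred edges
    recall    = true positives / number of true edges
    Empty denominators return 0.0 (a convention chosen for this sketch).
    """
    true_set, inferred_set = set(true_edges), set(inferred_edges)
    tp = len(true_set & inferred_set)
    precision = tp / len(inferred_set) if inferred_set else 0.0
    recall = tp / len(true_set) if true_set else 0.0
    return precision, recall


# Hypothetical directed edges (parent, child); not data from the paper.
truth = {("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")}
sparse_fit = {("A", "B")}                                # few edges, all correct
dense_fit = {("A", "B"), ("B", "C"), ("C", "D"),
             ("B", "D"), ("A", "C"), ("D", "A")}         # many edges, some wrong

print("sparse:", precision_recall(truth, sparse_fit))  # (1.0, 0.25)
print("dense: ", precision_recall(truth, dense_fit))   # (0.5, 0.75)
```

The sparse fit has perfect precision but low recall and the dense fit the reverse, the same kind of trade-off described for regmed and BayesNetty.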

Genetic Epidemiology, 47(7): 496–502 (Journal Article)
Citations: 0
A gene-based association test of interactions for maternal–fetal genotypes identifies genes associated with nonsyndromic congenital heart defects
IF 2.1 · Zone 4, Medicine · Q3 GENETICS & HEREDITY · Pub Date: 2023-06-21 · DOI: 10.1002/gepi.22533
Manyan Huang, Chen Lyu, Nianjun Liu, Wendy N. Nembhard, John S. Witte, Charlotte A. Hobbs, Ming Li, the National Birth Defects Prevention Study

The risk of congenital heart defects (CHDs) may be influenced by maternal genes, fetal genes, and their interactions. Existing methods commonly test the effects of maternal and fetal variants one at a time and may have reduced statistical power to detect genetic variants with low minor allele frequencies. In this article, we propose a gene-based association test of interactions for maternal–fetal genotypes (GATI-MFG) using a case-mother and control-mother design. GATI-MFG can integrate the effects of multiple variants within a gene or genomic region and evaluate the joint effect of maternal and fetal genotypes while allowing for their interactions. In simulation studies under various disease scenarios, GATI-MFG had greater statistical power than alternative methods such as the single-variant test and functional data analysis (FDA). We further applied GATI-MFG to a two-phase genome-wide association study of CHDs, testing both common and rare variants, using 947 CHD case mother–infant pairs and 1306 control mother–infant pairs from the National Birth Defects Prevention Study (NBDPS). After Bonferroni adjustment for 23,035 genes, two genes on chromosome 17, TMEM107 (p = 1.64e−06) and CTC1 (p = 2.0e−06), were significantly associated with CHD in the common-variant analysis. TMEM107 regulates ciliogenesis and ciliary protein composition and has been found to be associated with heterotaxy. CTC1 plays an essential role in protecting telomeres from degradation, a function that has been suggested to be associated with cardiogenesis. Overall, GATI-MFG outperformed the single-variant test and FDA in the simulations, and the results of its application to NBDPS samples are consistent with existing literature supporting the association of TMEM107 and CTC1 with CHDs.
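As a quick arithmetic check of the gene-level significance threshold, here is a short sketch assuming a conventional family-wise α of 0.05 (the abstract gives the number of genes and the p-values but does not state α):

```python
ALPHA = 0.05       # assumed family-wise error rate (not stated in the abstract)
N_GENES = 23_035   # number of genes tested, from the abstract

threshold = ALPHA / N_GENES
reported = {"TMEM107": 1.64e-06, "CTC1": 2.0e-06}   # p-values from the abstract

print(f"Bonferroni threshold: {threshold:.3e}")
for gene, p in reported.items():
    adjusted = min(1.0, p * N_GENES)   # Bonferroni-adjusted p-value
    print(f"{gene}: p = {p:.2e}, adjusted p = {adjusted:.3f}, significant = {p < threshold}")
```

Under that assumption the threshold is roughly 2.17e-06, so both reported p-values fall below it, consistent with the abstract.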

Genetic Epidemiology, 47(7): 475–495 (Journal Article)
Citations: 0
Phenotypic variance partitioning by transcriptomic gene expression levels and environmental variables for anthropometric traits using GTEx data
IF 2.1 · Zone 4, Medicine · Q3 GENETICS & HEREDITY · Pub Date: 2023-06-15 · DOI: 10.1002/gepi.22531
Pastor Jullian Fabres, S. Hong Lee

Phenotypic variation in humans is the result of genetic variation and environmental influences. Understanding the contribution of genetic and environmental components to phenotypic variation is of great interest. The variance explained by genome-wide single nucleotide polymorphisms (SNPs) typically represents a small proportion of the phenotypic variance for complex traits, which may be because the genome is only one part of the whole biological process that shapes phenotypes. In this study, we propose to partition the phenotypic variance of three anthropometric traits using gene expression levels and environmental variables from GTEx data. We use the gene expression of four tissues deemed relevant to the anthropometric traits (two adipose tissues, skeletal muscle tissue and blood tissue). Additionally, we estimate the transcriptome–environment correlation that partly underlies the phenotypes of the anthropometric traits. We found that genetic factors play a significant role in determining body mass index (BMI), with the proportion of phenotypic variance explained by gene expression levels of visceral adipose tissue being 0.68 (SE = 0.06). However, we also observed that environmental factors such as age, sex, ancestry, smoking status, and alcohol drinking status have a small but significant impact (0.005, SE = 0.001). Interestingly, we found a significant negative correlation between the transcriptomic and environmental effects on BMI (transcriptome–environment correlation = −0.54, SE = 0.14), suggesting an antagonistic relationship. This implies that individuals with lower genetic profiles may be more susceptible to the effects of environmental factors on BMI, while those with higher genetic profiles may be less susceptible. We also show that the estimated transcriptomic variance varies across tissues; for example, the gene expression levels of whole blood tissue and the environmental variables explain a lower proportion of BMI phenotypic variance (0.16, SE = 0.05, and 0.04, SE = 0.004, respectively). For this tissue, we observed a significant positive correlation between transcriptomic and environmental effects (1.21, SE = 0.23). In conclusion, phenotypic variance partitioning can be done using gene expression and environmental data even with a small sample size (n = 838 from GTEx data), and can provide insights into how transcriptomic and environmental effects contribute to the phenotypes of anthropometric traits.

Genetic Epidemiology, 47(7): 465–474 (Journal Article)
Citations: 0