
Genetic Epidemiology: Latest Articles

DYNATE: Localizing rare-variant association regions via multiple testing embedded in an aggregation tree
IF 2.1 | Zone 4 (Medicine) | Q3 GENETICS & HEREDITY | Pub Date: 2023-11-28 | DOI: 10.1002/gepi.22542
Xuechan Li, John Pura, Andrew Allen, Kouros Owzar, Jianfeng Lu, Matthew Harms, Jichun Xie

Rare-variant (RV) genetic association studies enable researchers to uncover the variation in phenotypic traits left unexplained by common variation. Traditional single-variant analysis lacks power; thus, researchers have developed various methods that aggregate the effects of RVs across genomic regions to study their collective impact. Some existing methods use a static delineation of genomic regions, which often results in suboptimal effect aggregation because neutral subregions within the test region attenuate the signal. Other methods use varying windows to search for signals but often return long regions containing many neutral RVs. To pinpoint short genomic regions enriched for disease-associated RVs, we developed a novel method, DYNamic Aggregation TEsting (DYNATE). DYNATE dynamically and hierarchically aggregates smaller genomic regions into larger ones and performs multiple testing for disease associations with a controlled weighted false discovery rate. DYNATE's main advantage is its strong ability to identify short genomic regions highly enriched for disease-associated RVs. Extensive numerical simulations demonstrate the superior performance of DYNATE under various scenarios compared with existing methods. We applied DYNATE to an amyotrophic lateral sclerosis study and identified a new gene, EPG5, harboring possibly pathogenic mutations.
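The bottom-up idea in this abstract (test small regions first, merge neighbors into larger windows, and stop growing a region once it is detected) can be illustrated with a short hypothetical sketch. It is not DYNATE: the leaf p-values are assumed to come from some per-region rare-variant test and to be independent, Fisher's method stands in for the region-level test, and Benjamini–Hochberg applied separately within each layer stands in for the weighted FDR control described above, which this simplification does not reproduce.

```python
import math


def fisher_combine(pvals):
    """Fisher's method: -2 * sum(log p) ~ chi-square(2k) for k independent
    null p-values. For even df the survival function has a closed form."""
    k = len(pvals)
    half = -sum(math.log(p) for p in pvals)  # half of the Fisher statistic
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def benjamini_hochberg(pvals, q):
    """Sorted indices rejected by the Benjamini-Hochberg step-up rule at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


def aggregation_tree(leaf_p, q=0.05, n_layers=3):
    """Hypothetical bottom-up scan: at layer l, merge 2**l adjacent leaves into
    one window, test only windows containing no already-detected leaf, and
    apply BH within the layer. Returns detected (start, end) leaf ranges."""
    used = [False] * len(leaf_p)
    detected = []
    for layer in range(n_layers):
        width = 2 ** layer
        windows = [(s, s + width)
                   for s in range(0, len(leaf_p) - width + 1, width)
                   if not any(used[s:s + width])]
        if not windows:
            break
        window_p = [fisher_combine(leaf_p[s:e]) for s, e in windows]
        for idx in benjamini_hochberg(window_p, q):
            s, e = windows[idx]
            detected.append((s, e))
            used[s:e] = [True] * (e - s)
    return detected


leaf_p = [0.5, 0.6, 0.0001, 0.0002, 0.7, 0.4, 0.02, 0.03]
print(aggregation_tree(leaf_p))  # [(2, 3), (3, 4), (6, 8)]
```

On these made-up leaf p-values, the two strong adjacent leaves are reported individually at the first layer, while leaves 6 and 7 are detected only after being merged at the second layer, which mirrors the motivation for aggregating upward instead of testing one long fixed region.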

Genetic Epidemiology, 48(1): 42–55.
Citations: 0
Bias and mean squared error in Mendelian randomization with invalid instrumental variables
IF 2.1 | Zone 4 (Medicine) | Q3 GENETICS & HEREDITY | Pub Date: 2023-11-16 | DOI: 10.1002/gepi.22541
Lu Deng, Sheng Fu, Kai Yu

Mendelian randomization (MR) is a statistical method that utilizes genetic variants as instrumental variables (IVs) to investigate causal relationships between risk factors and outcomes. Although MR has gained popularity in recent years due to its ability to analyze summary statistics from genome-wide association studies (GWAS), it requires a substantial number of single nucleotide polymorphisms (SNPs) as IVs to ensure sufficient power for detecting causal effects. Unfortunately, the complex genetic heritability of many traits can lead to the use of invalid IVs that affect both the risk factor and the outcome directly or through an unobserved confounder. This can result in biased and imprecise estimates, as reflected by a larger mean squared error (MSE). In this study, we focus on the widely used two-stage least squares (2SLS) method and derive formulas for its bias and MSE when estimating causal effects using invalid IVs. Using those formulas, we identify conditions under which the 2SLS estimate is unbiased and reveal how the independent or correlated pleiotropic effects influence the accuracy and precision of the 2SLS estimate. We validate these formulas through extensive simulation studies and demonstrate the application of those formulas in an MR study to evaluate the causal effect of the waist-to-hip ratio on various sleeping patterns. Our results can aid in designing future MR studies and serve as benchmarks for assessing more sophisticated MR methods.
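The effect of invalid instruments on 2SLS can be seen in a small simulation. This is a hedged sketch, not the authors' bias and MSE formulas: SNPs are simulated as independent with equal variance, every parameter value is invented, and `alpha` holds hypothetical direct (pleiotropic) SNP effects on the outcome. In this simple setting the large-sample limit of 2SLS is beta + (gamma · alpha) / (gamma · gamma), so directional pleiotropy proportional to the SNP–exposure effects shifts the estimate by a fixed amount.

```python
import numpy as np


def two_stage_least_squares(G, x, y):
    """2SLS slope: regress x on the instruments G (stage 1), then y on the
    fitted values of x (stage 2). Both stages include an intercept."""
    n = G.shape[0]
    Z = np.column_stack([np.ones(n), G])
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    W = np.column_stack([np.ones(n), x_hat])
    return np.linalg.lstsq(W, y, rcond=None)[0][1]


def simulate_2sls(alpha, beta=0.3, gamma_j=0.2, maf=0.3, n=20_000, seed=1):
    """Simulate independent SNPs G, a confounder u, exposure x and outcome y.
    alpha holds each SNP's direct effect on y (alpha = 0: all IVs valid).
    Returns the 2SLS estimate of beta."""
    rng = np.random.default_rng(seed)
    n_snps = len(alpha)
    G = rng.binomial(2, maf, size=(n, n_snps)).astype(float)
    gamma = np.full(n_snps, gamma_j)
    u = rng.normal(size=n)                      # unobserved confounder
    x = G @ gamma + u + rng.normal(size=n)
    y = beta * x + G @ np.asarray(alpha) + u + rng.normal(size=n)
    return two_stage_least_squares(G, x, y)


def limiting_2sls(alpha, beta=0.3, gamma_j=0.2):
    """Large-sample 2SLS value for independent, equal-variance SNPs:
    beta + gamma.alpha / gamma.gamma."""
    gamma = np.full(len(alpha), gamma_j)
    return beta + gamma @ np.asarray(alpha) / (gamma @ gamma)


valid = simulate_2sls(np.zeros(10))           # should be close to beta = 0.3
invalid = simulate_2sls(np.full(10, 0.1))     # pulled toward the limit below
print(valid, invalid, limiting_2sls(np.full(10, 0.1)))
```

With correlated SNPs the limit involves their covariance matrix, beta + (gamma' Sigma alpha) / (gamma' Sigma gamma), so the shortcut in `limiting_2sls` no longer applies.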

Genetic Epidemiology, 48(1): 27–41.
Citations: 0
Limitation of permutation-based differential correlation analysis
IF 2.1 | Zone 4 (Medicine) | Q3 GENETICS & HEREDITY | Pub Date: 2023-11-10 | DOI: 10.1002/gepi.22540
Hoseung Song, Michael C. Wu

Comparing biological systems by analyzing molecular changes under different conditions has played a crucial role in the progress of modern biological science. Specifically, differential correlation analysis (DCA) has been employed to determine whether relationships between genomic features differ across conditions or outcomes. Because deriving the null distribution of test statistics that capture differences in correlation is challenging, several DCA methods use permutation, which relaxes parametric (e.g., normality) assumptions. However, permutation is often problematic for DCA because it violates the assumption that samples are exchangeable under the null hypothesis. Here, we examine the limitations of permutation-based DCA and investigate instances in which it performs poorly. Experimental results show that permutation-based DCA often fails to control the type I error under the null hypothesis of equal correlation structures.
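A minimal permutation test for a difference in Pearson correlation between two groups shows where the exchangeability assumption enters. This is a generic sketch, not any specific DCA method from the paper; the function name, the two-group setup, and the add-one p-value convention are choices made here for illustration.

```python
import numpy as np


def permutation_dca_pvalue(x, y, group, n_perm=1000, seed=0):
    """Two-sided permutation p-value for |r_0 - r_1|, the difference in
    Pearson correlation between x and y in group 0 versus group 1.

    Shuffling group labels treats every (x, y) pair as exchangeable across
    groups under the null, i.e. it assumes identical joint distributions,
    which is stronger than merely equal correlations."""
    rng = np.random.default_rng(seed)
    x, y, group = np.asarray(x), np.asarray(y), np.asarray(group)

    def stat(labels):
        a, b = labels == 0, labels == 1
        r0 = np.corrcoef(x[a], y[a])[0, 1]
        r1 = np.corrcoef(x[b], y[b])[0, 1]
        return abs(r0 - r1)

    observed = stat(group)
    exceed = sum(stat(rng.permutation(group)) >= observed for _ in range(n_perm))
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


# Example: strongly correlated pairs in group 0, independent pairs in group 1.
rng = np.random.default_rng(42)
n = 100
x0 = rng.normal(size=n)
y0 = 0.9 * x0 + np.sqrt(1 - 0.9 ** 2) * rng.normal(size=n)  # corr near 0.9
x1, y1 = rng.normal(size=n), rng.normal(size=n)            # corr near 0
p_signal = permutation_dca_pvalue(np.r_[x0, x1], np.r_[y0, y1], np.repeat([0, 1], n))
print(p_signal)
```

Because labels are shuffled together with whole (x, y) pairs, the reference distribution is justified when the two groups share the same joint distribution. If the groups have equal correlation but differ in other respects (for example, tail behavior or variance structure), the permutation distribution need not match the true null distribution of the statistic, which is the situation the authors examine.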

Genetic Epidemiology, 47(8): 637–641.
Citations: 0
Correction to “Abstracts”
IF 2.1 | Zone 4 (Medicine) | Q3 GENETICS & HEREDITY | Pub Date: 2023-11-09 | DOI: 10.1002/gepi.22543

(2023), Abstracts. Genetic Epidemiology, 47: 520–581. https://doi.org/10.1002/gepi.22539

In the originally published Abstracts, there were authors missing for “Two-sample Mendelian Randomization Study of Circulating Metabolites and Prostate Cancer Risk in Hispanic Populations” (abstract 49). The correct authors and affiliations appear below and have been updated on the online version of the abstracts.

Harriett Fuller¹, Rebecca Rohde², Heather Highland², Jiayi Shen³, Bing Yu⁴, Eric Boerwinkle⁴, Megan Grove⁴, Kari E. North², David V. Conti³, Christopher A. Haiman³, Kristin Young², Burcu F. Darst¹

¹Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, Washington, USA

²Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA

³Department of Population and Public Health Sciences, Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, California, USA

⁴School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, USA

We apologize for this error.

Genetic Epidemiology, 47(8): 642. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22543
Citations: 0
Haplotype reconstruction for genetically complex regions with ambiguous genotype calls: Illustration by the KIR gene region
IF 2.1 | Zone 4 (Medicine) | Q3 GENETICS & HEREDITY | Pub Date: 2023-10-13 | DOI: 10.1002/gepi.22538
Lars L. J. van der Burg, Liesbeth C. de Wreede, Henning Baldauf, Jürgen Sauter, Johannes Schetelig, Hein Putter, Stefan Böhringer

Advances in DNA sequencing technologies have enabled genotyping of complex genetic regions exhibiting copy number variation and high allelic diversity, yet exact genotypes cannot be derived in all cases, which often results in ambiguous genotype calls, that is, partially missing data. One example of such a region is that of the killer-cell immunoglobulin-like receptor (KIR) genes, which are of special interest in the context of allogeneic hematopoietic stem cell transplantation. For such complex gene regions, current haplotype reconstruction methods are not feasible because they cannot cope with the complexity of the data. We present an expectation–maximization (EM) algorithm to estimate haplotype frequencies (HTFs) that handles the missing-data components and takes into account linkage disequilibrium (LD) between genes. To cope with the exponential increase in the number of haplotypes as genes are added, we add three components to a standard EM-algorithm implementation. First, reconstruction is performed iteratively, adding one gene at a time. Second, after each step, haplotypes with frequencies below a threshold are collapsed into a rare haplotype group. Third, the HTF of the rare haplotype group is profiled in subsequent iterations to improve estimates. A simulation study evaluates the effect of combining information from multiple genes on the estimates of these frequencies. We show that the estimated HTFs are approximately unbiased. Our simulation study shows that the EM algorithm is able to combine information from multiple genes when LD is high, whereas increased ambiguity levels increase bias. Linear regression models based on this EM algorithm show that a large number of haplotypes can be problematic for unbiased effect size estimation and that models need to be sparse. In a real data analysis of KIR genotypes, we compare HTFs to those obtained in an independent study.
Our new EM-algorithm-based method is the first to account for the full genetic architecture of complex gene regions, such as the KIR gene region. This algorithm can handle the numerous observed ambiguities, and allows for the collapsing of haplotypes to perform implicit dimension reduction. Combining information from multiple genes improves haplotype reconstruction.
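To show the core E-step/M-step cycle, here is a textbook EM for haplotype frequencies at two biallelic loci, where phase ambiguity arises only for double heterozygotes. It is far simpler than the authors' method: there is no copy-number variation, no ambiguous allele calls, no gene-by-gene addition, and no profiling. Hardy–Weinberg equilibrium is assumed, and the `collapse_rare` helper is a hypothetical illustration of the second component (pooling haplotypes below a frequency threshold).

```python
from itertools import product

# Two biallelic loci; a haplotype is (allele at locus 1, allele at locus 2).
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def compatible_pairs(genotype):
    """Ordered haplotype-index pairs whose allele counts add up to the
    unphased genotype, e.g. (1, 1) -> 00/11, 11/00, 01/10 or 10/01."""
    return [(a, b) for a, b in product(range(4), repeat=2)
            if all(HAPLOTYPES[a][k] + HAPLOTYPES[b][k] == genotype[k]
                   for k in range(2))]


def em_haplotype_freqs(genotypes, max_iter=500, tol=1e-12):
    """Estimate haplotype frequencies from unphased genotypes by EM."""
    candidates = [compatible_pairs(g) for g in genotypes]
    freqs = [0.25] * 4
    for _ in range(max_iter):
        counts = [0.0] * 4
        for pairs in candidates:
            # E-step: posterior weight of each phase given current frequencies.
            weights = [freqs[a] * freqs[b] for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by 2n chromosomes.
        new = [c / (2 * len(genotypes)) for c in counts]
        converged = max(abs(f1 - f0) for f1, f0 in zip(new, freqs)) < tol
        freqs = new
        if converged:
            break
    return freqs


def collapse_rare(freqs, threshold=0.01):
    """Keep haplotypes at or above threshold; pool the rest into 'rare'."""
    kept = {HAPLOTYPES[i]: f for i, f in enumerate(freqs) if f >= threshold}
    kept["rare"] = sum(f for f in freqs if f < threshold)
    return kept


# Genotypes = allele counts at (locus 1, locus 2); (1, 1) is phase-ambiguous.
genotypes = [(0, 0)] * 30 + [(2, 2)] * 10 + [(1, 1)] * 40
freqs = em_haplotype_freqs(genotypes)
print(collapse_rare(freqs))
```

Here the 40 unambiguous individuals carry only 00 and 11 haplotypes, so EM resolves the double heterozygotes as 00/11 and the frequencies approach 0.625 and 0.375, with 01 and 10 pooled into a negligible rare group.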

Genetic Epidemiology, 48(1): 3–26. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22538
Citations: 0
Data-adaptive and pathway-based tests for association studies between somatic mutations and germline variations in human cancers
IF 2.1 | Zone 4 (Medicine) | Q3 GENETICS & HEREDITY | Pub Date: 2023-10-11 | DOI: 10.1002/gepi.22537
Zhongyuan Chen, Han Liang, Peng Wei

Cancer is a disease driven by a combination of inherited genetic variants and somatic mutations. Recently available large-scale sequencing data of cancer genomes have provided an unprecedented opportunity to study the interactions between them. However, previous studies on this topic have been limited by simple, low-power tests such as Fisher's exact test. In this paper, we design data-adaptive and pathway-based tests, built on the score statistic, for association studies between somatic mutations and germline variations. Previous research has shown that two single-nucleotide polymorphism (SNP)-set-based association tests, the adaptive sum of powered score (aSPU) test and the data-adaptive pathway-based (aSPUpath) test, increase power in genome-wide association studies (GWASs) of a single disease trait in case–control designs. We extend aSPU and aSPUpath to multiple traits, that is, somatic mutations of multiple genes in a cohort study, allowing extensive information aggregation at both the SNP and gene levels. The p-values obtained under different parameter values, each favoring a different genetic architecture, are combined to yield data-adaptive tests for somatic mutations and germline variations. Extensive simulations show that, in comparison with some commonly used methods, our data-adaptive somatic mutation/germline variation tests can be applied to multiple germline SNPs, genes, or pathways and generally have much higher statistical power while maintaining appropriate type I error control. The proposed tests were applied to a large-scale International Cancer Genome Consortium whole-genome sequencing data set of 2583 subjects and detected more significant and biologically relevant associations than the other existing methods at both the gene and pathway levels.
Our study has systematically identified associations between various germline variations and somatic mutations across different cancer types, which may prove valuable for cancer risk prediction, prognosis, and therapeutics.
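The aSPU construction that the authors extend can be sketched for a single score vector. This sketch assumes the score vector `U` and its null covariance `V` have already been computed and approximates the null by simulating U ~ N(0, V); it does not implement the multi-trait, gene-level, or pathway (aSPUpath) extensions described above. The set of powers gamma = 1, ..., 8 plus infinity is a commonly used default, assumed here.

```python
import numpy as np

# Powers assumed for aSPU; gamma = inf means max_j |U_j|.
GAMMAS = (1, 2, 3, 4, 5, 6, 7, 8, np.inf)


def spu_stats(U, gammas=GAMMAS):
    """|SPU(gamma)| = |sum_j U_j**gamma| for each gamma, along the last axis."""
    U = np.asarray(U, dtype=float)
    cols = [np.max(np.abs(U), axis=-1) if np.isinf(g)
            else np.abs(np.sum(U ** g, axis=-1)) for g in gammas]
    return np.stack(cols, axis=-1)


def aspu_pvalue(U, V, n_sim=2000, seed=0, gammas=GAMMAS):
    """Monte Carlo aSPU p-value for a score vector U with null covariance V."""
    rng = np.random.default_rng(seed)
    null_U = rng.multivariate_normal(np.zeros(len(U)), V, size=n_sim)
    obs = spu_stats(U, gammas)            # shape (n_gamma,)
    null = spu_stats(null_U, gammas)      # shape (n_sim, n_gamma)
    # Per-gamma p-value of the observed statistic.
    p_obs = (1 + np.sum(null >= obs, axis=0)) / (n_sim + 1)
    # Per-gamma p-value of every null draw, from its rank among the null draws.
    p_null = np.empty_like(null)
    for j in range(null.shape[1]):
        sorted_col = np.sort(null[:, j])
        n_ge = n_sim - np.searchsorted(sorted_col, null[:, j], side="left")
        p_null[:, j] = n_ge / n_sim
    # aSPU = smallest per-gamma p-value, calibrated on the same null draws.
    return (1 + np.sum(p_null.min(axis=1) <= p_obs.min())) / (n_sim + 1)


U = np.zeros(20)
U[0] = 6.0                      # one strong score among 20 SNPs
print(aspu_pvalue(U, np.eye(20)))
```

Small powers emphasize many weak associations (gamma = 1 additionally rewards a shared sign), while large powers and the max statistic at infinity emphasize a few strong ones; taking the minimum p-value across gamma and recalibrating it on the same null draws is what makes the test data-adaptive.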

Genetic Epidemiology, 47(8): 617–636.
Citations: 0
ioSearch: An approach for identifying interacting multiomics biomarkers using a novel algorithm with application on breast cancer data sets
IF 2.1 | Zone 4 (Medicine) | Q3 GENETICS & HEREDITY | Pub Date: 2023-10-05 | DOI: 10.1002/gepi.22536
Sarmistha Das, Deo Kumar Srivastava

Identifying biomarkers by integrating multiple omics is important because complex diseases arise from an intricate interplay of various genetic materials. Traditional single-omics association tests neither explore this crucial interomics dependence nor identify moderately weak signals, owing to the multiple-testing burden. Conversely, multiomics data integration provides complementary information but suffers from an increased multiple-testing burden, the data diversity inherent in different omics features, high dimensionality, and so forth. Most available methods address subtype classification, using dimension-reduction techniques to circumvent the sample-size issue, but methods for identifying interacting multiomics biomarkers are lacking. We propose a two-step model that first investigates phenotype–omics associations using logistic regression and then selects disease-associated omics using sparse principal components, which explore the interrelationships of multiple variables from two omics in a multivariate multiple regression framework. On the basis of this model, we developed a multiomics biomarker identification algorithm, interacting omics search (ioSearch), that jointly tests the effects of multiple omics on disease and between-omics associations by using pathway information, which in turn reduces the multiple-testing burden. Further, inference in terms of p values potentially makes it an easily interpretable biomarker identification tool. Extensive simulation shows that ioSearch is statistically powerful with a controlled Type I error rate. Its application to publicly available breast cancer data sets identified relevant omics features in important pathways.

Genetic Epidemiology, 47(8), 600-616. Published 2023-10-05.
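The second step of ioSearch selects disease-associated omics using sparse principal components. The sketch below illustrates only that building block, under stated assumptions. It computes a sparse leading principal component by soft-thresholded power iteration, a generic sparse-PCA heuristic. It is not the ioSearch algorithm and omits the logistic-regression screening step, the multivariate multiple regression, and the pathway-based testing. The function names, the threshold `lam`, and the simulated data are hypothetical.

```python
import numpy as np

def soft_threshold(v: np.ndarray, lam: float) -> np.ndarray:
    """Shrink each entry toward zero by lam; entries smaller than lam become exactly 0."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_leading_pc(X: np.ndarray, lam: float, n_iter: int = 200, tol: float = 1e-10) -> np.ndarray:
    """Sparse leading loading vector via soft-thresholded power iteration (generic heuristic)."""
    Xc = X - X.mean(axis=0)                    # center each feature
    S = Xc.T @ Xc / (Xc.shape[0] - 1)          # sample covariance matrix (p x p)
    v = np.linalg.eigh(S)[1][:, -1]            # start from the ordinary leading eigenvector
    for _ in range(n_iter):
        w = soft_threshold(S @ v, lam)         # power step, then sparsify
        norm = np.linalg.norm(w)
        if norm == 0.0:                        # lam too large: every loading was zeroed
            return w
        w = w / norm
        if np.max(np.abs(w - v)) < tol:        # converged
            return w
        v = w
    return v

# Hypothetical data: features 0-2 share one strong latent signal, features 3-9 are noise
rng = np.random.default_rng(0)
n, p = 200, 10
latent = rng.normal(size=n)
X = 0.1 * rng.normal(size=(n, p))
X[:, :3] += 3.0 * latent[:, None]
loadings = sparse_leading_pc(X, lam=1.0)
print(np.round(loadings, 3))
```

Soft-thresholding zeroes the loadings of the unrelated features, so only the correlated block defines the component. The sign of the whole vector is arbitrary, as with any principal component.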
Citations: 0
Statistical methods to detect mother–father genetic interaction effects on risk of infertility: A genome-wide approach
IF 2.1 CAS Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2023-08-28 DOI: 10.1002/gepi.22534
Siri N. Skodvin, Håkon K. Gjessing, Astanand Jugessur, Julia Romanowska, Christian M. Page, Elizabeth C. Corfield, Yunsung Lee, Siri E. Håberg, Miriam Gjerdevik

Infertility is a heterogeneous phenotype, and for many couples, the causes of fertility problems remain unknown. One understudied hypothesis is that allelic interactions between the genotypes of the two parents may influence the risk of infertility. Our aim was, therefore, to investigate how allelic interactions can be modeled using parental genotype data linked to 15,789 pregnancies selected from the Norwegian Mother, Father, and Child Cohort Study. The newborns in 1304 of these pregnancies were conceived using assisted reproductive technologies (ART), and the remainder were conceived naturally. Treating the use of ART as a proxy for infertility, different parameterizations were implemented in a genome-wide screen for interaction effects between maternal and paternal alleles at the same locus. Some of the models were more similar in the way they were parameterized, and some produced similar results when implemented on a genome-wide scale. The results showed near-significant interaction effects in genes relevant to the phenotype under study, such as dynein axonemal heavy chain 17 (DNAH17), which has a recognized role in male infertility. More generally, the interaction models presented here are readily adaptable to the study of other phenotypes in which maternal and paternal allelic interactions are likely to be involved.
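One simple way to parameterize a maternal-by-paternal interaction at a single locus is a logistic regression of ART use on the mother's allele count, the father's allele count, and their product. The sketch below fits that model by Newton-Raphson on simulated data. It is an illustrative assumption and is not claimed to match any of the specific parameterizations compared in the paper. The allele frequency, effect sizes, and function names are hypothetical.

```python
import numpy as np

def fit_logistic(X: np.ndarray, y: np.ndarray, n_iter: int = 50, tol: float = 1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson; returns estimates and Wald SEs."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))            # fitted probabilities
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])     # Fisher information matrix
        step = np.linalg.solve(info, X.T @ (y - mu))      # Newton step from the score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (mu * (1.0 - mu))[:, None])         # information at the final estimate
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))

def interaction_design(g_mother: np.ndarray, g_father: np.ndarray) -> np.ndarray:
    """Columns: intercept, maternal allele count, paternal allele count, their product."""
    return np.column_stack([np.ones(len(g_mother)), g_mother, g_father, g_mother * g_father])

# Hypothetical simulation: parental allele counts at one biallelic locus (allele frequency 0.3)
rng = np.random.default_rng(1)
n = 50_000
g_m = rng.binomial(2, 0.3, size=n).astype(float)
g_f = rng.binomial(2, 0.3, size=n).astype(float)
true_beta = np.array([-2.0, 0.1, 0.1, 0.3])     # intercept, mother, father, mother x father
X = interaction_design(g_m, g_f)
p_art = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
art = (rng.random(n) < p_art).astype(float)      # 1 = conceived with ART (proxy for infertility)

beta_hat, se_hat = fit_logistic(X, art)
z_interaction = beta_hat[3] / se_hat[3]          # Wald statistic for the interaction term
print(beta_hat.round(3), round(float(z_interaction), 2))
```

In a genome-wide screen, the same fit would be repeated at each locus, with the interaction Wald statistic (or a likelihood-ratio test) used as the per-locus test.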

Genetic Epidemiology, 47(7), 503-519. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22534
Citations: 0
Inference of causal metabolite networks in the presence of invalid instrumental variables with GWAS summary data
IF 2.1 CAS Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2023-08-13 DOI: 10.1002/gepi.22535
Siyi Chen, Zhaotong Lin, Xiaotong Shen, Ling Li, Wei Pan

We propose structural equation models (SEMs) as a general framework to infer causal networks for metabolites and other complex traits. Traditionally, SEMs are used only for individual-level data under the assumption that all instrumental variables (IVs) are valid. To overcome these limitations, we propose both one- and two-sample approaches for causal network inference based on SEMs that can: (1) perform causal analysis and discover causal relationships among multiple traits; (2) account for the possible presence of some invalid IVs; (3) allow for data analysis using only genome-wide association study (GWAS) summary statistics when individual-level data are not available; and (4) consider the possibility of bidirectional relationships between traits. Our method employs a simple stepwise selection to identify invalid IVs, thus avoiding false positives while possibly increasing true discoveries based on two-stage least squares (2SLS). We use both real GWAS data and simulated data to demonstrate the superior performance of our method over standard 2SLS/SEMs. For real data analysis, our proposed approach is applied to a human blood metabolite GWAS summary data set to uncover putative causal relationships among the metabolites. We also identify some metabolites that are putatively causal for Alzheimer's disease (AD), which, together with the inferred causal metabolite network, suggest possible metabolite pathways involved in AD.
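The general idea of estimating a causal effect from summary statistics while stepwise removing suspected invalid IVs can be sketched for a single exposure-outcome pair. The sketch below makes several assumptions. It uses the inverse-variance-weighted (IVW) estimator, which plays a role closely related to 2SLS when the IVs are uncorrelated. It then repeatedly drops the IV with the largest standardized residual. This outlier-removal rule is a generic stand-in, not the authors' stepwise selection procedure, and it does not handle networks or bidirectional effects. All names, the cutoff `z_crit`, and the summary statistics are hypothetical.

```python
import numpy as np

def ivw_estimate(bx: np.ndarray, by: np.ndarray, se_y: np.ndarray):
    """IVW causal estimate and its first-order standard error from SNP-level summary statistics."""
    w = 1.0 / se_y**2                                     # first-order inverse-variance weights
    theta = np.sum(w * bx * by) / np.sum(w * bx**2)
    se = 1.0 / np.sqrt(np.sum(w * bx**2))
    return float(theta), float(se)

def stepwise_drop_invalid(bx, by, se_y, z_crit: float = 3.0, min_ivs: int = 3):
    """Repeatedly drop the IV with the largest standardized residual until none exceeds z_crit."""
    keep = np.arange(len(bx))
    dropped = []
    while len(keep) > min_ivs:
        theta, _ = ivw_estimate(bx[keep], by[keep], se_y[keep])
        resid = (by[keep] - theta * bx[keep]) / se_y[keep]   # standardized residuals
        worst = int(np.argmax(np.abs(resid)))
        if abs(resid[worst]) <= z_crit:
            break
        dropped.append(int(keep[worst]))                     # record the original SNP index
        keep = np.delete(keep, worst)
    theta, se = ivw_estimate(bx[keep], by[keep], se_y[keep])
    return theta, se, dropped

# Hypothetical summary statistics: true effect 0.5; SNP 3 has a direct (pleiotropic) effect on Y
bx = np.array([0.10, 0.08, 0.12, 0.05, 0.09])
by = 0.5 * bx
by[3] += 0.2
se_y = np.full(5, 0.01)

theta_all, _ = ivw_estimate(bx, by, se_y)        # biased upward by the invalid IV
theta, se, dropped = stepwise_drop_invalid(bx, by, se_y)
print(round(theta_all, 3), round(theta, 3), dropped)
```

Keeping at least `min_ivs` instruments stops the procedure from discarding so many SNPs that the remaining estimate becomes meaningless.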

Genetic Epidemiology, 47(8), 585-599. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22535
Citations: 0
Sensitivity analyses gain relevance by fixing parameters observable during the empirical analyses
IF 2.1 CAS Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2023-07-07 DOI: 10.1002/gepi.22530
Gibran Hemani, Apostolos Gkatzionis, Kate Tilling, George Davey Smith
In 2017 we presented the MR Steiger method, a sensitivity analysis in Mendelian randomization (MR) for inferring the causal direction between variables (Hemani et al., 2017). We discussed many of its potential limitations, including that unmeasured confounding under certain extreme circumstances could lead to the wrong causal direction being inferred. Lutz et al. (2022) propose an R package (UCRMS) for performing sensitivity analysis of the MR Steiger method. They use it in an illustration to suggest that the MR Steiger method has a ~90% chance of giving the wrong answer due to unmeasured confounding. In this note we show that an error in their approach to sensitivity analysis leads to the wrong conclusion about the validity of the MR Steiger test. We provide a valid alternative that uses the observed data to investigate sensitivity to unmeasured confounding.

A sensitivity analysis aims to understand the degree to which a result can change due to uncertainties in the inputs (Saltelli, 2002). For the MR Steiger test, we need to ask how sensitive the inferred causal direction between X and Y is to possible values of unmeasured confounders influencing X and Y. Importantly, many of the parameters of this system are known with relative certainty because they are easily observed. Examples include the variances of X, Y and the instrumental variables (IVs), the estimated effects of the IVs on X and Y, and therefore the IV estimate of the effect of X on Y. The ordinary least squares (OLS) association between X and Y is often also available, either because the analysis is performed using individual-level data or because the estimate can be sourced from other published results. Therefore, an appropriate sensitivity analysis must explore the extent to which the inferred causal direction between X and Y can change due to unmeasured confounding without causing these observed parameters to change.

Lutz et al.'s proposed method does not attempt to fix all observable parameters. In the simple example provided by Lutz et al., the variance of Y varies between 28 and 39, and the OLS estimate varies between 1 and −1, across the parameter values used for the sensitivity analysis. This arises because their approach fixes the residual variance, which is unobserved. It should instead fix the phenotypic variance, which is observed. If they were presenting a simulation of the general performance of MR Steiger under unmeasured confounding, it would not matter that the simulated parameters are not tied to those observed in a particular empirical analysis. In a sensitivity analysis, however, allowing observed parameters to vary provides no value to the analyst. Saying that unmeasured confounding could reverse the causal direction, provided that the variance of Y also changes drastically, is of little use to a researcher whose data set has an observed variance of Y. Suppose some quantities are observed: the regression coefficient of Y on X, the variance in X explained by the instruments, the variances of X and Y, and the IV effect estimate. If only $\beta_{uy}$ and $\beta_{ux}$ are then allowed to vary, with changes in the residual variances compensating, the SNP-outcome $R^2$ does not change for any values of $\beta_{uy}$ and $\beta_{ux}$ (Supporting Information note).

To briefly revisit Lutz et al.'s analysis: for a causal effect of $\beta_{xy} = 1$, they specify unmeasured confounding parameters $\beta_{ux} = -5$ and $\beta_{uy}$ ranging from 0 to 11. Using these parameters, they suggest that the MR Steiger method has a ~90% chance of returning the wrong causal direction. However, if $\beta_{ux}$ and $\beta_{uy}$ are allowed to take the same range of values (e.g., −11 to 11), the Steiger method returns an incorrect result in only 36% of the confounded scenarios. If the range of $\beta_{ux}$ and $\beta_{uy}$ is restricted to −1 to 1, an incorrect result occurs in only 0.02% of scenarios. In our 2017 paper (Supporting Information: Note 3) we analyzed a broader range of scenarios to assess comprehensively how far unmeasured confounding can introduce problems in general. We concluded that in most practical situations, where $R^2_{xy} < 0.2$, the chance that unmeasured confounding leads to the wrong causal direction is very small.

An analyst who wants to check the sensitivity of MR Steiger to unmeasured confounding needs a different approach. That approach asks which values of unmeasured confounding support the inferred causal direction, given the empirically observed quantities. These quantities are the variances of X, Y and the instruments, the effects of the instruments on X and Y, and the OLS estimate of X on Y. The analyst can then judge whether the confounding values required to cast doubt on their conclusion are plausible. Alternatively, one can determine what proportion of the possible confounding parameter space supports the inferred causal direction. In the Supplementary Note we provide an analytical solution to this problem. We illustrate that, for specific analyses, the probability that unmeasured confounding reverses the causal direction inferred by MR Steiger rises above a low level only when the confounder explains a large proportion of the variance in X and Y. The method is included in the TwoSampleMR package. It is fast because it uses closed-form calculations rather than the stochastic simulation approach implemented in UCRMS.
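The note's central argument is the difference between fixing the residual variance and fixing the observed phenotypic variance. A small analytic model can make that difference concrete. The sketch below assumes a linear model X = a·g + β_ux·u + e_x and Y = β_xy·X + β_uy·u + e_y, with a genotype score g and a confounder u that are independent and have unit variance. It also simplifies the Steiger decision to comparing the point values of R²_gx and R²_gy, whereas the published method additionally tests whether the difference is significant. All parameter values and function names are hypothetical, and the sketch does not reproduce Lutz et al.'s numbers or the authors' closed-form solution.

```python
def implied_moments(a, theta, b_ux, b_uy, var_ex, var_ey):
    """Population moments for X = a*g + b_ux*u + e_x, Y = theta*X + b_uy*u + e_y (unit-variance g, u)."""
    var_x = a**2 + b_ux**2 + var_ex
    var_y = theta**2 * var_x + b_uy**2 + 2 * theta * b_ux * b_uy + var_ey   # cov(X, u) = b_ux
    cov_xy = theta * var_x + b_ux * b_uy
    return {"var_x": var_x, "var_y": var_y, "ols": cov_xy / var_x,
            "r2_gx": a**2 / var_x, "r2_gy": (theta * a) ** 2 / var_y}

def fixed_phenotype_moments(a, theta, b_ux, b_uy, var_x_obs, var_y_obs):
    """Same model, but residual variances are solved so that var(X) and var(Y) equal observed values."""
    var_ex = var_x_obs - a**2 - b_ux**2
    var_ey = var_y_obs - theta**2 * var_x_obs - b_uy**2 - 2 * theta * b_ux * b_uy
    if var_ex < 0 or var_ey < 0:
        return None   # these confounding values are incompatible with the observed variances
    return implied_moments(a, theta, b_ux, b_uy, var_ex, var_ey)

def steiger_direction(r2_gx, r2_gy):
    """Simplified Steiger decision: the trait the instrument explains more of is taken as upstream."""
    return "X->Y" if r2_gx > r2_gy else "Y->X"

a, theta, b_ux = 1.0, 1.0, -2.0          # hypothetical instrument effect, causal effect, confounding
grid = [0.0, 1.0, 2.0, 3.0, 4.0]         # hypothetical values of b_uy
# Residual variances held fixed (var(Y) and the OLS estimate drift as b_uy changes)
naive = [implied_moments(a, theta, b_ux, b, var_ex=1.0, var_ey=1.0) for b in grid]
# Observed phenotypic variances held fixed at the b_uy = 0 values
fixed = [fixed_phenotype_moments(a, theta, b_ux, b, var_x_obs=6.0, var_y_obs=7.0) for b in grid]

for b, m0, m1 in zip(grid, naive, fixed):
    print(b, m0["var_y"], steiger_direction(m0["r2_gx"], m0["r2_gy"]),
          m1["var_y"], steiger_direction(m1["r2_gx"], m1["r2_gy"]))
```

In the fixed-residual version, the inferred direction flips only for values of β_uy that also move var(Y) away from 7. When var(X) and var(Y) are held at their observed values, both R² values stay constant across the grid, as the note argues. Holding the observed OLS estimate fixed as well would further restrict which (β_ux, β_uy) pairs are admissible; the sketch omits that constraint.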
Genetic Epidemiology, 47(6), 461-462. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22530
Citations: 0