Genetic Epidemiology: Latest Articles

Are trait-associated genes clustered together in a gene network?
IF 1.7 · Zone 4 (Medicine) · Q3 GENETICS & HEREDITY · Pub Date: 2024-03-12 · DOI: 10.1002/gepi.22557
Hyun Jung Koo, Wei Pan

Genome-wide association studies (GWAS) have provided an abundance of information about the genetic variants and loci associated with complex traits and diseases. However, because of linkage disequilibrium (LD) and because many loci lie in noncoding regions, pinpointing the causal genes remains a challenge. Gene network-based approaches, paired with network diffusion methods, have been proposed to prioritize causal genes and to boost statistical power in GWAS, based on the assumption that trait-associated genes are clustered in a gene network. Because trait-associated variants are difficult to map to genes in GWAS, this assumption has never been directly or rigorously tested empirically. Whole exome sequencing (WES) data, on the other hand, focus on protein-coding regions and directly identify trait-associated genes. In this study, we tested the assumption by leveraging the recently available exome-based association statistics from the UK Biobank WES data together with two types of networks. We found that, in both networks, almost all sets of trait-associated genes were significantly more proximal to each other than randomly selected genes. These results support the assumption that trait-associated genes are clustered in gene networks, which can be further leveraged to boost the power of GWAS, for example by introducing less stringent p-value thresholds.
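
A proximity test of this kind can be sketched as a permutation test. The sketch assumes an unweighted, undirected, connected network and uses the mean pairwise shortest-path distance among trait-associated genes as the statistic, compared against random gene sets of the same size. The toy graph and helper names are hypothetical, and the paper's actual distance measure and null model may differ.

```python
import random
from collections import deque


def bfs_distances(graph, source):
    """Hop counts from `source` to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def mean_pairwise_distance(graph, genes):
    """Average shortest-path distance over all pairs in `genes` (graph assumed connected)."""
    total, pairs = 0, 0
    for i, a in enumerate(genes):
        dist = bfs_distances(graph, a)  # one BFS per gene covers all its pairs
        for b in genes[i + 1:]:
            total += dist[b]
            pairs += 1
    return total / pairs


def proximity_test(graph, trait_genes, n_perm=1000, seed=0):
    """One-sided permutation p-value: are trait genes closer than random gene sets?"""
    rng = random.Random(seed)
    nodes = sorted(graph)
    observed = mean_pairwise_distance(graph, trait_genes)
    hits = sum(
        mean_pairwise_distance(graph, rng.sample(nodes, len(trait_genes))) <= observed
        for _ in range(n_perm)
    )
    # add-one correction keeps the p-value strictly positive
    return observed, (hits + 1) / (n_perm + 1)
```

On a 10-gene chain with three adjacent trait genes, the observed mean distance is 4/3. Only 8 of the 120 possible triples are that close, so the p-value lands near 0.07.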

Genetic Epidemiology, 48(5), 203-213. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22557
Citations: 0
Unveiling challenges in Mendelian randomization for gene–environment interaction
IF 2.1 · Zone 4 (Medicine) · Q3 GENETICS & HEREDITY · Pub Date: 2024-02-29 · DOI: 10.1002/gepi.22552
Malka Gorfine, Conghui Qu, Ulrike Peters, Li Hsu

Gene–environment (GxE) interactions play a crucial role in understanding the complex etiology of various traits, but assessing them using observational data can be challenging because lifestyle and environmental risk factors are subject to unmeasured confounding. Mendelian randomization (MR) has emerged as a valuable method for assessing causal relationships from observational data. This approach uses genetic variants as instrumental variables (IVs) to provide a valid statistical test and estimate of causal effects in the presence of unmeasured confounders. MR has gained substantial popularity in recent years, largely due to the success of genome-wide association studies. Many MR methods have been developed; however, little work has evaluated GxE interaction. In this paper, we focus on two primary IV approaches, two-stage predictor substitution and two-stage residual inclusion, and extend them to accommodate GxE interaction under linear and logistic regression models for continuous and binary outcomes, respectively. A comprehensive simulation study and analytical derivations reveal that the linear regression model is relatively straightforward to handle. In contrast, the logistic regression model presents a considerably more intricate challenge, which demands additional effort.
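
The linear case can be sketched as a simulation of two-stage predictor substitution (2SPS) with a G×E term.

- **What is hypothetical:** the data-generating model, the instrument Z for the confounded exposure E, and all effect sizes.
- **What the sketch leaves out:** two-stage residual inclusion and the logistic case, which the paper reports as harder.
- **How it works:** ordinary least squares is solved by hand with the normal equations, to stay dependency-free.

```python
import random


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def simulate_gxe_2sps(n=40_000, seed=1, b_g=0.2, b_e=0.5, b_ge=0.3):
    """Simulate Y = 1 + b_g*G + b_e*E + b_ge*G*E + U + noise, where the exposure E
    shares an unmeasured confounder U with Y and Z is a hypothetical valid instrument.
    Returns (naive OLS, 2SPS) estimates of [intercept, b_g, b_e, b_ge]."""
    rng = random.Random(seed)
    G, Z, E, Y = [], [], [], []
    for _ in range(n):
        g = rng.choice((0, 1, 2))          # genotype coded as allele count
        z = rng.gauss(0, 1)                # instrument: affects Y only through E
        u = rng.gauss(0, 1)                # unmeasured confounder of E and Y
        e = 0.8 * z + u + rng.gauss(0, 1)  # confounded exposure
        y = 1.0 + b_g * g + b_e * e + b_ge * g * e + u + rng.gauss(0, 1)
        G.append(g)
        Z.append(z)
        E.append(e)
        Y.append(y)
    naive = ols([[1.0, g, e, g * e] for g, e in zip(G, E)], Y)
    # Stage 1: regress E on the instrument and the exogenous G; keep fitted values
    a0, a_z, a_g = ols([[1.0, z, g] for z, g in zip(Z, G)], E)
    e_hat = [a0 + a_z * z + a_g * g for z, g in zip(Z, G)]
    # Stage 2 (2SPS): substitute fitted E, including its product with G
    two_stage = ols([[1.0, g, eh, g * eh] for g, eh in zip(G, e_hat)], Y)
    return naive, two_stage
```

In this setup, naive OLS inflates the main effect of E (its limit is about 0.88 rather than 0.5), while the 2SPS estimates of b_e and b_ge stay close to their true values.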

Genetic Epidemiology, 48(4), 164-189.
Citations: 0
Robust use of phenotypic heterogeneity at drug target genes for mechanistic insights: Application of cis-multivariable Mendelian randomization to GLP1R gene region
IF 2.1 · Zone 4 (Medicine) · Q3 GENETICS & HEREDITY · Pub Date: 2024-02-20 · DOI: 10.1002/gepi.22551
Ashish Patel, Dipender Gill, Dmitry Shungin, Christos S. Mantzoros, Lotte Bjerre Knudsen, Jack Bowden, Stephen Burgess

Phenotypic heterogeneity at genomic loci encoding drug targets can be exploited by multivariable Mendelian randomization to provide insight into the pathways by which pharmacological interventions may affect disease risk. However, statistical inference in such investigations may be poor if overdispersion heterogeneity in measured genetic associations is unaccounted for. In this work, we first develop conditional F statistics for dimension-reduced genetic associations that enable more accurate measurement of phenotypic heterogeneity. We then develop a novel extension of two-sample multivariable Mendelian randomization that accounts for overdispersion heterogeneity in dimension-reduced genetic associations. Our empirical focus is to use genetic variants in the GLP1R gene region to understand the mechanism by which GLP1R agonism affects coronary artery disease (CAD) risk. Colocalization analyses indicate that distinct variants in the GLP1R gene region are associated with body mass index and type 2 diabetes (T2D). Multivariable Mendelian randomization analyses corrected for overdispersion heterogeneity suggest that the bodyweight-lowering effects of GLP1R agonism, rather than its T2D liability-lowering effects, are more likely to contribute to reduced CAD risk. Of the tissues considered, tissue-specific analyses prioritized brain tissue as the most likely to be relevant for CAD risk. We hope the multivariable Mendelian randomization approach illustrated here will be widely applicable for better understanding the mechanisms linking drug targets to disease outcomes, and hence for guiding drug development efforts.

Genetic Epidemiology, 48(4), 151-163. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22551
Citations: 0
Making sense of breast cancer risk estimates
IF 2.1 · Zone 4 (Medicine) · Q3 GENETICS & HEREDITY · Pub Date: 2024-02-09 · DOI: 10.1002/gepi.22550
John O'Quigley

Individual probabilistic assessments of the risk of cancer, primary or secondary, will not be understood by most patients. That is the essence of our arguments in this paper. Greater understanding can be achieved through extensive, intensive, and detailed counseling. But because probability itself is a concept that easily escapes our everyday intuition (consider the famous Monty Hall paradox), it would also be wise to advise patients and potential patients not to put undue weight on any probabilistic assessment. Such assessments can be of value to the epidemiologist investigating different potential etiologies of cancer evolution, or to the clinical trialist as a way to maximize design efficiency. But we cannot expect an ordinary individual to interpret these assessments correctly.
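
The Monty Hall paradox cited above shows how easily probability defeats intuition. A short simulation illustrates it under the standard assumed rules: the host knows where the car is, always opens a different door hiding a goat, and always offers the switch. This is an illustration only and is not drawn from the paper.

```python
import random


def monty_hall_win_rates(n_games=100_000, seed=0):
    """Estimate P(win) for the 'stay' and 'switch' strategies."""
    rng = random.Random(seed)
    stay_wins = switch_wins = 0
    for _ in range(n_games):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # host opens a door that is neither the contestant's pick nor the car
        opened = rng.choice([d for d in range(3) if d != pick and d != car])
        switched = next(d for d in range(3) if d != pick and d != opened)
        stay_wins += pick == car
        switch_wins += switched == car
    return stay_wins / n_games, switch_wins / n_games
```

Exactly one strategy wins each game. Staying wins about 1/3 of the time and switching about 2/3, whereas many people intuitively expect 1/2 for each.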

Genetic Epidemiology, 48(3), 141-147.
Citations: 0
Revealing genomic heterogeneity and commonality: A penalized integrative analysis approach accounting for the adjacency structure of measurements
IF 2.1 · Zone 4 (Medicine) · Q3 GENETICS & HEREDITY · Pub Date: 2024-02-05 · DOI: 10.1002/gepi.22549
Xindi Wang, Yu Jiang, Yifan Sun

Advancements in high-throughput genomic technologies have revolutionized disease biomarker identification by providing large-scale genomic data. There is an increasing focus on understanding the relationships among diverse patient groups with distinct disease subtypes and characteristics. Complex diseases exhibit both heterogeneity and shared genomic factors, so investigating these patterns is essential to accurately detect markers and comprehensively understand the diseases. Integrative analysis has emerged as a promising approach to this challenge. However, existing studies are limited because they ignore the adjacency structure of genomic measurements such as single nucleotide polymorphisms (SNPs) and DNA methylation. In this study, we propose a structured integrative analysis method that incorporates a spline-type penalty to accommodate this adjacency structure. We use a fused lasso-type penalty to identify both heterogeneity and commonality across the groups. Extensive simulations demonstrate its superiority over several directly competing methods. Analyses of The Cancer Genome Atlas melanoma data with DNA methylation measurements and of GENEVA diabetes data with SNP measurements show that the proposed approach leads to meaningful findings, with better prediction performance and higher selection stability.
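
The combination of penalties described above can be sketched as a function that scores a coefficient matrix. The matrix has one row per patient group and one column per measurement in genomic order. Three terms are assumed: an L1 term for marker selection, a fused lasso-type term across groups that rewards commonality, and a squared-difference spline-type term between adjacent measurements. The exact penalty form and tuning in the paper may differ, and the `lam_*` names are hypothetical.

```python
from itertools import combinations


def structured_penalty(beta, lam_sparse, lam_fuse, lam_smooth):
    """beta[g][j]: coefficient of measurement j (in genomic order) in group g.

    lam_sparse * sum |beta|                                   (sparsity)
    + lam_fuse * sum_{g<h} sum_j |beta[g][j] - beta[h][j]|    (fusion across groups)
    + lam_smooth * sum_g sum_j (beta[g][j+1] - beta[g][j])^2  (adjacent measurements)
    """
    sparse = sum(abs(b) for row in beta for b in row)
    fuse = sum(abs(bg - bh)
               for g, h in combinations(beta, 2)
               for bg, bh in zip(g, h))
    smooth = sum((row[j + 1] - row[j]) ** 2
                 for row in beta for j in range(len(row) - 1))
    return lam_sparse * sparse + lam_fuse * fuse + lam_smooth * smooth
```

A measurement whose coefficients are identical across groups contributes nothing to the fusion term (commonality), while a group-specific coefficient pays for its difference (heterogeneity).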

Genetic Epidemiology, 48(3), 114-140.
Citations: 0
Gene-based association tests in family samples using GWAS summary statistics
IF 2.1 · Zone 4 (Medicine) · Q3 GENETICS & HEREDITY · Pub Date: 2024-02-05 · DOI: 10.1002/gepi.22548
Peng Wang, Xiao Xu, Ming Li, Xiang-Yang Lou, Siqi Xu, Baolin Wu, Guimin Gao, Ping Yin, Nianjun Liu

Genome-wide association studies (GWAS) have led to rapid growth in the detection of genetic variants associated with various phenotypes. Owing to the large number of publicly accessible GWAS summary statistics and the difficulty of obtaining individual-level genotype data, many existing gene-based association tests have been adapted to require only GWAS summary statistics rather than individual-level data. However, these association tests are restricted to unrelated individuals and thus do not apply directly to family samples. Moreover, because of its flexibility and effectiveness, the linear mixed model is increasingly used in GWAS to handle correlated data such as family samples. However, it remains unknown how to perform gene-based association tests in family samples using GWAS summary statistics estimated from the linear mixed model. In this study, we show that, when family size is negligible compared with the total sample size, the diagonal block structure of the kinship matrix makes it possible to approximate the correlation matrix of marginal Z scores by the linkage disequilibrium (LD) matrix. Based on this result, current methods that use summary statistics for unrelated individuals can be applied directly to family data without any modification. Our simulation results demonstrate that this strategy controls the type 1 error rate well in various situations. Finally, we illustrate the usefulness of the proposed approach with a dental caries GWAS data set.
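
Because the correlation of the marginal Z scores can be approximated by the LD matrix R, a summary-statistics gene test for unrelated samples can be reused unchanged. A minimal sketch of one such test, the quadratic form Q = zᵀR⁻¹z compared with a chi-square distribution on K degrees of freedom, follows.

- **Assumed, not from the paper:** the choice of this particular test. The paper may evaluate other gene-based tests.
- **Invertibility:** R is assumed invertible. The regularization usually needed for near-singular LD matrices is ignored.
- **Precision:** the series-based p-value loses precision for very small p-values.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small square a)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def chi2_sf(q, df):
    """Chi-square upper tail via the series for the regularized lower incomplete gamma."""
    if q <= 0:
        return 1.0
    s, x = df / 2.0, q / 2.0
    term = total = 1.0 / s
    n = 0
    while term > 1e-15 * total and n < 10_000:
        n += 1
        term *= x / (s + n)
        total += term
    lower = math.exp(s * math.log(x) - x - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)


def gene_chi2_test(z, ld):
    """Q = z' R^{-1} z for the SNP Z scores of one gene; p-value from chi-square(len(z))."""
    q = sum(zi * wi for zi, wi in zip(z, solve(ld, z)))
    return q, chi2_sf(q, len(z))
```

For two SNPs with Z scores of 1 and 1 and LD correlation 0.5, Q is 4/3 rather than the 2 obtained when LD is ignored. This shows why the correlation structure must enter the test.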

Genetic Epidemiology, 48(3), 103-113. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22548
Citations: 0
Mitigating type 1 error inflation and power loss in GxE PRS: Genotype–environment interaction in polygenic risk score models
IF 2.1 · Zone 4 (Medicine) · Q3 GENETICS & HEREDITY · Pub Date: 2024-02-01 · DOI: 10.1002/gepi.22546
Dovini Jayasinghe, Md. Moksedul Momin, Kerri Beckmann, Elina Hyppönen, Beben Benyamin, S. Hong Lee

Polygenic risk score (PRS) models have transformed the field of genetics by enabling the prediction of complex traits and diseases from an individual's genetic profile. However, the impact of genotype–environment interaction (GxE) on the performance and applicability of PRS models remains a crucial aspect to be explored. Existing genotype–environment interaction polygenic risk score (GxE PRS) models are often used inappropriately, which can inflate type 1 error rates and compromise results. In this study, we propose novel GxE PRS models that jointly incorporate additive and interaction genetic effects while also including an additional quadratic term for nongenetic covariates, enhancing their robustness against model misspecification. Through extensive simulations, we demonstrate that the proposed models outperform existing models in controlling type 1 error rates and enhancing statistical power. Furthermore, we apply the proposed models to real data and report significant GxE effects for both quantitative and binary traits. For a quantitative trait, we uncover GxE modulation of genetic effects on body mass index by alcohol intake frequency. For a binary trait, we identify GxE modulation of genetic effects on hypertension by waist-to-hip ratio. These findings underscore the importance of a robust model that effectively controls type 1 error rates, thus preventing spurious GxE signals. To facilitate implementation of our approach, we have developed an R software package, GxEprs, specifically designed to detect and estimate GxE effects. Overall, our study highlights the importance of accurate GxE modeling and its implications for genetic risk prediction, while providing a practical tool to support further research in this area.
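
The ingredients listed above can be written schematically as a model for a quantitative trait y. The model has an additive PRS term, a PRS × E interaction term, and a quadratic term in the nongenetic covariate E. This is an assumed illustration only; how GxEprs actually constructs its additive and interaction PRSs and which covariates it includes may differ.

$$
y_i = \beta_0 + \beta_G\,\mathrm{PRS}_i + \beta_E E_i + \beta_{E^2} E_i^2 + \beta_{GE}\,(\mathrm{PRS}_i \times E_i) + \varepsilon_i,
\qquad H_0\colon \beta_{GE} = 0 .
$$

For a binary trait, the same linear predictor would replace the left-hand side with $\operatorname{logit} \Pr(y_i = 1)$. The $E_i^2$ term gives a curved covariate effect its own coefficient, so that the effect is less likely to be absorbed by $\beta_{GE}$.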

Genetic Epidemiology, 48(2), 85-100. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22546
Citations: 0
Interval estimate of causal effect in summary data based Mendelian randomization in the presence of winner's curse
IF 2.1 · Zone 4 (Medicine) · Q3 GENETICS & HEREDITY · Pub Date: 2024-01-28 · DOI: 10.1002/gepi.22545
Kai Wang

This research focuses on the interval estimation of the causal effect of an exposure on an outcome using the summary data-based Mendelian randomization (SMR) method while accounting for the winner's curse caused by the selection of single nucleotide polymorphism instruments. This issue is understudied and is important as the point estimate is biased. Since Fieller's theorem and its variations are not suitable for constructing a confidence interval, we use the box method. This box method is known to be conservative and thus provides a lower bound on the coverage level. To assess the performance of the box method, we use simulation studies and compare it with the support interval we proposed earlier and the Wald interval derived from the SMR method. All three methods are applied to a study of causal genes for Alzheimer's disease. Overall, the box method presents an alternative for constructing interval estimates for a causal effect while addressing the winner's curse issue.
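
The SMR causal estimate is the ratio b_zy / b_zx of the SNP-outcome and SNP-exposure associations. A generic box-method interval for this ratio can be sketched as follows.

- **Independence:** the two estimates are assumed independent, for example because they come from separate samples.
- **Marginal level:** one common construction, assumed here, puts each marginal interval at level sqrt(1 - alpha), so the rectangle covers both true values with probability 1 - alpha.
- **Not included:** the winner's-curse correction that is the paper's contribution.

```python
from statistics import NormalDist


def box_interval(b_zy, se_zy, b_zx, se_zx, alpha=0.05):
    """Box-method interval for b_zy / b_zx, assuming independent normal estimates.

    Each marginal interval uses level sqrt(1 - alpha), so the box covers both true
    values with probability 1 - alpha; the ratio's range over the box then covers
    the true ratio at least that often (hence the method is conservative).
    """
    level = (1 - alpha) ** 0.5
    zq = NormalDist().inv_cdf(0.5 + level / 2)
    lo_y, hi_y = b_zy - zq * se_zy, b_zy + zq * se_zy
    lo_x, hi_x = b_zx - zq * se_zx, b_zx + zq * se_zx
    if lo_x <= 0 <= hi_x:
        raise ValueError("exposure interval contains 0; the ratio interval is unbounded")
    # a/b is monotone in each argument when b keeps one sign, so extremes sit at corners
    corners = [y / x for y in (lo_y, hi_y) for x in (lo_x, hi_x)]
    return min(corners), max(corners)
```

In principle, a winner's-curse-adjusted interval for b_zx could be substituted for the naive one on the denominator side. How to do so correctly is exactly what the paper addresses.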

Published in Genetic Epidemiology 48(2): 74-84. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22545
Citations: 0
simmrd: An open-source tool to perform simulations in Mendelian randomization
IF 2.1 Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2024-01-23 DOI: 10.1002/gepi.22544
Noah Lorincz-Comi, Yihe Yang, Xiaofeng Zhu

Mendelian randomization (MR) has become a popular tool for inferring the causal effects of risk factors on disease. More than 45 different methods are currently available to perform MR, reflecting an extremely active research area. A standard simulation environment for objectively evaluating existing and future methods would therefore be desirable. We present simmrd, open-source software for performing simulations to evaluate the performance of MR methods in a range of scenarios encountered in practice. Researchers can directly modify the simmrd source code, with the aim that the research community arrives at a widely accepted framework for evaluating the performance of different MR methods.
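To make concrete what such a simulation evaluates, here is a minimal two-sample MR simulation. It is an independent sketch and does not reproduce simmrd's interface or its scenarios. Assumptions, all hypothetical: independent SNPs with a common allele frequency, one unmeasured confounder `u` acting on both exposure and outcome, an optional directional pleiotropic effect `pleio` of every SNP on the outcome, marginal per-SNP regressions in two independent samples to produce summary statistics, and the inverse-variance weighted (IVW) estimator as the MR method under evaluation.

```python
import math
import random


def simulate_sample(n, freqs, gammas, alphas, beta, rng):
    """Simulate n individuals: SNP genotypes, exposure X and outcome Y."""
    m = len(freqs)
    geno = [[0] * n for _ in range(m)]
    x, y = [0.0] * n, [0.0] * n
    for i in range(n):
        u = rng.gauss(0.0, 1.0)  # unmeasured confounder of X and Y
        gx = gy = 0.0
        for j in range(m):
            g = (rng.random() < freqs[j]) + (rng.random() < freqs[j])  # 0, 1 or 2 copies
            geno[j][i] = g
            gx += gammas[j] * g  # SNP -> exposure effect
            gy += alphas[j] * g  # direct SNP -> outcome effect (pleiotropy)
        x[i] = gx + u + rng.gauss(0.0, 1.0)
        y[i] = beta * x[i] + gy + u + rng.gauss(0.0, 1.0)
    return geno, x, y


def marginal_ols(g, t):
    """Slope and standard error from regressing trait t on a single SNP g."""
    n = len(g)
    mg, mt = sum(g) / n, sum(t) / n
    sgg = sum((a - mg) ** 2 for a in g)
    b = sum((a - mg) * (v - mt) for a, v in zip(g, t)) / sgg
    rss = sum((v - mt - b * (a - mg)) ** 2 for a, v in zip(g, t))
    return b, math.sqrt(rss / (n - 2) / sgg)


def ivw(bx, by, se_y):
    """Inverse-variance weighted causal estimate from summary statistics."""
    num = sum(a * b / s ** 2 for a, b, s in zip(bx, by, se_y))
    den = sum(a * a / s ** 2 for a, s in zip(bx, se_y))
    return num / den


def simulate_ivw(n=10000, m=20, beta=0.3, pleio=0.0, seed=1):
    """One replicate: exposure GWAS and outcome GWAS in independent samples."""
    rng = random.Random(seed)
    freqs, gammas, alphas = [0.3] * m, [0.2] * m, [pleio] * m
    geno_x, x, _ = simulate_sample(n, freqs, gammas, alphas, beta, rng)
    geno_y, _, y = simulate_sample(n, freqs, gammas, alphas, beta, rng)
    bx = [marginal_ols(geno_x[j], x)[0] for j in range(m)]
    fits = [marginal_ols(geno_y[j], y) for j in range(m)]
    return ivw(bx, [b for b, _ in fits], [s for _, s in fits])
```

Running `simulate_ivw(pleio=0.0)` and `simulate_ivw(pleio=0.05)` with the same seed shows the kind of comparison a simulation framework supports: without pleiotropy the IVW estimate lands near the true `beta = 0.3`, while uniform directional pleiotropy shifts it upward by about `pleio / gamma = 0.25`. A full evaluation would repeat this over many seeds and scenarios and report bias, coverage and type I error; pure Python is slow for that, so this is for illustration only.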

Published in Genetic Epidemiology 48(2): 59-73. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22544
Citations: 0
DYNATE: Localizing rare-variant association regions via multiple testing embedded in an aggregation tree
IF 2.1 Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2023-11-28 DOI: 10.1002/gepi.22542
Xuechan Li, John Pura, Andrew Allen, Kouros Owzar, Jianfeng Lu, Matthew Harms, Jichun Xie

Rare-variant (RV) genetic association studies enable researchers to uncover phenotypic variation left unexplained by common variants. Traditional single-variant analysis lacks power for RVs; researchers have therefore developed various methods that aggregate the effects of RVs across genomic regions to study their collective impact. Some existing methods use a static delineation of genomic regions, which often yields suboptimal effect aggregation because neutral subregions within the test region attenuate the signal. Other methods use varying windows to search for signals but often return long regions containing many neutral RVs. To pinpoint short genomic regions enriched for disease-associated RVs, we developed a novel method, DYNamic Aggregation TEsting (DYNATE). DYNATE dynamically and hierarchically aggregates smaller genomic regions into larger ones and performs multiple testing for disease associations while controlling a weighted false discovery rate. DYNATE's main advantage is its ability to identify short genomic regions highly enriched for disease-associated RVs. Extensive numerical simulations demonstrate that DYNATE outperforms existing methods across various scenarios. We applied DYNATE to an amyotrophic lateral sclerosis study and identified a new gene, EPG5, harboring possibly pathogenic mutations.

Published in Genetic Epidemiology 48(1): 42-55.
Citations: 0