Genome-wide association studies (GWAS) have provided an abundance of information about the genetic variants and loci associated with complex traits and diseases. However, because of linkage disequilibrium (LD) and because many loci fall in noncoding regions, it remains a challenge to pinpoint the causal genes. Gene network-based approaches, paired with network diffusion methods, have been proposed to prioritize causal genes and to boost statistical power in GWAS, based on the assumption that trait-associated genes are clustered in a gene network. Due to the difficulty of mapping trait-associated variants to genes in GWAS, this assumption has never been directly or rigorously tested empirically. In contrast, whole exome sequencing (WES) data focus on the protein-coding regions and so identify trait-associated genes directly. In this study, we tested the assumption by leveraging the recently available exome-based association statistics from the UK Biobank WES data along with two types of networks. We found that almost all trait-associated genes were significantly more proximal to each other than randomly selected genes within both networks. These results support the assumption that trait-associated genes are clustered in gene networks, which can be further leveraged to boost the power of GWAS, for example by introducing less stringent p value thresholds.
Are trait-associated genes clustered together in a gene network? Hyun Jung Koo, Wei Pan. Genetic Epidemiology 48(5):203-213, published 2024-03-12. doi:10.1002/gepi.22557. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22557
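The proximity comparison described in the abstract above can be illustrated with a small permutation test. This is only a sketch under stated assumptions, not the authors' procedure: the toy graph, the function names, the choice of mean pairwise shortest-path distance as the proximity measure, and the use of a connected unweighted network are all hypothetical choices made for illustration.

```python
import random
from collections import deque
from itertools import combinations


def bfs_distances(graph, source):
    """Shortest-path lengths from source to every reachable node (unweighted)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def mean_pairwise_distance(graph, genes):
    """Average shortest-path distance over all pairs in genes.

    Assumes the graph is connected, so every pair has a finite distance.
    """
    genes = list(genes)
    dist_from = {g: bfs_distances(graph, g) for g in genes}
    pairs = list(combinations(genes, 2))
    return sum(dist_from[a][b] for a, b in pairs) / len(pairs)


def proximity_p_value(graph, trait_genes, n_perm=1000, seed=0):
    """One-sided permutation p value: are trait genes closer than random sets?"""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(graph, trait_genes)
    nodes = sorted(graph)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(nodes, len(trait_genes))
        if mean_pairwise_distance(graph, sample) <= observed:
            hits += 1
    # Add-one correction keeps the estimated p value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy network: a 10-node path whose first four nodes form a clique.
graph = {i: set() for i in range(10)}
edges = [(i, i + 1) for i in range(9)] + [(0, 2), (0, 3), (1, 3)]
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)

observed, p_value = proximity_p_value(graph, [0, 1, 2])
```

A real analysis would typically also match the random gene sets on properties such as node degree, since highly connected genes are close to everything; the sketch above samples uniformly for simplicity.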
Gene–environment (GxE) interactions play a crucial role in understanding the complex etiology of various traits, but assessing them using observational data can be challenging due to unmeasured confounders for lifestyle and environmental risk factors. Mendelian randomization (MR) has emerged as a valuable method for assessing causal relationships based on observational data. This approach utilizes genetic variants as instrumental variables (IVs) with the aim of providing a valid statistical test and estimation of causal effects in the presence of unmeasured confounders. MR has gained substantial popularity in recent years, largely due to the success of genome-wide association studies. Many methods have been developed for MR; however, limited work has been done on evaluating GxE interaction. In this paper, we focus on two primary IV approaches, two-stage predictor substitution and two-stage residual inclusion, and extend them to accommodate GxE interaction under both the linear and logistic regression models for continuous and binary outcomes, respectively. A comprehensive simulation study and analytical derivations reveal that resolving the linear regression model is relatively straightforward. In contrast, the logistic regression model presents a considerably more intricate challenge, which demands additional effort.
Unveiling challenges in Mendelian randomization for gene–environment interaction. Malka Gorfine, Conghui Qu, Ulrike Peters, Li Hsu. Genetic Epidemiology 48(4):164-189, published 2024-02-29. doi:10.1002/gepi.22552
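To make the two IV approaches named in the abstract above concrete, here is a minimal sketch of two-stage predictor substitution (2SPS) and two-stage residual inclusion (2SRI) in the plain linear case. It deliberately omits the GxE interaction term that the paper is about, and the simulated data, effect sizes, and helper names are hypothetical. In a linear outcome model the two estimators give the same coefficient on the exposure, which may be part of why the linear case is the easier one.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(columns, y):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    design = [[1.0] * len(y)] + columns
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in design] for ci in design]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in design]
    return solve_linear_system(xtx, xty)


# Hypothetical simulation: g is the instrument, u an unmeasured confounder.
rng = random.Random(1)
n = 2000
g = [rng.choice([0, 1, 2]) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [0.5 * gi + ui + rng.gauss(0, 1) for gi, ui in zip(g, u)]
y = [0.3 * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

# Stage 1 (shared): regress the exposure on the instrument.
a0, a1 = ols([g], x)
x_hat = [a0 + a1 * gi for gi in g]
resid = [xi - xh for xi, xh in zip(x, x_hat)]

# Stage 2, 2SPS: replace the exposure by its stage-1 prediction.
beta_2sps = ols([x_hat], y)[1]
# Stage 2, 2SRI: keep the exposure and add the stage-1 residual as a covariate.
beta_2sri = ols([x, resid], y)[1]
# Naive regression that ignores confounding, for comparison.
beta_naive = ols([x], y)[1]
```

Once the second stage is a logistic regression, this equality no longer holds in general, so the two approaches have to be studied separately.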
Ashish Patel, Dipender Gill, Dmitry Shungin, Christos S. Mantzoros, Lotte Bjerre Knudsen, Jack Bowden, Stephen Burgess
Phenotypic heterogeneity at genomic loci encoding drug targets can be exploited by multivariable Mendelian randomization to provide insight into the pathways by which pharmacological interventions may affect disease risk. However, statistical inference in such investigations may be poor if overdispersion heterogeneity in measured genetic associations is unaccounted for. In this work, we first develop conditional F statistics for dimension-reduced genetic associations that enable more accurate measurement of phenotypic heterogeneity. We then develop a novel extension for two-sample multivariable Mendelian randomization that accounts for overdispersion heterogeneity in dimension-reduced genetic associations. Our empirical focus is to use genetic variants in the GLP1R gene region to understand the mechanism by which GLP1R agonism affects coronary artery disease (CAD) risk. Colocalization analyses indicate that distinct variants in the GLP1R gene region are associated with body mass index and type 2 diabetes (T2D). Multivariable Mendelian randomization analyses that were corrected for overdispersion heterogeneity suggest that the bodyweight-lowering rather than the T2D liability-lowering effects of GLP1R agonism are more likely to contribute to reduced CAD risk. Tissue-specific analyses prioritized brain tissue as the most likely to be relevant for CAD risk, of the tissues considered. We hope the multivariable Mendelian randomization approach illustrated here is widely applicable to better understand mechanisms linking drug targets to disease outcomes, and hence to guide drug development efforts.
Robust use of phenotypic heterogeneity at drug target genes for mechanistic insights: Application of cis-multivariable Mendelian randomization to GLP1R gene region. Ashish Patel, Dipender Gill, Dmitry Shungin, Christos S. Mantzoros, Lotte Bjerre Knudsen, Jack Bowden, Stephen Burgess. Genetic Epidemiology 48(4):151-163, published 2024-02-20. doi:10.1002/gepi.22551. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22551
Individual probabilistic assessments of the risk of cancer, primary or secondary, will not be understood by most patients. That is the essence of our arguments in this paper. Greater understanding can be achieved by extensive, intensive, and detailed counseling. But since probability itself is a concept that easily escapes our everyday intuition (consider the famous Monty Hall paradox), it would also be wise to advise patients and potential patients not to put undue weight on any probabilistic assessment. Such assessments can be of value to the epidemiologist in the investigation of different potential etiologies describing cancer evolution or to the clinical trialist as a way to maximize design efficiency. But we cannot anticipate that an ordinary individual will interpret these assessments correctly.
Making sense of breast cancer risk estimates. John O'Quigley. Genetic Epidemiology 48(3):141-147, published 2024-02-09. doi:10.1002/gepi.22550
Advancements in high-throughput genomic technologies have revolutionized the field of disease biomarker identification by providing large-scale genomic data. There is an increasing focus on understanding the relationships among diverse patient groups with distinct disease subtypes and characteristics. Complex diseases exhibit both heterogeneity and shared genomic factors, making it essential to investigate these patterns to accurately detect markers and comprehensively understand the diseases. Integrative analysis has emerged as a promising approach to address this challenge. However, existing studies have been limited by ignoring the adjacency structure of genomic measurements, such as single nucleotide polymorphisms (SNPs) and DNA methylation. In this study, we propose a structured integrative analysis method that incorporates a spline-type penalty to accommodate this adjacency structure. We utilize a fused lasso-type penalty to identify both heterogeneity and commonality across the groups. Extensive simulations demonstrate its superiority over several direct competing methods. Analyses of The Cancer Genome Atlas melanoma data with DNA methylation measurements and of GENEVA diabetes data with SNP measurements show that the proposed analysis leads to meaningful findings with better prediction performance and higher selection stability.
Revealing genomic heterogeneity and commonality: A penalized integrative analysis approach accounting for the adjacency structure of measurements. Xindi Wang, Yu Jiang, Yifan Sun. Genetic Epidemiology 48(3):114-140, published 2024-02-05. doi:10.1002/gepi.22549
Peng Wang, Xiao Xu, Ming Li, Xiang-Yang Lou, Siqi Xu, Baolin Wu, Guimin Gao, Ping Yin, Nianjun Liu
Genome-wide association studies (GWAS) have led to rapid growth in detecting genetic variants associated with various phenotypes. Owing to the great number of publicly accessible GWAS summary statistics, and the difficulty in obtaining individual-level genotype data, many existing gene-based association tests have been adapted to require only GWAS summary statistics rather than individual-level data. However, these association tests are restricted to unrelated individuals and thus do not apply to family samples directly. Moreover, due to its flexibility and effectiveness, the linear mixed model has been increasingly utilized in GWAS to handle correlated data, such as family samples. However, it remains unknown how to perform gene-based association tests in family samples using the GWAS summary statistics estimated from the linear mixed model. In this study, we show that, when family size is negligible compared to the total sample size, the block-diagonal structure of the kinship matrix makes it possible to approximate the correlation matrix of marginal Z scores by the linkage disequilibrium matrix. Based on this result, current methods utilizing summary statistics for unrelated individuals can be directly applied to family data without any modifications. Our simulation results demonstrate that this proposed strategy controls the type 1 error rate well in various situations. Finally, we exemplify the usefulness of the proposed approach with a dental caries GWAS data set.
Gene-based association tests in family samples using GWAS summary statistics. Peng Wang, Xiao Xu, Ming Li, Xiang-Yang Lou, Siqi Xu, Baolin Wu, Guimin Gao, Ping Yin, Nianjun Liu. Genetic Epidemiology 48(3):103-113, published 2024-02-05. doi:10.1002/gepi.22548. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22548
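The practical consequence of the result above is that a summary-statistic gene-based test only needs the marginal Z scores and an LD matrix. As an illustration, here is a minimal sketch of one common test of this kind, a weighted burden (sum) test. The abstract does not say which gene-based tests the authors used, so the choice of test, the function name, and the example numbers are assumptions.

```python
import math


def burden_test(z_scores, ld_matrix, weights=None):
    """Weighted sum test from marginal Z scores and an LD (correlation) matrix.

    Under the null, Z ~ N(0, R) with R approximated by the LD matrix, so
    w'Z has variance w'Rw and the standardized statistic is N(0, 1).
    """
    m = len(z_scores)
    w = [1.0] * m if weights is None else list(weights)
    numerator = sum(wi * zi for wi, zi in zip(w, z_scores))
    variance = sum(w[i] * ld_matrix[i][j] * w[j] for i in range(m) for j in range(m))
    statistic = numerator / math.sqrt(variance)
    # Two-sided normal p value via the complementary error function.
    p_value = math.erfc(abs(statistic) / math.sqrt(2.0))
    return statistic, p_value


# Hypothetical example: two variants with the same Z score.
z = [2.0, 2.0]
independent = [[1.0, 0.0], [0.0, 1.0]]
perfect_ld = [[1.0, 1.0], [1.0, 1.0]]

stat_indep, p_indep = burden_test(z, independent)
stat_ld, p_ld = burden_test(z, perfect_ld)
```

With independent variants the two signals reinforce each other, while under perfect LD they count as a single signal, which is why the correlation matrix of the Z scores has to be approximated well.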
Dovini Jayasinghe, Md. Moksedul Momin, Kerri Beckmann, Elina Hyppönen, Beben Benyamin, S. Hong Lee
The use of polygenic risk score (PRS) models has transformed the field of genetics by enabling the prediction of complex traits and diseases based on an individual's genetic profile. However, the impact of genotype–environment interaction (GxE) on the performance and applicability of PRS models remains a crucial aspect to be explored. Existing genotype–environment interaction polygenic risk score (GxE PRS) models are often used inappropriately, which can result in inflated type 1 error rates and compromised results. In this study, we propose novel GxE PRS models that jointly incorporate additive and interaction genetic effects while also including an additional quadratic term for nongenetic covariates, enhancing their robustness against model misspecification. Through extensive simulations, we demonstrate that our proposed models outperform existing models in terms of controlling type 1 error rates and enhancing statistical power. Furthermore, we apply the proposed models to real data and report significant GxE effects. Specifically, we highlight the impact of our models on both quantitative and binary traits. For quantitative traits, we uncover the GxE modulation of genetic effects on body mass index by alcohol intake frequency. In the case of binary traits, we identify the GxE modulation of genetic effects on hypertension by waist-to-hip ratio. These findings underscore the importance of employing a robust model that effectively controls type 1 error rates, thus preventing the occurrence of spurious GxE signals. To facilitate the implementation of our approach, we have developed an R software package called GxEprs, specifically designed to detect and estimate GxE effects. Overall, our study highlights the importance of accurate GxE modeling and its implications for genetic risk prediction, while providing a practical tool to support further research in this area.
Mitigating type 1 error inflation and power loss in GxE PRS: Genotype–environment interaction in polygenic risk score models. Dovini Jayasinghe, Md. Moksedul Momin, Kerri Beckmann, Elina Hyppönen, Beben Benyamin, S. Hong Lee. Genetic Epidemiology 48(2):85-100, published 2024-02-01. doi:10.1002/gepi.22546. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22546
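The model structure described in the abstract above (additive and interaction genetic effects fitted jointly, plus a quadratic term for the nongenetic covariate) might be written, for a quantitative trait, roughly as follows. The notation is an assumption made for illustration and is not taken from the paper or from the GxEprs package: $\mathrm{PRS}_{\mathrm{add}}$ and $\mathrm{PRS}_{\mathrm{gxe}}$ denote scores built from additive and interaction SNP effects, and $E$ is the environmental covariate.

```latex
\begin{equation}
  y = \beta_0
    + \beta_1\,\mathrm{PRS}_{\mathrm{add}}
    + \beta_2\,E
    + \beta_3\,E^{2}
    + \beta_4\,\bigl(\mathrm{PRS}_{\mathrm{gxe}} \times E\bigr)
    + \varepsilon
\end{equation}
```

In this sketch the GxE test is a test of $\beta_4 = 0$. A plausible reading of the role of the $E^{2}$ term is that, if the true effect of $E$ is nonlinear and $E$ is correlated with the genetic score, leaving the curvature out could let it be absorbed by the product term and appear as a spurious interaction.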
This research focuses on the interval estimation of the causal effect of an exposure on an outcome using the summary data-based Mendelian randomization (SMR) method while accounting for the winner's curse caused by the selection of single nucleotide polymorphism instruments. This issue is understudied and is important as the point estimate is biased. Since Fieller's theorem and its variations are not suitable for constructing a confidence interval, we use the box method. This box method is known to be conservative and thus provides a lower bound on the coverage level. To assess the performance of the box method, we use simulation studies and compare it with the support interval we proposed earlier and the Wald interval derived from the SMR method. All three methods are applied to a study of causal genes for Alzheimer's disease. Overall, the box method presents an alternative for constructing interval estimates for a causal effect while addressing the winner's curse issue.
Interval estimate of causal effect in summary data based Mendelian randomization in the presence of winner's curse. Kai Wang. Genetic Epidemiology 48(2):74-84, published 2024-01-28. doi:10.1002/gepi.22545. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22545
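For orientation, the Wald interval that the abstract above uses as a comparator can be sketched from the usual ratio estimate and a first-order delta-method standard error. This is a naive interval that does not correct for the winner's curse, which is exactly the problem the paper addresses; the function name, the independence of the two summary-statistic samples, and the example numbers are assumptions, and the box method itself is not reproduced here.

```python
import math
from statistics import NormalDist


def smr_wald_interval(b_zx, se_zx, b_zy, se_zy, level=0.95):
    """Ratio estimate b_xy = b_zy / b_zx with a delta-method Wald interval.

    b_zx, se_zx: SNP-exposure effect and its standard error.
    b_zy, se_zy: SNP-outcome effect and its standard error.
    Assumes the two estimates come from independent samples (no covariance).
    """
    b_xy = b_zy / b_zx
    variance = se_zy ** 2 / b_zx ** 2 + (b_zy ** 2) * (se_zx ** 2) / b_zx ** 4
    se_xy = math.sqrt(variance)
    q = NormalDist().inv_cdf(0.5 + level / 2.0)
    return b_xy, (b_xy - q * se_xy, b_xy + q * se_xy)


# Hypothetical summary statistics for a single instrument.
estimate, (lower, upper) = smr_wald_interval(b_zx=0.5, se_zx=0.05, b_zy=0.1, se_zy=0.02)
```

Because the instrument is usually selected for having a large observed b_zx, the denominator tends to be overestimated in magnitude, so an interval like this one can be centered in the wrong place.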
Mendelian randomization (MR) has become a popular tool for inferring causality of risk factors on disease. There are currently over 45 different methods available to perform MR, reflecting how active this research area is. It would be desirable to have a standard simulation environment to objectively evaluate existing and future methods. We present simmrd, open-source software for performing simulations to evaluate the performance of MR methods in a range of scenarios encountered in practice. Researchers can directly modify the simmrd source code so that the research community may arrive at a widely accepted framework for evaluating the performance of different MR methods.
simmrd: An open-source tool to perform simulations in Mendelian randomization. Noah Lorincz-Comi, Yihe Yang, Xiaofeng Zhu. Genetic Epidemiology 48(2):59-73, published 2024-01-23. doi:10.1002/gepi.22544. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22544
Xuechan Li, John Pura, Andrew Allen, Kouros Owzar, Jianfeng Lu, Matthew Harms, Jichun Xie
Rare-variant (RV) genetic association studies enable researchers to uncover the variation in phenotypic traits left unexplained by common variation. Traditional single-variant analysis lacks power; thus, researchers have developed various methods that aggregate the effects of RVs across genomic regions to study their collective impact. Some existing methods rely on a static delineation of genomic regions, which often leads to suboptimal effect aggregation because neutral subregions within the test region attenuate the signal. Other methods search for signals with varying windows but often return long regions containing many neutral RVs. To pinpoint short genomic regions enriched for disease-associated RVs, we developed a novel method, DYNamic Aggregation TEsting (DYNATE). DYNATE dynamically and hierarchically aggregates smaller genomic regions into larger ones and performs multiple testing for disease associations while controlling a weighted false discovery rate. DYNATE's main advantage is its strong ability to identify short genomic regions highly enriched for disease-associated RVs. Extensive numerical simulations demonstrate that DYNATE outperforms existing methods under various scenarios. We applied DYNATE to an amyotrophic lateral sclerosis study and identified a new gene, EPG5, harboring possibly pathogenic mutations.
{"title":"DYNATE: Localizing rare-variant association regions via multiple testing embedded in an aggregation tree","authors":"Xuechan Li, John Pura, Andrew Allen, Kouros Owzar, Jianfeng Lu, Matthew Harms, Jichun Xie","doi":"10.1002/gepi.22542","DOIUrl":"10.1002/gepi.22542","url":null,"abstract":"<p>Rare-variants (RVs) genetic association studies enable researchers to uncover the variation in phenotypic traits left unexplained by common variation. Traditional single-variant analysis lacks power; thus, researchers have developed various methods to aggregate the effects of RVs across genomic regions to study their collective impact. Some existing methods utilize a static delineation of genomic regions, often resulting in suboptimal effect aggregation, as neutral subregions within the test region will result in an attenuation of signal. Other methods use varying windows to search for signals but often result in long regions containing many neutral RVs. To pinpoint short genomic regions enriched for disease-associated RVs, we developed a novel method, DYNamic Aggregation TEsting (DYNATE). DYNATE dynamically and hierarchically aggregates smaller genomic regions into larger ones and performs multiple testing for disease associations with a controlled weighted false discovery rate. DYNATE's main advantage lies in its strong ability to identify short genomic regions highly enriched for disease-associated RVs. Extensive numerical simulations demonstrate the superior performance of DYNATE under various scenarios compared with existing methods. 
We applied DYNATE to an amyotrophic lateral sclerosis study and identified a new gene, <i>EPG5</i>, harboring possibly pathogenic mutations.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 1","pages":"42-55"},"PeriodicalIF":2.1,"publicationDate":"2023-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138444394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
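The general idea in the DYNATE abstract above, testing small regions first and merging the untested remainder into larger regions for further testing, can be illustrated with a toy example. The sketch below is not the DYNATE procedure: it swaps in a Stouffer combination of per-region z-scores and a plain Benjamini-Hochberg step at each layer, and it does not implement the weighted false discovery rate control or any correction for selection across layers that the abstract refers to. The function names (`bh_reject`, `aggregate_and_test`) and all inputs are hypothetical.

```python
import math
from statistics import NormalDist


def bh_reject(pvals, alpha):
    """Benjamini-Hochberg step-up: indices of hypotheses rejected at level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank
    return set(order[:k])


def aggregate_and_test(z_leaves, n_layers=3, alpha=0.05):
    """Toy bottom-up aggregation (assumed scheme, not DYNATE itself).

    Each node is (first_leaf, last_leaf, sum_of_leaf_z, n_leaves). At every
    layer the nodes are tested with BH; rejected nodes are reported, and the
    remaining neighbours are merged pairwise into larger nodes.
    """
    norm = NormalDist()
    nodes = [(i, i, z, 1) for i, z in enumerate(z_leaves)]
    found = []
    for layer in range(1, n_layers + 1):
        if not nodes:
            break
        # One-sided Stouffer p-value: sum of leaf z-scores over sqrt(leaf count).
        pvals = [1.0 - norm.cdf(s / math.sqrt(k)) for (_, _, s, k) in nodes]
        rejected = bh_reject(pvals, alpha)
        found += [(nodes[i][0], nodes[i][1], layer) for i in sorted(rejected)]
        kept = [nd for i, nd in enumerate(nodes) if i not in rejected]
        # Merge surviving neighbours in pairs; an odd node out is carried up alone.
        # Note: a merged span may straddle a leaf that was already reported.
        nodes = []
        for j in range(0, len(kept), 2):
            pair = kept[j:j + 2]
            nodes.append((pair[0][0], pair[-1][1],
                          sum(p[2] for p in pair), sum(p[3] for p in pair)))
    return found


if __name__ == "__main__":
    z = [0.0] * 16
    z[4] = z[5] = 2.3   # two adjacent moderate signals
    z[10] = 4.0         # one strong single-region signal
    print(aggregate_and_test(z))
```

In this made-up input, the strong region is reported at the first layer, while the two adjacent moderate regions are too weak individually and are only reported once merged at the second layer. That is the behaviour aggregation is meant to provide: short enriched regions are found without being diluted by long stretches of neutral variants.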