Jiayi Shen, Lai Jiang, Kan Wang, Anqi Wang, Fei Chen, Paul J. Newcombe, Christopher A. Haiman, David V. Conti
Recent advances in genome-wide association studies (GWAS) come not only from increasingly large sample sizes but also from a shift in focus towards underrepresented populations. Multipopulation GWAS increase power to detect novel risk variants and improve fine-mapping resolution by leveraging evidence and differences in linkage disequilibrium (LD) from diverse populations. Here, we expand upon our previous approach for single-population fine-mapping through Joint Analysis of Marginal SNP Effects (JAM) to a multipopulation analysis (mJAM). Under the assumption that true causal variants are common across studies, we implement a hierarchical model framework that conditions on multiple SNPs while explicitly incorporating the different LD structures across populations. The mJAM framework can be used to first select index variants using the mJAM likelihood with different feature selection approaches. In addition, we present a novel approach leveraging the ideas of mediation to construct credible sets for these index variants. Construction of such credible sets can be performed given any existing index variants. We illustrate the mJAM likelihood through two implementations: mJAM-SuSiE (a Bayesian approach) and mJAM-Forward selection. Through simulation studies based on realistic effect sizes and levels of LD, we demonstrated that mJAM performs well for constructing concise credible sets that include the underlying causal variants. In real data examples taken from the most recent multipopulation prostate cancer GWAS, we showed several practical advantages of mJAM over other existing multipopulation methods.
{"title":"Hierarchical joint analysis of marginal summary statistics—Part I: Multipopulation fine mapping and credible set construction","authors":"Jiayi Shen, Lai Jiang, Kan Wang, Anqi Wang, Fei Chen, Paul J. Newcombe, Christopher A. Haiman, David V. Conti","doi":"10.1002/gepi.22562","DOIUrl":"10.1002/gepi.22562","url":null,"abstract":"<p>Recent advancement in genome-wide association studies (GWAS) comes from not only increasingly larger sample sizes but also the shift in focus towards underrepresented populations. Multipopulation GWAS increase power to detect novel risk variants and improve fine-mapping resolution by leveraging evidence and differences in linkage disequilibrium (LD) from diverse populations. Here, we expand upon our previous approach for single-population fine-mapping through Joint Analysis of Marginal SNP Effects (JAM) to a multipopulation analysis (mJAM). Under the assumption that true causal variants are common across studies, we implement a hierarchical model framework that conditions on multiple SNPs while explicitly incorporating the different LD structures across populations. The mJAM framework can be used to first select index variants using the mJAM likelihood with different feature selection approaches. In addition, we present a novel approach leveraging the ideas of mediation to construct credible sets for these index variants. Construction of such credible sets can be performed given any existing index variants. We illustrate the implementation of the mJAM likelihood through two implementations: mJAM-SuSiE (a Bayesian approach) and mJAM-Forward selection. Through simulation studies based on realistic effect sizes and levels of LD, we demonstrated that mJAM performs well for constructing concise credible sets that include the underlying causal variants. 
In real data examples taken from the most recent multipopulation prostate cancer GWAS, we showed several practical advantages of mJAM over other existing multipopulation methods.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 6","pages":"241-257"},"PeriodicalIF":1.7,"publicationDate":"2024-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22562","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140603455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Copy number variants (CNVs) are prevalent in the human genome and are found to have a profound effect on genomic organization and human diseases. Discovering disease-associated CNVs is critical for understanding the pathogenesis of diseases and aiding their diagnosis and treatment. However, traditional methods for assessing the association between CNVs and disease risks adopt a two-stage strategy conducting quantitative CNV measurements first and then testing for association, which may lead to biased association estimation and low statistical power, serving as a major barrier in routine genome-wide assessment of such variation. In this article, we developed One-Stage CNV–disease Association Analysis (OSCAA), a flexible algorithm to discover disease-associated CNVs for both quantitative and qualitative traits. OSCAA employs a two-dimensional Gaussian mixture model that is built upon the PCs from copy number intensities, accounting for technical biases in CNV detection while simultaneously testing for their effect on outcome traits. In OSCAA, CNVs are identified and their associations with disease risk are evaluated simultaneously in a single step, taking into account the uncertainty of CNV identification in the statistical model. Our simulations demonstrated that OSCAA outperformed the existing one-stage method and traditional two-stage methods by yielding a more accurate estimate of the CNV–disease association, especially for short CNVs or CNVs with weak signals. In conclusion, OSCAA is a powerful and flexible approach for CNV association testing with high sensitivity and specificity, which can be easily applied to different traits and clinical risk predictions.
{"title":"OSCAA: A two-dimensional Gaussian mixture model for copy number variation association analysis","authors":"Xuanxuan Yu, Xizhi Luo, Guoshuai Cai, Feifei Xiao","doi":"10.1002/gepi.22558","DOIUrl":"10.1002/gepi.22558","url":null,"abstract":"<p>Copy number variants (CNVs) are prevalent in the human genome and are found to have a profound effect on genomic organization and human diseases. Discovering disease-associated CNVs is critical for understanding the pathogenesis of diseases and aiding their diagnosis and treatment. However, traditional methods for assessing the association between CNVs and disease risks adopt a two-stage strategy conducting quantitative CNV measurements first and then testing for association, which may lead to biased association estimation and low statistical power, serving as a major barrier in routine genome-wide assessment of such variation. In this article, we developed One-Stage CNV–disease Association Analysis (OSCAA), a flexible algorithm to discover disease-associated CNVs for both quantitative and qualitative traits. OSCAA employs a two-dimensional Gaussian mixture model that is built upon the PCs from copy number intensities, accounting for technical biases in CNV detection while simultaneously testing for their effect on outcome traits. In OSCAA, CNVs are identified and their associations with disease risk are evaluated simultaneously in a single step, taking into account the uncertainty of CNV identification in the statistical model. Our simulations demonstrated that OSCAA outperformed the existing one-stage method and traditional two-stage methods by yielding a more accurate estimate of the CNV–disease association, especially for short CNVs or CNVs with weak signals. 
In conclusion, OSCAA is a powerful and flexible approach for CNV association testing with high sensitivity and specificity, which can be easily applied to different traits and clinical risk predictions.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 5","pages":"214-225"},"PeriodicalIF":1.7,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140293348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
John L. Hopper, Shuai Li, Robert J. MacInnis, James G. Dowty, Tuong L. Nguyen, Minh Bui, Gillian S. Dite, Vivienne F. C. Esser, Zhoufeng Ye, Enes Makalic, Daniel F. Schmidt, Benjamin Goudey, Karen Alpen, Miroslaw Kapuscinski, Aung Ko Win, Pierre-Antoine Dugué, Roger L. Milne, Harindra Jayasekara, Jennifer D. Brooks, Sue Malta, Lucas Calais-Ferreira, Alexander C. Campbell, Jesse T. Young, Tu Nguyen-Dumont, Joohon Sung, Graham G. Giles, Daniel Buchanan, Ingrid Winship, Mary Beth Terry, Melissa C. Southey, Mark A. Jenkins
Young breast and bowel cancers (e.g., those diagnosed before age 40 or 50 years) have far greater morbidity and mortality in terms of years of life lost, and are increasing in incidence, but have been less studied. For breast and bowel cancers, the familial relative risks, and therefore the familial variances in age-specific log(incidence), are much greater at younger ages, but little of these familial variances has been explained. Studies of families and twins can address questions not easily answered by studies of unrelated individuals alone. We describe existing and emerging family and twin data that can provide special opportunities for discovery. We present designs and statistical analyses, including novel ideas such as the VALID (Variance in Age-specific Log Incidence Decomposition) model for causes of variation in risk, the DEPTH (DEPendency of association on the number of Top Hits) and other approaches to analyse genome-wide association study data, and the within-pair, ICE FALCON (Inference about Causation from Examining FAmiliaL CONfounding) and ICE CRISTAL (Inference about Causation from Examining Changes in Regression coefficients and Innovative STatistical AnaLysis) approaches to causation and familial confounding. Example applications to breast and colorectal cancer are presented. Motivated by the availability of the resources of the Breast and Colon Cancer Family Registries, we also present some ideas for future studies that could be applied to, and compared with, cancers diagnosed at older ages and address the challenges posed by young breast and bowel cancers.
{"title":"Breast and bowel cancers diagnosed in people ‘too young to have cancer’: A blueprint for research using family and twin studies","authors":"John L. Hopper, Shuai Li, Robert J. MacInnis, James G. Dowty, Tuong L. Nguyen, Minh Bui, Gillian S. Dite, Vivienne F. C. Esser, Zhoufeng Ye, Enes Makalic, Daniel F. Schmidt, Benjamin Goudey, Karen Alpen, Miroslaw Kapuscinski, Aung Ko Win, Pierre-Antoine Dugué, Roger L. Milne, Harindra Jayasekara, Jennifer D. Brooks, Sue Malta, Lucas Calais-Ferreira, Alexander C. Campbell, Jesse T. Young, Tu Nguyen-Dumont, Joohon Sung, Graham G. Giles, Daniel Buchanan, Ingrid Winship, Mary Beth Terry, Melissa C. Southey, Mark A. Jenkins","doi":"10.1002/gepi.22555","DOIUrl":"10.1002/gepi.22555","url":null,"abstract":"<p>Young breast and bowel cancers (e.g., those diagnosed before age 40 or 50 years) have far greater morbidity and mortality in terms of years of life lost, and are increasing in incidence, but have been less studied. For breast and bowel cancers, the familial relative risks, and therefore the familial variances in age-specific log(incidence), are much greater at younger ages, but little of these familial variances has been explained. Studies of families and twins can address questions not easily answered by studies of unrelated individuals alone. We describe existing and emerging family and twin data that can provide special opportunities for discovery. 
We present designs and statistical analyses, including novel ideas such as the VALID (Variance in Age-specific Log Incidence Decomposition) model for causes of variation in risk, the DEPTH (DEPendency of association on the number of Top Hits) and other approaches to analyse genome-wide association study data, and the within-pair, ICE FALCON (Inference about Causation from Examining FAmiliaL CONfounding) and ICE CRISTAL (Inference about Causation from Examining Changes in Regression coefficients and Innovative STatistical AnaLysis) approaches to causation and familial confounding. Example applications to breast and colorectal cancer are presented. Motivated by the availability of the resources of the Breast and Colon Cancer Family Registries, we also present some ideas for future studies that could be applied to, and compared with, cancers diagnosed at older ages and address the challenges posed by young breast and bowel cancers.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 8","pages":"433-447"},"PeriodicalIF":1.7,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22555","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140169744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Victória Trindade Pons, Annique Claringbould, Priscilla Kamphuis, Albertine J. Oldehinkel, Hanna M. van Loo
We investigated indirect genetic effects (IGEs), also known as genetic nurture, in education with a novel approach that uses phased data to include parent-offspring pairs in the transmitted/nontransmitted study design. This method increases the power to detect IGEs, enhances the generalizability of the findings, and allows for the study of effects by parent-of-origin. We validated and applied this method in a family-based subsample of adolescents and adults from the Lifelines Cohort Study in the Netherlands (N = 6147), using the latest genome-wide association study data on educational attainment to construct polygenic scores (PGS). Our results indicated that IGEs play a role in education outcomes in the Netherlands: we found significant associations of the nontransmitted PGS with secondary school level in youth between 13 and 24 years old as well as with education attainment and years of education in adults over 25 years old (β = 0.14, 0.17 and 0.26, respectively), with tentative evidence for larger maternal IGEs. In conclusion, we replicated previous findings and showed that including parent-offspring pairs in addition to trios in the transmitted/nontransmitted design can benefit future studies of parental IGEs in a wide range of outcomes.
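The transmitted/nontransmitted design described above can be illustrated with a minimal sketch. It assumes phased parental haplotypes coded as 0/1 effect-allele indicators and a known index for the haplotype passed to the offspring. The function name, the data layout, and the toy weights are hypothetical and are not taken from the study's pipeline.

```python
def transmitted_nontransmitted_pgs(parent_haps, transmitted_idx, weights):
    """Split one parent's polygenic score into transmitted and nontransmitted parts.

    parent_haps     : two phased haplotypes, each a list of 0/1 effect-allele indicators
    transmitted_idx : 0 or 1, the haplotype inherited by the offspring (assumed known
                      from phasing; inferring it is outside this sketch)
    weights         : per-SNP effect sizes from an external GWAS (hypothetical values here)
    """
    if transmitted_idx not in (0, 1):
        raise ValueError("transmitted_idx must be 0 or 1")
    transmitted = parent_haps[transmitted_idx]
    nontransmitted = parent_haps[1 - transmitted_idx]
    # Each score is the weighted sum of effect alleles on the corresponding haplotype.
    pgs_t = sum(w * a for w, a in zip(weights, transmitted))
    pgs_nt = sum(w * a for w, a in zip(weights, nontransmitted))
    return pgs_t, pgs_nt


# Toy example: three SNPs, haplotype 0 was transmitted.
toy_t, toy_nt = transmitted_nontransmitted_pgs(
    [[1, 0, 1], [0, 1, 1]], 0, [0.2, -0.1, 0.5]
)
```

In this design, an association between the nontransmitted score and the offspring outcome is read as evidence for an indirect genetic effect, because those alleles were not inherited by the offspring.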
{"title":"Using parent-offspring pairs and trios to estimate indirect genetic effects in education","authors":"Victória Trindade Pons, Annique Claringbould, Priscilla Kamphuis, Albertine J. Oldehinkel, Hanna M. van Loo","doi":"10.1002/gepi.22554","DOIUrl":"10.1002/gepi.22554","url":null,"abstract":"<p>We investigated indirect genetic effects (IGEs), also known as genetic nurture, in education with a novel approach that uses phased data to include parent-offspring pairs in the transmitted/nontransmitted study design. This method increases the power to detect IGEs, enhances the generalizability of the findings, and allows for the study of effects by parent-of-origin. We validated and applied this method in a family-based subsample of adolescents and adults from the Lifelines Cohort Study in the Netherlands (<i>N</i> = 6147), using the latest genome-wide association study data on educational attainment to construct polygenic scores (PGS). Our results indicated that IGEs play a role in education outcomes in the Netherlands: we found significant associations of the nontransmitted PGS with secondary school level in youth between 13 and 24 years old as well as with education attainment and years of education in adults over 25 years old (<i>β</i> = 0.14, 0.17 and 0.26, respectively), with tentative evidence for larger maternal IGEs. 
In conclusion, we replicated previous findings and showed that including parent-offspring pairs in addition to trios in the transmitted/nontransmitted design can benefit future studies of parental IGEs in a wide range of outcomes.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 4","pages":"190-199"},"PeriodicalIF":2.1,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22554","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140109831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shuai Li, Gillian S. Dite, Robert J. MacInnis, Minh Bui, Tuong L. Nguyen, Vivienne F. C. Esser, Zhoufeng Ye, James G. Dowty, Enes Makalic, Joohon Sung, Graham G. Giles, Melissa C. Southey, John L. Hopper
A polygenic risk score (PRS) combines the associations of multiple genetic variants that could be due to direct causal effects, indirect genetic effects, or other sources of familial confounding. We have developed new approaches to assess evidence for and against causation by using family data for pairs of relatives (Inference about Causation from Examination of FAmiliaL CONfounding [ICE FALCON]) or measures of family history (Inference about Causation from Examining Changes in Regression coefficients and Innovative STatistical AnaLyses [ICE CRISTAL]). Inference is made from the changes in regression coefficients of relatives' PRSs or PRS and family history before and after adjusting for each other. We applied these approaches to two breast cancer PRSs and multiple studies and found that (a) for breast cancer diagnosed at a young age, for example, <50 years, there was no evidence that the PRSs were causal, while (b) for breast cancer diagnosed at later ages, there was consistent evidence for causation explaining increasing amounts of the PRS-disease association. The genetic variants in the PRS might be in linkage disequilibrium with truly causal variants and not causal themselves. These PRSs cause minimal heritability of breast cancer at younger ages. There is also evidence for nongenetic factors shared by first-degree relatives that explain breast cancer familial aggregation. Familial associations are not necessarily due to genes, and genetic associations are not necessarily causal.
{"title":"Causation and familial confounding as explanations for the associations of polygenic risk scores with breast cancer: Evidence from innovative ICE FALCON and ICE CRISTAL analyses","authors":"Shuai Li, Gillian S. Dite, Robert J. MacInnis, Minh Bui, Tuong L. Nguyen, Vivienne F. C. Esser, Zhoufeng Ye, James G. Dowty, Enes Makalic, Joohon Sung, Graham G. Giles, Melissa C. Southey, John L. Hopper","doi":"10.1002/gepi.22556","DOIUrl":"10.1002/gepi.22556","url":null,"abstract":"<p>A polygenic risk score (PRS) combines the associations of multiple genetic variants that could be due to direct causal effects, indirect genetic effects, or other sources of familial confounding. We have developed new approaches to assess evidence for and against causation by using family data for pairs of relatives (Inference about Causation from Examination of FAmiliaL CONfounding [ICE FALCON]) or measures of family history (Inference about Causation from Examining Changes in Regression coefficients and Innovative STatistical AnaLyses [ICE CRISTAL]). Inference is made from the changes in regression coefficients of relatives' PRSs or PRS and family history before and after adjusting for each other. We applied these approaches to two breast cancer PRSs and multiple studies and found that (a) for breast cancer diagnosed at a young age, for example, <50 years, there was no evidence that the PRSs were causal, while (b) for breast cancer diagnosed at later ages, there was consistent evidence for causation explaining increasing amounts of the PRS-disease association. The genetic variants in the PRS might be in linkage disequilibrium with truly causal variants and not causal themselves. These PRSs cause minimal heritability of breast cancer at younger ages. There is also evidence for nongenetic factors shared by first-degree relatives that explain breast cancer familial aggregation. 
Familial associations are not necessarily due to genes, and genetic associations are not necessarily causal.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 8","pages":"401-413"},"PeriodicalIF":1.7,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22556","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140109830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome-wide association studies (GWAS) have provided an abundance of information about the genetic variants and their loci that are associated with complex traits and diseases. However, due to linkage disequilibrium (LD) and noncoding regions of loci, it remains a challenge to pinpoint the causal genes. Gene network-based approaches, paired with network diffusion methods, have been proposed to prioritize causal genes and to boost statistical power in GWAS based on the assumption that trait-associated genes are clustered in a gene network. Due to the difficulty in mapping trait-associated variants to genes in GWAS, this assumption has never been directly or rigorously tested empirically. On the other hand, whole exome sequencing (WES) data focus on the protein-coding regions, directly identifying trait-associated genes. In this study, we tested the assumption by leveraging the recently available exome-based association statistics from the UK Biobank WES data along with two types of networks. We found that almost all trait-associated genes were significantly more proximal to each other than randomly selected genes within both networks. These results support the assumption that trait-associated genes are clustered in gene networks, which can be further leveraged to boost the power of GWAS, for example by introducing less stringent p value thresholds.
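One way to picture the clustering test described above is a permutation comparison of network distances. The sketch below is an assumption about how such a test could look, not the authors' procedure: it uses mean pairwise shortest-path distance as the proximity measure, uniform random gene sets as the null, and a toy path graph in place of a real gene network. All function names are hypothetical.

```python
import random
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Shortest-path distances (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def mean_pairwise_distance(adj, genes):
    """Average shortest-path distance over all pairs in genes (graph assumed connected)."""
    genes = list(genes)
    dist_from = {g: bfs_distances(adj, g) for g in genes}
    pairs = list(combinations(genes, 2))
    return sum(dist_from[a][b] for a, b in pairs) / len(pairs)


def proximity_p_value(adj, trait_genes, n_perm=1000, seed=0):
    """Empirical p value: how often a random gene set is at least as tightly clustered."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    observed = mean_pairwise_distance(adj, trait_genes)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(nodes, len(trait_genes))
        if mean_pairwise_distance(adj, sample) <= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p value of zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy network: a path of 60 genes; the "trait" genes are five neighbours at one end.
toy_adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 60] for i in range(60)}
toy_observed, toy_p = proximity_p_value(toy_adj, [0, 1, 2, 3, 4])
```

A real analysis would likely need a null that matches gene sets on properties such as node degree, since well-studied genes tend to have more annotated interactions; the uniform null here is only the simplest choice.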
{"title":"Are trait-associated genes clustered together in a gene network?","authors":"Hyun Jung Koo, Wei Pan","doi":"10.1002/gepi.22557","DOIUrl":"10.1002/gepi.22557","url":null,"abstract":"<p>Genome-wide association studies (GWAS) have provided an abundance of information about the genetic variants and their loci that are associated to complex traits and diseases. However, due to linkage disequilibrium (LD) and noncoding regions of loci, it remains a challenge to pinpoint the causal genes. Gene network-based approaches, paired with network diffusion methods, have been proposed to prioritize causal genes and to boost statistical power in GWAS based on the assumption that trait-associated genes are clustered in a gene network. Due to the difficulty in mapping trait-associated variants to genes in GWAS, this assumption has never been directly or rigorously tested empirically. On the other hand, whole exome sequencing (WES) data focuses on the protein-coding regions, directly identifying trait-associated genes. In this study, we tested the assumption by leveraging the recently available exome-based association statistics from the UK Biobank WES data along with two types of networks. We found that almost all trait-associated genes were significantly more proximal to each other than randomly selected genes within both networks. 
These results support the assumption that trait-associated genes are clustered in gene networks, which can be further leveraged to boost the power of GWAS such as by introducing less stringent <i>p</i> value thresholds.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 5","pages":"203-213"},"PeriodicalIF":1.7,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22557","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140109761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gene–environment (GxE) interactions play a crucial role in understanding the complex etiology of various traits, but assessing them using observational data can be challenging due to unmeasured confounders for lifestyle and environmental risk factors. Mendelian randomization (MR) has emerged as a valuable method for assessing causal relationships based on observational data. This approach utilizes genetic variants as instrumental variables (IVs) with the aim of providing a valid statistical test and estimation of causal effects in the presence of unmeasured confounders. MR has gained substantial popularity in recent years largely due to the success of genome-wide association studies. Many methods have been developed for MR; however, limited work has been done on evaluating GxE interaction. In this paper, we focus on two primary IV approaches: the two-stage predictor substitution and the two-stage residual inclusion, and extend them to accommodate GxE interaction under both the linear and logistic regression models for continuous and binary outcomes, respectively. A comprehensive simulation study and analytical derivations reveal that resolving the linear regression model is relatively straightforward. In contrast, the logistic regression model presents a considerably more intricate challenge, which demands additional effort.
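The two IV approaches named above can be sketched for the simplest case: a linear outcome model with a single genetic instrument and no interaction term. The simulation settings, variable names, and helper functions below are assumptions made for illustration; the paper's GxE extension and its logistic-regression analysis are not reproduced here.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def slope(x, y):
    """Slope from the simple least-squares regression y ~ 1 + x."""
    return cov(x, y) / cov(x, x)


def first_coef_two_predictors(x1, x2, y):
    """Coefficient on x1 in y ~ 1 + x1 + x2, from the 2x2 normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)


def iv_estimates(n=20000, beta=0.3, seed=0):
    """Simulate confounded data and return naive, 2SPS and 2SRI estimates of beta."""
    rng = random.Random(seed)
    g = [float((rng.random() < 0.3) + (rng.random() < 0.3)) for _ in range(n)]  # allele count
    u = [rng.gauss(0, 1) for _ in range(n)]                                    # unmeasured confounder
    x = [gi + ui + rng.gauss(0, 1) for gi, ui in zip(g, u)]                    # exposure
    y = [beta * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]             # continuous outcome

    # Stage 1: regress the exposure on the instrument.
    b1 = slope(g, x)
    a1 = mean(x) - b1 * mean(g)
    x_hat = [a1 + b1 * gi for gi in g]
    resid = [xi - xh for xi, xh in zip(x, x_hat)]

    return {
        "naive": slope(x, y),                           # ignores confounding
        "2sps": slope(x_hat, y),                        # substitute the predicted exposure
        "2sri": first_coef_two_predictors(x, resid, y)  # keep exposure, add stage-1 residual
    }
```

In this linear setting the two-stage estimates coincide, because the stage-1 residual is orthogonal to the fitted exposure, which is consistent with the abstract's remark that the linear case is relatively straightforward. The same equivalence does not hold in general for nonlinear models such as logistic regression.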
{"title":"Unveiling challenges in Mendelian randomization for gene–environment interaction","authors":"Malka Gorfine, Conghui Qu, Ulrike Peters, Li Hsu","doi":"10.1002/gepi.22552","DOIUrl":"10.1002/gepi.22552","url":null,"abstract":"<p>Gene–environment (GxE) interactions play a crucial role in understanding the complex etiology of various traits, but assessing them using observational data can be challenging due to unmeasured confounders for lifestyle and environmental risk factors. Mendelian randomization (MR) has emerged as a valuable method for assessing causal relationships based on observational data. This approach utilizes genetic variants as instrumental variables (IVs) with the aim of providing a valid statistical test and estimation of causal effects in the presence of unmeasured confounders. MR has gained substantial popularity in recent years largely due to the success of genome-wide association studies. Many methods have been developed for MR; however, limited work has been done on evaluating GxE interaction. In this paper, we focus on two primary IV approaches: the two-stage predictor substitution and the two-stage residual inclusion, and extend them to accommodate GxE interaction under both the linear and logistic regression models for continuous and binary outcomes, respectively. Comprehensive simulation study and analytical derivations reveal that resolving the linear regression model is relatively straightforward. 
In contrast, the logistic regression model presents a considerably more intricate challenge, which demands additional effort.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 4","pages":"164-189"},"PeriodicalIF":2.1,"publicationDate":"2024-02-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139989797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ashish Patel, Dipender Gill, Dmitry Shungin, Christos S. Mantzoros, Lotte Bjerre Knudsen, Jack Bowden, Stephen Burgess
Phenotypic heterogeneity at genomic loci encoding drug targets can be exploited by multivariable Mendelian randomization to provide insight into the pathways by which pharmacological interventions may affect disease risk. However, statistical inference in such investigations may be poor if overdispersion heterogeneity in measured genetic associations is unaccounted for. In this work, we first develop conditional F statistics for dimension-reduced genetic associations that enable more accurate measurement of phenotypic heterogeneity. We then develop a novel extension for two-sample multivariable Mendelian randomization that accounts for overdispersion heterogeneity in dimension-reduced genetic associations. Our empirical focus is to use genetic variants in the GLP1R gene region to understand the mechanism by which GLP1R agonism affects coronary artery disease (CAD) risk. Colocalization analyses indicate that distinct variants in the GLP1R gene region are associated with body mass index and type 2 diabetes (T2D). Multivariable Mendelian randomization analyses that were corrected for overdispersion heterogeneity suggest that bodyweight lowering rather than T2D liability lowering effects of GLP1R agonism are more likely contributing to reduced CAD risk. Tissue-specific analyses prioritized brain tissue as the most likely to be relevant for CAD risk, of the tissues considered. We hope the multivariable Mendelian randomization approach illustrated here is widely applicable to better understand mechanisms linking drug targets to disease outcomes, and hence to guide drug development efforts.
{"title":"Robust use of phenotypic heterogeneity at drug target genes for mechanistic insights: Application of cis-multivariable Mendelian randomization to GLP1R gene region","authors":"Ashish Patel, Dipender Gill, Dmitry Shungin, Christos S. Mantzoros, Lotte Bjerre Knudsen, Jack Bowden, Stephen Burgess","doi":"10.1002/gepi.22551","DOIUrl":"10.1002/gepi.22551","url":null,"abstract":"<p>Phenotypic heterogeneity at genomic loci encoding drug targets can be exploited by multivariable Mendelian randomization to provide insight into the pathways by which pharmacological interventions may affect disease risk. However, statistical inference in such investigations may be poor if overdispersion heterogeneity in measured genetic associations is unaccounted for. In this work, we first develop conditional <i>F</i> statistics for dimension-reduced genetic associations that enable more accurate measurement of phenotypic heterogeneity. We then develop a novel extension for two-sample multivariable Mendelian randomization that accounts for overdispersion heterogeneity in dimension-reduced genetic associations. Our empirical focus is to use genetic variants in the <i>GLP1R</i> gene region to understand the mechanism by which GLP1R agonism affects coronary artery disease (CAD) risk. Colocalization analyses indicate that distinct variants in the <i>GLP1R</i> gene region are associated with body mass index and type 2 diabetes (T2D). Multivariable Mendelian randomization analyses that were corrected for overdispersion heterogeneity suggest that bodyweight lowering rather than T2D liability lowering effects of GLP1R agonism are more likely contributing to reduced CAD risk. Tissue-specific analyses prioritized brain tissue as the most likely to be relevant for CAD risk, of the tissues considered. 
We hope the multivariable Mendelian randomization approach illustrated here is widely applicable to better understand mechanisms linking drug targets to diseases outcomes, and hence to guide drug development efforts.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 4","pages":"151-163"},"PeriodicalIF":2.1,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22551","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139912418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Individual probabilistic assessments of the risk of cancer, primary or secondary, will not be understood by most patients. That is the essence of our arguments in this paper. Greater understanding can be achieved by extensive, intensive, and detailed counseling. But since probability itself is a concept that easily escapes our everyday intuition (consider the famous Monty Hall paradox), it would also be wise to advise patients and potential patients not to put undue weight on any probabilistic assessment. Such assessments can be of value to the epidemiologist in the investigation of different potential etiologies describing cancer evolution or to the clinical trialist as a way to maximize design efficiency. But we cannot expect an ordinary individual to interpret these assessments correctly.
{"title":"Making sense of breast cancer risk estimates","authors":"John O'Quigley","doi":"10.1002/gepi.22550","DOIUrl":"10.1002/gepi.22550","url":null,"abstract":"<p>Individual probabilistic assessments on the risk of cancer, primary or secondary, will not be understood by most patients. That is the essence of our arguments in this paper. Greater understanding can be achieved by extensive, intensive, and detailed counseling. But since probability itself is a concept that easily escapes our everyday intuition—consider the famous Monte Hall paradox—then it would also be wise to advise patients and potential patients, to not put undue weight on any probabilistic assessment. Such assessments can be of value to the epidemiologist in the investigation of different potential etiologies describing cancer evolution or to the clinical trialist as a way to maximize design efficiency. But to an ordinary individual we cannot anticipate that these assessments will be correctly interpreted.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 3","pages":"141-147"},"PeriodicalIF":2.1,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139706548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Advancements in high-throughput genomic technologies have revolutionized the field of disease biomarker identification by providing large-scale genomic data. There is an increasing focus on understanding the relationships among diverse patient groups with distinct disease subtypes and characteristics. Complex diseases exhibit both heterogeneity and shared genomic factors, making it essential to investigate these patterns to accurately detect markers and comprehensively understand the diseases. Integrative analysis has emerged as a promising approach to address this challenge. However, existing studies have been limited by ignoring the adjacency structure of genomic measurements, such as single nucleotide polymorphisms (SNPs) and DNA methylations. In this study, we propose a structured integrative analysis method that incorporates a spline-type penalty to accommodate this adjacency structure. We utilize a fused-lasso-type penalty to identify both heterogeneity and commonality across the groups. Extensive simulations demonstrate its superiority over several directly competing methods. The analysis of The Cancer Genome Atlas melanoma data with DNA methylation measurements and GENEVA diabetes data with SNP measurements shows that the proposed analysis leads to meaningful findings with better prediction performance and higher selection stability.
{"title":"Revealing genomic heterogeneity and commonality: A penalized integrative analysis approach accounting for the adjacency structure of measurements","authors":"Xindi Wang, Yu Jiang, Yifan Sun","doi":"10.1002/gepi.22549","DOIUrl":"10.1002/gepi.22549","url":null,"abstract":"<p>Advancements in high-throughput genomic technologies have revolutionized the field of disease biomarker identification by providing large-scale genomic data. There is an increasing focus on understanding the relationships among diverse patient groups with distinct disease subtypes and characteristics. Complex diseases exhibit both heterogeneity and shared genomic factors, making it essential to investigate these patterns to accurately detect markers and comprehensively understand the diseases. Integrative analysis has emerged as a promising approach to address this challenge. However, existing studies have been limited by ignoring the adjacency structure of genomic measurements, such as single nucleotide polymorphisms (SNPs) and DNA methylations. In this study, we propose a structured integrative analysis method that incorporates a spline type penalty to accommodate this adjacency structure. We utilize a fused lasso type penalty to identify both heterogeneity and commonality across the groups. Extensive simulations demonstrate its superiority compared to several direct competing methods. 
The analysis of The Cancer Genome Atlas melanoma data with DNA methylation measurements and GENEVA diabetes data with SNP measurements exhibit that the proposed analysis lead to meaningful findings with better prediction performance and higher selection stability.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 3","pages":"114-140"},"PeriodicalIF":2.1,"publicationDate":"2024-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139691643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}