Christina Nieuwoudt, Fabiha Binte Farooq, Angela Brooks-Wilson, Alexandre Bureau, Jinko Graham
Family-based sequencing studies are increasingly used to find rare genetic variants of high risk for disease traits with familial clustering. In some studies, families with multiple disease subtypes are collected and the exomes of affected relatives are sequenced for shared rare variants (RVs). Since different families can harbor different causal variants and each family harbors many RVs, tests to detect causal variants can have low power in this study design. Our goal is rather to prioritize shared variants for further investigation by, for example, pathway analyses or functional studies. The transmission-disequilibrium test prioritizes variants based on departures from Mendelian transmission in parent–child trios. Extending this idea to families, we propose methods to prioritize RVs shared in affected relatives with two disease subtypes, with one subtype more heritable than the other. Global approaches condition on a variant being observed in the study and assume a known probability of carrying a causal variant. In contrast, local approaches condition on a variant being observed in specific families to eliminate the carrier probability. Our simulation results indicate that global approaches are robust to misspecification of the carrier probability and prioritize more effectively than local approaches even when the carrier probability is misspecified.
{"title":"Statistics to prioritize rare variants in family-based sequencing studies with disease subtypes","authors":"Christina Nieuwoudt, Fabiha Binte Farooq, Angela Brooks-Wilson, Alexandre Bureau, Jinko Graham","doi":"10.1002/gepi.22579","DOIUrl":"10.1002/gepi.22579","url":null,"PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 7","pages":"324-343"},"PeriodicalIF":1.7,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22579","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141467490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
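The abstract above builds on the transmission-disequilibrium test (TDT), which ranks variants by how far transmissions from heterozygous parents depart from the Mendelian 50:50 expectation. As a rough illustration of that starting point only, here is a minimal sketch of the classic McNemar-type TDT statistic. It is not the global or local statistic proposed in the paper, and the variant names and transmission counts are hypothetical.

```python
def tdt_statistic(transmitted: int, untransmitted: int) -> float:
    """Classic TDT chi-square (1 df) for one variant.

    transmitted / untransmitted count how often heterozygous parents did or
    did not pass the variant allele to an affected child.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    return (transmitted - untransmitted) ** 2 / total


# Hypothetical counts, for illustration only: (transmitted, untransmitted).
variants = {"rv_A": (9, 3), "rv_B": (6, 6), "rv_C": (7, 1)}

# Prioritize: largest departure from Mendelian transmission first.
ranked = sorted(variants, key=lambda v: tdt_statistic(*variants[v]), reverse=True)
print(ranked)
```

Ranking by the statistic, rather than thresholding a p-value, matches the abstract's stated goal of prioritizing variants for follow-up instead of formally testing them.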
Brian D. Chen, Chanhwa Lee, Amanda L. Tapia, Alexander P. Reiner, Hua Tang, Charles Kooperberg, JoAnn E. Manson, Yun Li, Laura M. Raffield
In most Proteome-Wide Association Studies (PWAS), variants near the protein-coding gene (±1 Mb), also known as cis single nucleotide polymorphisms (SNPs), are used to predict protein levels, which are then tested for association with phenotypes. However, proteins can be regulated through variants outside of the cis region. An intermediate GWAS step to identify protein quantitative trait loci (pQTL) allows for the inclusion of trans SNPs outside the cis region in protein-level prediction models. Here, we assess the prediction of 540 proteins in 1002 individuals from the Women's Health Initiative (WHI), split equally into a GWAS set, an elastic net training set, and a testing set. We compared the testing r² between measured and predicted protein levels using this proposed approach, to the testing r² using only cis SNPs. The two methods usually resulted in similar testing r², but some proteins showed a significant increase in testing r² with our method. For example, for cartilage acidic protein 1, the testing r² increased from 0.101 to 0.351. We also demonstrate reproducible findings for predicted protein association with lipid and blood cell traits in WHI participants without proteomics data and in UK Biobank utilizing our PWAS weights.
{"title":"Proteome-wide association study using cis and trans variants and applied to blood cell and lipid-related traits in the Women's Health Initiative study","authors":"Brian D. Chen, Chanhwa Lee, Amanda L. Tapia, Alexander P. Reiner, Hua Tang, Charles Kooperberg, JoAnn E. Manson, Yun Li, Laura M. Raffield","doi":"10.1002/gepi.22578","DOIUrl":"10.1002/gepi.22578","url":null,"PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 7","pages":"310-323"},"PeriodicalIF":1.7,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141467489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
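The PWAS abstract above rests on two mechanical steps: an equal three-way split of the 1002 individuals, and a testing r² between measured and predicted protein levels. The sketch below illustrates both. It assumes that testing r² means the squared Pearson correlation, which the abstract does not state, and the helper names and toy values are hypothetical. The GWAS screening and elastic net fitting are omitted.

```python
def three_way_split(ids):
    """Split sample IDs into three equal-sized consecutive parts
    (GWAS set, elastic net training set, testing set)."""
    k = len(ids) // 3
    return ids[:k], ids[k:2 * k], ids[2 * k:]


def squared_pearson(measured, predicted):
    """Squared Pearson correlation between measured and predicted levels."""
    n = len(measured)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    sxy = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    sxx = sum((m - mean_m) ** 2 for m in measured)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no correlation information
    return sxy * sxy / (sxx * syy)


# 1002 individuals split equally, as described in the abstract.
gwas_ids, train_ids, test_ids = three_way_split(list(range(1002)))
print(len(gwas_ids), len(train_ids), len(test_ids))

# Toy measured/predicted values (hypothetical) for one protein.
print(squared_pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

In practice the IDs would be shuffled before splitting. Consecutive slices are used here only to keep the sketch deterministic.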
Lai Jiang, Jiayi Shen, Burcu F. Darst, Christopher A. Haiman, Nicholas Mancuso, David V. Conti
Instrumental variable (IV) analysis has been widely applied in epidemiology to infer causal relationships using observational data. Genetic variants can also be viewed as valid IVs in Mendelian randomization and transcriptome-wide association studies. However, most multivariate IV approaches cannot scale to high-throughput experimental data. Here, we extend our previous work, a hierarchical model that jointly analyzes marginal summary statistics (hJAM), into a scalable framework (SHA-JAM) that can be applied to a large number of intermediates and a large number of correlated genetic variants, situations often encountered in modern experiments leveraging omic technologies. SHA-JAM aims to estimate the conditional effect of high-dimensional risk factors on an outcome by incorporating estimates from association analyses of single-nucleotide polymorphism (SNP)-intermediate or SNP-gene expression as prior information in a hierarchical model. Results from extensive simulation studies demonstrate that SHA-JAM yields a higher area under the receiver operating characteristic curve (AUC), a lower mean-squared error of the estimates, and a much faster computation speed, compared to an existing approach for similar analyses. In two applied examples for prostate cancer, we investigated metabolite and transcriptome associations, respectively, using summary statistics from a GWAS for prostate cancer with more than 140,000 men and high-dimensional publicly available summary data for metabolites and transcriptomes.
{"title":"Hierarchical joint analysis of marginal summary statistics—Part II: High-dimensional instrumental analysis of omics data","authors":"Lai Jiang, Jiayi Shen, Burcu F. Darst, Christopher A. Haiman, Nicholas Mancuso, David V. Conti","doi":"10.1002/gepi.22577","DOIUrl":"10.1002/gepi.22577","url":null,"PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 7","pages":"291-309"},"PeriodicalIF":1.7,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22577","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141418544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome-wide association studies (GWAS) have been helpful in identifying genetic variants predicting cancer risk and providing new insights into cancer biology. Increasing use of genetically informed care, as well as genetically informed prevention and treatment strategies, has also drawn attention to some of the inherent limitations of cancer genetic data. Specifically, genetic endowment is lifelong. However, those recruited into cancer studies tend to be middle-aged or older people, meaning the exposure most likely starts before recruitment, as opposed to exposure and recruitment aligning, as in a trial or a target trial. Studies in survivors can be biased as a result of depletion of the susceptibles, here specifically due to genetic vulnerability and the cancer of interest or a competing risk. In addition, including prevalent cases in a case-control study will make the genetics of survival with cancer look harmful (Neyman bias). Here, we describe ways of designing GWAS to maximize explanatory power and predictive utility, by reducing selection bias due to only recruiting survivors and reducing Neyman bias due to including prevalent cases, alongside using other techniques, such as selection diagrams, age-stratification, and Mendelian randomization, to facilitate GWAS interpretability and utility.
{"title":"Interpreting disease genome-wide association studies and polygenetic risk scores given eligibility and study design considerations","authors":"Catherine Mary Schooling, Mary Beth Terry","doi":"10.1002/gepi.22567","DOIUrl":"10.1002/gepi.22567","url":null,"PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 8","pages":"468-472"},"PeriodicalIF":1.7,"publicationDate":"2024-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141155102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Somatic changes like copy number aberrations (CNAs) and epigenetic alterations like methylation have pivotal effects on disease outcomes and prognosis in cancer by regulating gene expression, which drives critical biological processes. To identify potential biomarkers and molecular targets and understand how they impact disease outcomes, it is important to identify key groups of CNAs, the associated methylation, and the gene expressions they impact, through a joint integrative analysis. Here, we propose a novel analysis pipeline, the joint sparse canonical correlation analysis (jsCCA), an extension of sCCA, to effectively identify an ensemble of CNAs, methylation sites and gene (expression) components in the context of disease endpoints, especially tumor characteristics. Our approach detects potentially orthogonal gene components that are highly correlated with sets of methylation sites, which in turn are correlated with sets of CNA sites. It then identifies the genes within these components that are associated with the outcome. Further, we aggregate the effect of each gene expression set on tumor stage by constructing “gene component scores” and test their interaction with traditional risk factors. Analyzing clinical and genomic data on 515 renal clear cell carcinoma (ccRCC) patients from TCGA-KIRC, we found eight gene components to be associated with methylation sites, regulated by groups of proximally located CNA sites. Association analysis with tumor stage at diagnosis identified a novel association with expression of the ASAH1 gene, trans-regulated by methylation of several genes including SIX5 and by CNAs in the 10q25 region including TCF7L2. Further analysis to quantify the overall effect of gene sets on tumor stage revealed that two of the eight gene components have significant interaction with smoking in relation to tumor stage. These gene components represent distinct biological functions including immune function, inflammatory responses, and hypoxia-regulated pathways. Our findings suggest that jsCCA analysis can identify interpretable and important genes, regulatory structures, and clinically consequential pathways. Such methods are warranted for comprehensive analysis of multimodal data, especially in cancer genomics.
{"title":"Identifying genes associated with disease outcomes using joint sparse canonical correlation analysis—An application in renal clear cell carcinoma","authors":"Diptavo Dutta, Ananda Sen, Jaya M. Satagopan","doi":"10.1002/gepi.22566","DOIUrl":"10.1002/gepi.22566","url":null,"PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 8","pages":"414-432"},"PeriodicalIF":1.7,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22566","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140943393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Saptarshi Chakraborty, Zoe Guan, Caroline E. Kostrzewa, Ronglai Shen, Colin B. Begg
Numerous studies over the past generation have identified germline variants that increase specific cancer risks. Simultaneously, a revolution in sequencing technology has permitted high-throughput annotations of somatic genomes characterizing individual tumors. However, examining the relationship between germline variants and somatic alteration patterns is hugely challenged by the large numbers of variants in a typical tumor, the rarity of most individual variants, and the heterogeneity of tumor somatic fingerprints. In this article, we propose statistical methodology that frames the investigation of germline-somatic relationships in an interpretable manner. The method uses meta-features embodying biological contexts of individual somatic alterations to implicitly group rare mutations. Our team has used this technique previously through a multilevel regression model to diagnose with high accuracy tumor site of origin. Herein, we further leverage topic models from computational linguistics to achieve interpretable lower-dimensional embeddings of the meta-features. We demonstrate how the method can identify distinctive somatic profiles linked to specific germline variants or environmental risk factors. We illustrate the method using The Cancer Genome Atlas whole-exome sequencing data to characterize somatic tumor fingerprints in breast cancer patients with germline BRCA1/2 mutations and in head and neck cancer patients exposed to human papillomavirus.
{"title":"Identifying somatic fingerprints of cancers defined by germline and environmental risk factors","authors":"Saptarshi Chakraborty, Zoe Guan, Caroline E. Kostrzewa, Ronglai Shen, Colin B. Begg","doi":"10.1002/gepi.22565","DOIUrl":"10.1002/gepi.22565","url":null,"PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 8","pages":"455-467"},"PeriodicalIF":1.7,"publicationDate":"2024-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140840250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Thanthirige L. M. Ruberu, Danielle Braun, Giovanni Parmigiani, Swati Biswas
Multigene panel testing now allows efficient testing of many cancer susceptibility genes, leading to a larger number of mutation carriers being identified. They need to be counseled about the cancer risk conferred by their specific gene mutation. An important cancer susceptibility gene is PALB2. Multiple studies have reported risk estimates for breast cancer (BC) conferred by pathogenic variants in PALB2. Due to the diverse modalities of reported risk estimates (age-specific risk, odds ratio, relative risk, and standardized incidence ratio) and effect sizes, a meta-analysis combining these estimates is necessary to accurately counsel patients with this mutation. However, this is not trivial due to heterogeneity of studies in terms of study design and risk measure. We utilized a recently proposed Bayesian random-effects meta-analysis method that can synthesize estimates from such heterogeneous studies. We applied this method to combine estimates from 12 studies on BC risk for carriers of pathogenic PALB2 mutations. The estimated overall (meta-analysis-based) risk of BC is 12.80% (6.11%–22.59%) by age 50 and 48.47% (36.05%–61.74%) by age 80. Pathogenic mutations in PALB2 make women more susceptible to BC. Our risk estimates can help clinically manage patients carrying pathogenic variants in PALB2.
{"title":"Meta-analysis of breast cancer risk for individuals with PALB2 pathogenic variants","authors":"Thanthirige L. M. Ruberu, Danielle Braun, Giovanni Parmigiani, Swati Biswas","doi":"10.1002/gepi.22561","DOIUrl":"10.1002/gepi.22561","url":null,"PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 8","pages":"448-454"},"PeriodicalIF":1.7,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140666320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Negar Janani, Kendra A. Young, Greg Kinney, Matthew Strand, John E. Hokanson, Yaning Liu, Troy Butler, Erin Austin
Genome-wide association studies (GWAS) typically use linear or logistic regression models to identify associations between phenotypes (traits) and genotypes (genetic variants) of interest. However, regression under the additive assumption has potential limitations. First, the assumption that residuals are normally distributed rarely holds in practice, and deviation from normality increases the Type-I error rate. Second, building a model based on such an assumption ignores genetic structures such as dominant, recessive, and protective-risk cases. Ignoring genetic variants may result in spurious conclusions about the associations between a variant and a trait. We propose an assumption-free model built upon data-consistent inversion (DCI), a recently developed measure-theoretic framework for uncertainty quantification. The proposed DCI-derived model builds a nonparametric distribution on model inputs that propagates to the distribution of observed data without requiring the normality assumption on residuals made in the regression model. This characteristic enables the proposed DCI-derived model to cover all genetic variants without relying on the additivity of the classic-GWAS model. Simulations and a replication GWAS with data from COPDGene demonstrate the ability of this model to control the Type-I error rate at least as well as the classic-GWAS (additive linear model) approach while having similar or greater power to discover variants in different genetic modes of transmission.
{"title":"A novel application of data-consistent inversion to overcome spurious inference in genome-wide association studies","authors":"Negar Janani, Kendra A. Young, Greg Kinney, Matthew Strand, John E. Hokanson, Yaning Liu, Troy Butler, Erin Austin","doi":"10.1002/gepi.22563","DOIUrl":"10.1002/gepi.22563","url":null,"PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 6","pages":"270-288"},"PeriodicalIF":1.7,"publicationDate":"2024-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140636418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Azeez Alade, Tabitha Peter, Tamara Busch, Waheed Awotoye, Deepti Anand, Oladayo Abimbola, Emmanuel Aladenika, Mojisola Olujitan, Oscar Rysavy, Phuong Fawng Nguyen, Thirona Naicker, Peter A. Mossey, Lord J. J. Gowans, Mekonen A. Eshete, Wasiu L. Adeyemo, Erliang Zeng, Eric Van Otterloo, Michael O'Rorke, Adebowale Adeyemo, Jeffrey C. Murray, Salil A. Lachke, Paul A. Romitti, Azeez Butali
Nonsyndromic orofacial clefts (NSOFCs) represent a large proportion (70%–80%) of all OFCs. They can be broadly categorized into nonsyndromic cleft lip with or without cleft palate (NSCL/P) and nonsyndromic cleft palate only (NSCPO). Although NSCL/P and NSCPO are considered etiologically distinct, recent evidence suggests the presence of shared genetic risks. Thus, we investigated the genetic overlap between NSCL/P and NSCPO using African genome-wide association study (GWAS) data on NSOFCs. These data consist of 814 NSCL/P, 205 NSCPO cases, and 2159 unrelated controls. We generated common single-nucleotide variants (SNVs) association summary statistics separately for each phenotype (NSCL/P and NSCPO) under an additive genetic model. Subsequently, we employed the pleiotropic analysis under the composite null (PLACO) method to test for genetic overlap. Our analysis identified two loci with genome-wide significance (rs181737795 [p = 2.58E−08] and rs2221169 [p = 4.5E−08]) and one locus with marginal significance (rs187523265 [p = 5.22E−08]). Using mouse transcriptomics data and information from genetic phenotype databases, we identified MDN1, MAP3k7, KMT2A, ARCN1, and VADC2 as top candidate genes for the associated SNVs. These findings enhance our understanding of genetic variants associated with NSOFCs and identify potential candidate genes for further exploration.
{"title":"Shared genetic risk between major orofacial cleft phenotypes in an African population","authors":"Azeez Alade, Tabitha Peter, Tamara Busch, Waheed Awotoye, Deepti Anand, Oladayo Abimbola, Emmanuel Aladenika, Mojisola Olujitan, Oscar Rysavy, Phuong Fawng Nguyen, Thirona Naicker, Peter A. Mossey, Lord J. J. Gowans, Mekonen A. Eshete, Wasiu L. Adeyemo, Erliang Zeng, Eric Van Otterloo, Michael O'Rorke, Adebowale Adeyemo, Jeffrey C. Murray, Salil A. Lachke, Paul A. Romitti, Azeez Butali","doi":"10.1002/gepi.22564","DOIUrl":"10.1002/gepi.22564","url":null,"abstract":"<p>Nonsyndromic orofacial clefts (NSOFCs) represent a large proportion (70%–80%) of all OFCs. They can be broadly categorized into nonsyndromic cleft lip with or without cleft palate (NSCL/P) and nonsyndromic cleft palate only (NSCPO). Although NSCL/P and NSCPO are considered etiologically distinct, recent evidence suggests the presence of shared genetic risks. Thus, we investigated the genetic overlap between NSCL/P and NSCPO using African genome-wide association study (GWAS) data on NSOFCs. These data consist of 814 NSCL/P, 205 NSCPO cases, and 2159 unrelated controls. We generated common single-nucleotide variants (SNVs) association summary statistics separately for each phenotype (NSCL/P and NSCPO) under an additive genetic model. Subsequently, we employed the pleiotropic analysis under the composite null (PLACO) method to test for genetic overlap. Our analysis identified two loci with genome-wide significance (rs181737795 [<i>p</i> = 2.58E−08] and rs2221169 [<i>p</i> = 4.5E−08]) and one locus with marginal significance (rs187523265 [<i>p</i> = 5.22E−08]). Using mouse transcriptomics data and information from genetic phenotype databases, we identified <i>MDN1, MAP3k7, KMT2A, ARCN1</i>, and <i>VADC2</i> as top candidate genes for the associated SNVs. 
These findings enhance our understanding of genetic variants associated with NSOFCs and identify potential candidate genes for further exploration.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 6","pages":"258-269"},"PeriodicalIF":1.7,"publicationDate":"2024-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22564","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140623743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Meiling Liu, Yu-Ru Su, Yang Liu, Li Hsu, Qianchuan He
Genetic factors play a fundamental role in disease development. Studying the genetic association with clinical outcomes is critical for understanding disease biology and devising novel treatment targets. However, the frequencies of genetic variations are often low, making it difficult to examine the variants one-by-one. Moreover, the clinical outcomes are complex, including patients' survival time and other binary or continuous outcomes such as recurrences and lymph node count, and how to effectively analyze genetic association with these outcomes remains unclear. In this article, we proposed a structured test statistic for testing genetic association with mixed types of survival, binary, and continuous outcomes. The structured testing incorporates known biological information of variants while allowing for their heterogeneous effects and is a powerful strategy for analyzing infrequent genetic factors. Simulation studies show that the proposed test statistic has correct type I error and is highly effective in detecting significant genetic variants. We applied our approach to a uterine corpus endometrial carcinoma study and identified several genetic pathways associated with the clinical outcomes.
{"title":"Structured testing of genetic association with mixed clinical outcomes","authors":"Meiling Liu, Yu-Ru Su, Yang Liu, Li Hsu, Qianchuan He","doi":"10.1002/gepi.22560","DOIUrl":"10.1002/gepi.22560","url":null,"abstract":"<p>Genetic factors play a fundamental role in disease development. Studying the genetic association with clinical outcomes is critical for understanding disease biology and devising novel treatment targets. However, the frequencies of genetic variations are often low, making it difficult to examine the variants one-by-one. Moreover, the clinical outcomes are complex, including patients' survival time and other binary or continuous outcomes such as recurrences and lymph node count, and how to effectively analyze genetic association with these outcomes remains unclear. In this article, we proposed a structured test statistic for testing genetic association with mixed types of survival, binary, and continuous outcomes. The structured testing incorporates known biological information of variants while allowing for their heterogeneous effects and is a powerful strategy for analyzing infrequent genetic factors. Simulation studies show that the proposed test statistic has correct type I error and is highly effective in detecting significant genetic variants. 
We applied our approach to a uterine corpus endometrial carcinoma study and identified several genetic pathways associated with the clinical outcomes.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 5","pages":"226-237"},"PeriodicalIF":1.7,"publicationDate":"2024-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140596715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}