Genetic Epidemiology: latest articles (page 7)

Statistics to prioritize rare variants in family-based sequencing studies with disease subtypes

IF 1.7, Zone 4 (Medicine), Q3 GENETICS & HEREDITY

Genetic Epidemiology

Pub Date : 2024-06-28 DOI: 10.1002/gepi.22579

Christina Nieuwoudt, Fabiha Binte Farooq, Angela Brooks-Wilson, Alexandre Bureau, Jinko Graham

Family-based sequencing studies are increasingly used to find rare genetic variants of high risk for disease traits with familial clustering. In some studies, families with multiple disease subtypes are collected and the exomes of affected relatives are sequenced for shared rare variants (RVs). Since different families can harbor different causal variants and each family harbors many RVs, tests to detect causal variants can have low power in this study design. Our goal is rather to prioritize shared variants for further investigation by, for example, pathway analyses or functional studies. The transmission-disequilibrium test prioritizes variants based on departures from Mendelian transmission in parent–child trios. Extending this idea to families, we propose methods to prioritize RVs shared in affected relatives with two disease subtypes, with one subtype more heritable than the other. Global approaches condition on a variant being observed in the study and assume a known probability of carrying a causal variant. In contrast, local approaches condition on a variant being observed in specific families to eliminate the carrier probability. Our simulation results indicate that global approaches are robust to misspecification of the carrier probability and prioritize more effectively than local approaches even when the carrier probability is misspecified.
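
The transmission-disequilibrium test mentioned above can be sketched in a few lines. This is the classic trio-based TDT (a McNemar-type chi-square on transmissions from heterozygous parents), not the family-based prioritization statistics the authors propose. The function name and the counts in the usage line are hypothetical and chosen only for illustration.

```python
from math import erfc, sqrt


def tdt_statistic(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Classic TDT for one variant.

    transmitted / untransmitted: number of times heterozygous parents
    did / did not pass the variant allele to an affected child.
    Under Mendelian transmission the two counts are expected to be equal.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Survival function of a chi-square with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 30 transmissions versus 10 non-transmissions.
chi2, p_value = tdt_statistic(30, 10)
```

Variants could then be ranked by the statistic, larger values indicating a stronger departure from Mendelian transmission.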

Genetic Epidemiology 48(7): 324-343. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22579

Proteome-wide association study using cis and trans variants and applied to blood cell and lipid-related traits in the Women's Health Initiative study

Pub Date : 2024-06-28 DOI: 10.1002/gepi.22578

Brian D. Chen, Chanhwa Lee, Amanda L. Tapia, Alexander P. Reiner, Hua Tang, Charles Kooperberg, JoAnn E. Manson, Yun Li, Laura M. Raffield

In most Proteome-Wide Association Studies (PWAS), variants near the protein-coding gene (±1 Mb), also known as cis single nucleotide polymorphisms (SNPs), are used to predict protein levels, which are then tested for association with phenotypes. However, proteins can be regulated through variants outside of the cis region. An intermediate GWAS step to identify protein quantitative trait loci (pQTL) allows for the inclusion of trans SNPs outside the cis region in protein-level prediction models. Here, we assess the prediction of 540 proteins in 1002 individuals from the Women's Health Initiative (WHI), split equally into a GWAS set, an elastic net training set, and a testing set. We compared the testing r² between measured and predicted protein levels using this proposed approach, to the testing r² using only cis SNPs. The two methods usually resulted in similar testing r², but some proteins showed a significant increase in testing r² with our method. For example, for cartilage acidic protein 1, the testing r² increased from 0.101 to 0.351. We also demonstrate reproducible findings for predicted protein association with lipid and blood cell traits in WHI participants without proteomics data and in UK Biobank utilizing our PWAS weights.
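
The evaluation design described above (an equal three-way split of 1002 individuals, then a testing r² between measured and predicted protein levels) can be sketched as follows. This is a minimal illustration, assuming that "testing r²" means the squared Pearson correlation. The function names and the random seed are hypothetical, and the GWAS and elastic net steps themselves are not shown.

```python
import random


def three_way_split(ids, seed=0):
    """Shuffle sample IDs and split them into three equal-sized sets
    (GWAS set, model-training set, testing set)."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    k = len(ids) // 3
    return ids[:k], ids[k:2 * k], ids[2 * k:3 * k]


def squared_correlation(measured, predicted):
    """Squared Pearson correlation between measured and predicted levels."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    sxy = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    sxx = sum((m - mean_m) ** 2 for m in measured)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    return sxy ** 2 / (sxx * syy)


# 1002 individuals split equally gives 334 per set.
gwas_ids, train_ids, test_ids = three_way_split(range(1002))
```

Keeping the GWAS set separate from the training and testing sets means the trans SNPs are selected on data that the prediction model is never evaluated on.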

Genetic Epidemiology 48(7): 310-323.

Hierarchical joint analysis of marginal summary statistics—Part II: High-dimensional instrumental analysis of omics data

Pub Date : 2024-06-17 DOI: 10.1002/gepi.22577

Lai Jiang, Jiayi Shen, Burcu F. Darst, Christopher A. Haiman, Nicholas Mancuso, David V. Conti

Instrumental variable (IV) analysis has been widely applied in epidemiology to infer causal relationships using observational data. Genetic variants can also be viewed as valid IVs in Mendelian randomization and transcriptome-wide association studies. However, most multivariate IV approaches cannot scale to high-throughput experimental data. Here, we leverage the flexibility of our previous work, a hierarchical model that jointly analyzes marginal summary statistics (hJAM), to a scalable framework (SHA-JAM) that can be applied to a large number of intermediates and a large number of correlated genetic variants—situations often encountered in modern experiments leveraging omic technologies. SHA-JAM aims to estimate the conditional effect for high-dimensional risk factors on an outcome by incorporating estimates from association analyses of single-nucleotide polymorphism (SNP)-intermediate or SNP-gene expression as prior information in a hierarchical model. Results from extensive simulation studies demonstrate that SHA-JAM yields a higher area under the receiver operating characteristics curve (AUC), a lower mean-squared error of the estimates, and a much faster computation speed, compared to an existing approach for similar analyses. In two applied examples for prostate cancer, we investigated metabolite and transcriptome associations, respectively, using summary statistics from a GWAS for prostate cancer with more than 140,000 men and high dimensional publicly available summary data for metabolites and transcriptomes.
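
For orientation, the simplest summary-statistic IV estimator used in Mendelian randomization is the fixed-effect inverse-variance weighted (IVW) estimate. The sketch below is not SHA-JAM or hJAM: it assumes a single exposure and independent variants, whereas the abstract targets many intermediates and correlated variants. The function name and the numbers in the usage lines are hypothetical.

```python
from math import sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-variant summary statistics.

    beta_exposure: SNP-exposure effect estimates
    beta_outcome:  SNP-outcome effect estimates
    se_outcome:    standard errors of the SNP-outcome estimates
    Assumes independent variants and first-order weights 1 / se_outcome**2.
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    return numerator / denominator, sqrt(1.0 / denominator)


# Hypothetical summary statistics for three independent variants.
estimate, std_error = ivw_estimate([0.1, 0.2, 0.3],
                                   [0.05, 0.10, 0.15],
                                   [0.01, 0.01, 0.01])
```

With correlated variants or many exposures this estimator no longer applies directly, which is the gap a joint hierarchical model is meant to address.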

Genetic Epidemiology 48(7): 291-309. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22577

Interpreting disease genome-wide association studies and polygenetic risk scores given eligibility and study design considerations

Pub Date : 2024-05-26 DOI: 10.1002/gepi.22567

Catherine Mary Schooling, Mary Beth Terry

Genome-wide association studies (GWAS) have been helpful in identifying genetic variants predicting cancer risk and providing new insights into cancer biology. Increasing use of genetically informed care, as well as genetically informed prevention and treatment strategies, has also drawn attention to some of the inherent limitations of cancer genetic data. Specifically, genetic endowment is lifelong. However, those recruited into cancer studies tend to be middle-aged or older people, meaning the exposure most likely starts before recruitment, as opposed to exposure and recruitment aligning, as in a trial or a target trial. Studies in survivors can be biased as a result of depletion of the susceptibles, here specifically due to genetic vulnerability and the cancer of interest or a competing risk. In addition, including prevalent cases in a case-control study will make the genetics of survival with cancer look harmful (Neyman bias). Here, we describe ways of designing GWAS to maximize explanatory power and predictive utility, by reducing selection bias due to only recruiting survivors and reducing Neyman bias due to including prevalent cases, alongside using other techniques, such as selection diagrams, age-stratification, and Mendelian randomization, to facilitate GWAS interpretability and utility.

Genetic Epidemiology 48(8): 468-472.

Identifying genes associated with disease outcomes using joint sparse canonical correlation analysis—An application in renal clear cell carcinoma

Pub Date : 2024-05-15 DOI: 10.1002/gepi.22566

Diptavo Dutta, Ananda Sen, Jaya M. Satagopan

Somatic changes like copy number aberrations (CNAs) and epigenetic alterations like methylation have pivotal effects on disease outcomes and prognosis in cancer, by regulating gene expressions, that drive critical biological processes. To identify potential biomarkers and molecular targets and understand how they impact disease outcomes, it is important to identify key groups of CNAs, the associated methylation, and the gene expressions they impact, through a joint integrative analysis. Here, we propose a novel analysis pipeline, the joint sparse canonical correlation analysis (jsCCA), an extension of sCCA, to effectively identify an ensemble of CNAs, methylation sites and gene (expression) components in the context of disease endpoints, especially tumor characteristics. Our approach detects potentially orthogonal gene components that are highly correlated with sets of methylation sites which in turn are correlated with sets of CNA sites. It then identifies the genes within these components that are associated with the outcome. Further, we aggregate the effect of each gene expression set on tumor stage by constructing “gene component scores” and test its interaction with traditional risk factors. Analyzing clinical and genomic data on 515 renal clear cell carcinoma (ccRCC) patients from the TCGA-KIRC, we found eight gene components to be associated with methylation sites, regulated by groups of proximally located CNA sites. Association analysis with tumor stage at diagnosis identified a novel association of expression of ASAH1 gene trans-regulated by methylation of several genes including SIX5 and by CNAs in the 10q25 region including TCF7L2. Further analysis to quantify the overall effect of gene sets on tumor stage, revealed that two of the eight gene components have significant interaction with smoking in relation to tumor stage. 
These gene components represent distinct biological functions including immune function, inflammatory responses, and hypoxia-regulated pathways. Our findings suggest that jsCCA analysis can identify interpretable and important genes, regulatory structures, and clinically consequential pathways. Such methods are warranted for comprehensive analysis of multimodal data especially in cancer genomics.

Genetic Epidemiology 48(8): 414-432. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22566

Identifying somatic fingerprints of cancers defined by germline and environmental risk factors

Pub Date : 2024-04-30 DOI: 10.1002/gepi.22565

Saptarshi Chakraborty, Zoe Guan, Caroline E. Kostrzewa, Ronglai Shen, Colin B. Begg

Numerous studies over the past generation have identified germline variants that increase specific cancer risks. Simultaneously, a revolution in sequencing technology has permitted high-throughput annotations of somatic genomes characterizing individual tumors. However, examining the relationship between germline variants and somatic alteration patterns is hugely challenged by the large numbers of variants in a typical tumor, the rarity of most individual variants, and the heterogeneity of tumor somatic fingerprints. In this article, we propose statistical methodology that frames the investigation of germline-somatic relationships in an interpretable manner. The method uses meta-features embodying biological contexts of individual somatic alterations to implicitly group rare mutations. Our team has used this technique previously through a multilevel regression model to diagnose with high accuracy tumor site of origin. Herein, we further leverage topic models from computational linguistics to achieve interpretable lower-dimensional embeddings of the meta-features. We demonstrate how the method can identify distinctive somatic profiles linked to specific germline variants or environmental risk factors. We illustrate the method using The Cancer Genome Atlas whole-exome sequencing data to characterize somatic tumor fingerprints in breast cancer patients with germline BRCA1/2 mutations and in head and neck cancer patients exposed to human papillomavirus.

Genetic Epidemiology 48(8): 455-467.

Meta-analysis of breast cancer risk for individuals with PALB2 pathogenic variants

Pub Date : 2024-04-23 DOI: 10.1002/gepi.22561

Thanthirige L. M. Ruberu, Danielle Braun, Giovanni Parmigiani, Swati Biswas

Multigene panel testing now allows efficient testing of many cancer susceptibility genes, leading to a larger number of mutation carriers being identified. They need to be counseled about the cancer risk conferred by the specific gene mutation. An important cancer susceptibility gene is PALB2. Multiple studies reported risk estimates for breast cancer (BC) conferred by pathogenic variants in PALB2. Due to the diverse modalities of reported risk estimates (age-specific risk, odds ratio, relative risk, and standardized incidence ratio) and effect sizes, a meta-analysis combining these estimates is necessary to accurately counsel patients with this mutation. However, this is not trivial due to heterogeneity of studies in terms of study design and risk measure. We utilized a recently proposed Bayesian random-effects meta-analysis method that can synthesize estimates from such heterogeneous studies. We applied this method to combine estimates from 12 studies on BC risk for carriers of pathogenic PALB2 mutations. The estimated overall (meta-analysis-based) risk of BC is 12.80% (6.11%–22.59%) by age 50 and 48.47% (36.05%–61.74%) by age 80. Pathogenic mutations in PALB2 make women more susceptible to BC. Our risk estimates can help clinically manage patients carrying pathogenic variants in PALB2.
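
As a point of contrast with the Bayesian method described above, the standard frequentist random-effects pooling (DerSimonian-Laird) is sketched below. It is not the authors' method: it assumes every study reports the same measure on the same scale (for example, a log odds ratio with its variance), which is exactly the restriction the abstract says the heterogeneous PALB2 studies do not satisfy. The function name and the inputs in the usage line are hypothetical.

```python
from math import sqrt


def dersimonian_laird(estimates, variances):
    """Random-effects pooled estimate using the DerSimonian-Laird
    moment estimator of the between-study variance tau^2."""
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)
    # Re-weight each study by its within- plus between-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return pooled, sqrt(1.0 / sum(w_star)), tau2


# Hypothetical inputs: two studies with effect estimates 0 and 2, unit variances.
pooled, std_error, tau2 = dersimonian_laird([0.0, 2.0], [1.0, 1.0])
```

When tau² is estimated as zero, the result reduces to the fixed-effect inverse-variance estimate.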

Genetic Epidemiology 48(8): 448-454.

Citations: 0

A novel application of data-consistent inversion to overcome spurious inference in genome-wide association studies

IF 1.7 Zone 4 (Medicine) Q3 GENETICS & HEREDITY

Genetic Epidemiology

Pub Date : 2024-04-21 DOI: 10.1002/gepi.22563

Negar Janani, Kendra A. Young, Greg Kinney, Matthew Strand, John E. Hokanson, Yaning Liu, Troy Butler, Erin Austin

Genome-wide association studies (GWAS) typically use linear or logistic regression models to identify associations between phenotypes (traits) and genotypes (genetic variants) of interest. However, the use of regression with the additive assumption has potential limitations. First, the normality assumption on the residuals rarely holds in practice, and deviation from normality increases the Type-I error rate. Second, building a model based on such an assumption ignores genetic structures such as dominant, recessive, and protective-risk cases. Ignoring genetic variants may result in spurious conclusions about the associations between a variant and a trait. We propose an assumption-free model built upon data-consistent inversion (DCI), a recently developed measure-theoretic framework for uncertainty quantification. The proposed DCI-derived model builds a nonparametric distribution on model inputs that propagates to the distribution of observed data without the normality assumption on residuals that the regression model requires. This characteristic enables the proposed DCI-derived model to cover all genetic variants without emphasizing the additivity of the classic-GWAS model. Simulations and a replication GWAS with data from COPDGene demonstrate the ability of this model to control the Type-I error rate at least as well as the classic-GWAS (additive linear model) approach while having similar or greater power to discover variants in different genetic modes of transmission.
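The contrast between the additive assumption and the dominant and recessive structures mentioned in the abstract can be illustrated with genotype codings. The following is a minimal sketch of the standard codings only, not the DCI-derived model; the function name `encode_genotype` and the mode labels are hypothetical, and the protective-risk case is not covered.

```python
# Illustrative sketch (not from the paper): how a genotype, expressed as the
# number of minor alleles (0, 1 or 2), is coded under three inheritance models.
# The name encode_genotype and the mode strings are assumptions.

def encode_genotype(minor_allele_count: int, mode: str) -> int:
    """Return the regression covariate for one genotype under a given model."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1 or 2")
    if mode == "additive":
        # Effect assumed proportional to the number of minor alleles.
        return minor_allele_count
    if mode == "dominant":
        # One copy of the minor allele is enough to show the effect.
        return int(minor_allele_count >= 1)
    if mode == "recessive":
        # Two copies of the minor allele are needed to show the effect.
        return int(minor_allele_count == 2)
    raise ValueError(f"unknown mode: {mode}")


genotypes = [0, 1, 2]
codings = {
    mode: [encode_genotype(g, mode) for g in genotypes]
    for mode in ("additive", "dominant", "recessive")
}

for mode, values in codings.items():
    print(mode, values)
```

Under a recessive structure, heterozygotes behave like non-carriers, so an additive covariate that places them halfway between the two homozygotes misrepresents the pattern; this is the kind of mismatch the abstract describes.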

Volume 48, Issue 6, Pages 270-288

Citations: 0

Shared genetic risk between major orofacial cleft phenotypes in an African population

IF 1.7 Zone 4 (Medicine) Q3 GENETICS & HEREDITY

Genetic Epidemiology

Pub Date : 2024-04-18 DOI: 10.1002/gepi.22564

Azeez Alade, Tabitha Peter, Tamara Busch, Waheed Awotoye, Deepti Anand, Oladayo Abimbola, Emmanuel Aladenika, Mojisola Olujitan, Oscar Rysavy, Phuong Fawng Nguyen, Thirona Naicker, Peter A. Mossey, Lord J. J. Gowans, Mekonen A. Eshete, Wasiu L. Adeyemo, Erliang Zeng, Eric Van Otterloo, Michael O'Rorke, Adebowale Adeyemo, Jeffrey C. Murray, Salil A. Lachke, Paul A. Romitti, Azeez Butali

Nonsyndromic orofacial clefts (NSOFCs) represent a large proportion (70%–80%) of all OFCs. They can be broadly categorized into nonsyndromic cleft lip with or without cleft palate (NSCL/P) and nonsyndromic cleft palate only (NSCPO). Although NSCL/P and NSCPO are considered etiologically distinct, recent evidence suggests the presence of shared genetic risks. Thus, we investigated the genetic overlap between NSCL/P and NSCPO using African genome-wide association study (GWAS) data on NSOFCs. These data consist of 814 NSCL/P cases, 205 NSCPO cases, and 2159 unrelated controls. We generated association summary statistics for common single-nucleotide variants (SNVs) separately for each phenotype (NSCL/P and NSCPO) under an additive genetic model. Subsequently, we employed the pleiotropic analysis under the composite null (PLACO) method to test for genetic overlap. Our analysis identified two loci with genome-wide significance (rs181737795 [p = 2.58E−08] and rs2221169 [p = 4.5E−08]) and one locus with marginal significance (rs187523265 [p = 5.22E−08]). Using mouse transcriptomics data and information from genetic phenotype databases, we identified MDN1, MAP3K7, KMT2A, ARCN1, and VADC2 as top candidate genes for the associated SNVs. These findings enhance our understanding of genetic variants associated with NSOFCs and identify potential candidate genes for further exploration.
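The split between genome-wide and marginal significance reported above is consistent with the conventional GWAS threshold of p < 5E−08. The abstract does not state the threshold it used, so the cutoff in this minimal sketch is an assumption, and the names `GENOME_WIDE_ALPHA` and `classify` are hypothetical. The p-values are the three reported in the abstract.

```python
# Illustrative sketch: classify the three reported PLACO p-values against the
# conventional genome-wide threshold. The 5e-8 cutoff is an assumption; the
# abstract does not say which threshold the authors applied.

GENOME_WIDE_ALPHA = 5e-8  # assumed conventional threshold

# p-values as reported in the abstract
placo_p_values = {
    "rs181737795": 2.58e-08,
    "rs2221169": 4.5e-08,
    "rs187523265": 5.22e-08,
}


def classify(p_value: float, alpha: float = GENOME_WIDE_ALPHA) -> str:
    """Label a p-value relative to the genome-wide significance threshold."""
    if p_value < alpha:
        return "genome-wide significant"
    return "not genome-wide significant"


labels = {snv: classify(p) for snv, p in placo_p_values.items()}

for snv, label in labels.items():
    print(snv, placo_p_values[snv], label)
```

With this cutoff, rs187523265 falls just above the threshold, which matches its description as marginally significant.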

Volume 48, Issue 6, Pages 258-269
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22564

Citations: 0

Structured testing of genetic association with mixed clinical outcomes

IF 1.7 Zone 4 (Medicine) Q3 GENETICS & HEREDITY

Genetic Epidemiology

Pub Date : 2024-04-12 DOI: 10.1002/gepi.22560

Meiling Liu, Yu-Ru Su, Yang Liu, Li Hsu, Qianchuan He

Genetic factors play a fundamental role in disease development. Studying the genetic association with clinical outcomes is critical for understanding disease biology and devising novel treatment targets. However, the frequencies of genetic variations are often low, making it difficult to examine the variants one by one. Moreover, the clinical outcomes are complex, including patients' survival time and other binary or continuous outcomes such as recurrences and lymph node count, and how to effectively analyze genetic association with these outcomes remains unclear. In this article, we propose a structured test statistic for testing genetic association with mixed types of survival, binary, and continuous outcomes. The structured testing incorporates known biological information of variants while allowing for their heterogeneous effects and is a powerful strategy for analyzing infrequent genetic factors. Simulation studies show that the proposed test statistic maintains the correct type I error rate and is highly effective in detecting significant genetic variants. We applied our approach to a uterine corpus endometrial carcinoma study and identified several genetic pathways associated with the clinical outcomes.

Volume 48, Issue 5, Pages 226-237

Citations: 0