
Genetic Epidemiology: Latest Articles

Exploring and Accounting for Genetically Driven Effect Heterogeneity in Mendelian Randomization
IF 1.7 Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2024-09-22 DOI: 10.1002/gepi.22587
Annika Jaitner, Krasimira Tsaneva-Atanasova, Rachel M. Freathy, Jack Bowden

Mendelian randomization (MR) is a framework to estimate the causal effect of a modifiable health exposure, drug target or pharmaceutical intervention on a downstream outcome by using genetic variants as instrumental variables. A crucial assumption allowing estimation of the average causal effect in MR, termed homogeneity, is that the causal effect does not vary across levels of any instrument used in the analysis. In contrast, the science of pharmacogenetics seeks to actively uncover and exploit genetically driven effect heterogeneity for the purposes of precision medicine. In this study, we consider a recently proposed method for performing pharmacogenetic analysis on observational data—the Triangulation WIthin a STudy (TWIST) framework—and explore how it can be combined with traditional MR approaches to properly characterise average causal effects and genetically driven effect heterogeneity. We propose two new methods which not only estimate the genetically driven effect heterogeneity but also enable the estimation of a causal effect in the genetic group with and without the risk allele separately. Both methods utilise homogeneity-respecting and homogeneity-violating genetic variants and rely on a different set of assumptions. Using data from the ALSPAC study, we apply our new methods to estimate the causal effect of smoking before and during pregnancy on offspring birth weight in mothers whose genetics mean they find it (relatively) easier or harder to quit smoking.

Genetic Epidemiology, volume 49, issue 1. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22587
Citations: 0
Using clustering of genetic variants in Mendelian randomization to interrogate the causal pathways underlying multimorbidity from a common risk factor
IF 1.7 Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2024-08-13 DOI: 10.1002/gepi.22582
Xiaoran Liang, Ninon Mounier, Nicolas Apfel, Sara Khalid, Timothy M. Frayling, Jack Bowden

Mendelian randomization (MR) is an epidemiological approach that utilizes genetic variants as instrumental variables to estimate the causal effect of an exposure on a health outcome. This paper investigates an MR scenario in which genetic variants aggregate into clusters that identify heterogeneous causal effects. Such variant clusters are likely to emerge if the variants affect the exposure and outcome via distinct biological pathways. In the multi-outcome MR framework, where a shared exposure causally impacts several disease outcomes simultaneously, these variant clusters can provide insights into the common disease-causing mechanisms underpinning the co-occurrence of multiple long-term conditions, a phenomenon known as multimorbidity. To identify such variant clusters, we adapt the general method of agglomerative hierarchical clustering to the multi-sample summary-data MR setting, enabling cluster detection based on variant-specific ratio estimates. In particular, we tailor the method for multi-outcome MR to aid in elucidating the causal pathways through which a common risk factor contributes to multiple morbidities. We show in simulations that our “MR-AHC” method detects clusters with high accuracy, outperforming existing methods. We apply the method to investigate the causal effects of high body fat percentage on type 2 diabetes and osteoarthritis, uncovering interconnected cellular processes underlying this multimorbid disease pair.
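To make the idea of clustering on variant-specific ratio estimates concrete, here is a minimal sketch in Python. It is not the authors' MR-AHC implementation: the summary statistics are made up, the estimates are unweighted, and the stopping rule (a fixed `max_gap` between cluster means) is an assumption chosen only for illustration.

```python
def ratio_estimates(beta_exposure, beta_outcome):
    """Variant-specific ratio estimates: outcome effect / exposure effect."""
    return [by / bx for bx, by in zip(beta_exposure, beta_outcome)]


def agglomerative_clusters(values, max_gap):
    """Repeatedly merge the two clusters with the closest means.

    Stops when the closest pair of cluster means is more than max_gap apart
    (a hypothetical stopping rule). Returns lists of variant indices.
    """
    clusters = [[i] for i in range(len(values))]

    def mean(cluster):
        return sum(values[i] for i in cluster) / len(cluster)

    while len(clusters) > 1:
        best = None  # (gap, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                gap = abs(mean(clusters[a]) - mean(clusters[b]))
                if best is None or gap < best[0]:
                    best = (gap, a, b)
        gap, a, b = best
        if gap > max_gap:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Made-up SNP-exposure and SNP-outcome associations for six variants.
beta_x = [0.10, 0.12, 0.08, 0.11, 0.09, 0.10]
beta_y = [0.050, 0.062, 0.039, 0.165, 0.139, 0.148]

estimates = ratio_estimates(beta_x, beta_y)
clusters = agglomerative_clusters(estimates, max_gap=0.3)
print(clusters)
```

In this toy input, three variants imply a causal effect near 0.5 and three imply an effect near 1.5, so the procedure separates them into two groups. A realistic analysis would also account for the precision of each ratio estimate.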

Genetic Epidemiology, volume 49, issue 1. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22582
Citations: 0
Exploring pleiotropy in Mendelian randomisation analyses: What are genetic variants associated with ‘cigarette smoking initiation’ really capturing?
IF 1.7 Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2024-08-04 DOI: 10.1002/gepi.22583
Zoe E. Reed, Robyn E. Wootton, Jasmine N. Khouja, Tom G. Richardson, Eleanor Sanderson, George Davey Smith, Marcus R. Munafò

Genetic variants used as instruments for exposures in Mendelian randomisation (MR) analyses may have horizontal pleiotropic effects (i.e., influence outcomes via pathways other than through the exposure), which can undermine the validity of results. We examined the extent of this using smoking behaviours as an example. We first ran a phenome-wide association study in UK Biobank, using a smoking initiation genetic instrument. From the most strongly associated phenotypes, we selected those we considered could either plausibly or not plausibly be caused by smoking. We examined associations between genetic instruments for smoking initiation, smoking heaviness and lifetime smoking and these phenotypes in UK Biobank and the Avon Longitudinal Study of Parents and Children (ALSPAC). We conducted negative control analyses among never smokers, including children. We found evidence that smoking-related genetic instruments were associated with phenotypes not plausibly caused by smoking in UK Biobank and (to a lesser extent) ALSPAC. We observed associations with phenotypes among never smokers. Our results demonstrate that smoking-related genetic risk scores are associated with unexpected phenotypes that are less plausibly downstream of smoking. This may reflect horizontal pleiotropy in these genetic risk scores, and we would encourage researchers to exercise caution when using these, and genetic risk scores for other complex behavioural exposures. We outline approaches that could be taken to consider this and overcome issues caused by potential horizontal pleiotropy. For example, in genetically informed causal inference analyses (e.g., MR) it is important to consider negative control outcomes and triangulation approaches, to avoid arriving at incorrect conclusions.

Genetic Epidemiology, volume 49, issue 1. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7616876/pdf/
Citations: 0
Use of genetic correlations to examine selection bias
IF 1.7 Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2024-07-30 DOI: 10.1002/gepi.22584
Chin Yang Shapland, Apostolos Gkatzionis, Gibran Hemani, Kate Tilling

Observational studies are rarely representative of their target population because there are known and unknown factors that affect an individual's choice to participate (the selection mechanism). Selection can cause bias in a given analysis if the outcome is related to selection (conditional on the other variables in the model). Detecting and adjusting for selection bias in practice typically requires access to data on nonselected individuals. Here, we propose methods to detect selection bias in genetic studies by comparing correlations among genetic variants in the selected sample to those expected under no selection. We examine the use of four hypothesis tests to identify induced associations between genetic variants in the selected sample. We evaluate these approaches in Monte Carlo simulations. Finally, we use these approaches in an applied example using data from the UK Biobank (UKBB). The proposed tests suggested an association between alcohol consumption and selection into UKBB. Hence, UKBB analyses with alcohol consumption as the exposure or outcome may be biased by this selection.
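The reasoning behind comparing variant correlations can be illustrated with a small simulation. This is a sketch under stated assumptions, not one of the four tests examined in the paper: two independent variants both raise a trait, and individuals are selected when the trait exceeds a threshold. The allele frequency, effect sizes, threshold, and seed are all hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=20000, maf=0.3, threshold=1.5, seed=1):
    """Simulate two independent variants and trait-dependent selection."""
    rng = random.Random(seed)
    g1, g2, selected = [], [], []
    for _ in range(n):
        # Genotypes coded 0/1/2, drawn independently of each other.
        a = sum(rng.random() < maf for _ in range(2))
        b = sum(rng.random() < maf for _ in range(2))
        # Both variants raise the trait; selection depends on the trait.
        trait = a + b + rng.gauss(0.0, 1.0)
        g1.append(a)
        g2.append(b)
        selected.append(trait > threshold)
    return g1, g2, selected


g1, g2, selected = simulate()
r_full = pearson(g1, g2)
r_selected = pearson(
    [a for a, s in zip(g1, selected) if s],
    [b for b, s in zip(g2, selected) if s],
)
print(f"full sample r = {r_full:.3f}, selected sample r = {r_selected:.3f}")
```

In the full sample the two variants are essentially uncorrelated, as expected under no selection. Within the selected subset a negative correlation appears, which is the kind of induced association the proposed tests are designed to detect.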

Genetic Epidemiology, volume 49, issue 1. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22584
Citations: 0
Polygenic hazard score models for the prediction of Alzheimer's free survival using the lasso for Cox's proportional hazards model
IF 1.7 Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2024-07-09 DOI: 10.1002/gepi.22581
Georg Hahn, Dmitry Prokopenko, Julian Hecker, Sharon M. Lutz, Kristina Mullin, Rudolph E. Tanzi, Stacia DeSantis, Christoph Lange

The prediction of the susceptibility of an individual to a certain disease is an important and timely research area. An established technique is to estimate the risk of an individual with the help of an integrated risk model, that is, a polygenic risk score with added epidemiological covariates. However, integrated risk models do not capture any time dependence, and may provide a point estimate of the relative risk with respect to a reference population. The aim of this work is twofold. First, we explore and advocate the idea of predicting the time-dependent hazard and survival (defined as disease-free time) of an individual for the onset of a disease. This provides a practitioner with a much more differentiated view of absolute survival as a function of time. Second, to compute the time-dependent risk of an individual, we use published methodology to fit a Cox's proportional hazards model to data from a genetic SNP study of time to Alzheimer's disease (AD) onset, using the lasso to incorporate further epidemiological variables such as sex, APOE (apolipoprotein E, a genetic risk factor for AD) status, 10 leading principal components, and selected genomic loci. We apply the lasso for Cox's proportional hazards to a data set of 6792 AD patients (composed of 4102 cases and 2690 controls) and 87 covariates. We demonstrate that fitting a lasso model for Cox's proportional hazards allows one to obtain more accurate survival curves than with state-of-the-art (likelihood-based) methods. Moreover, the methodology allows one to obtain personalized survival curves for a patient, thus giving a much more differentiated view of the expected progression of a disease than the view offered by integrated risk models. The runtime to compute personalized survival curves is under a minute for the entire data set of AD patients, thus enabling it to handle data sets with 60,000–100,000 subjects in less than 1 h.
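A personalized survival curve under a Cox proportional hazards model follows from the standard relation S(t | x) = S0(t)^exp(x·β), where S0 is the baseline survival function. The sketch below applies that relation in Python. The baseline values, coefficients, and covariates are hypothetical, and the fitting step (for example, a lasso-penalized partial likelihood) is assumed to have happened elsewhere; zero coefficients stand in for covariates the lasso would drop.

```python
import math


def personalized_survival(baseline_survival, coefficients, covariates):
    """Survival curve S(t | x) = S0(t) ** exp(x . beta) for one individual."""
    linear_predictor = sum(b * x for b, x in zip(coefficients, covariates))
    hazard_ratio = math.exp(linear_predictor)
    return [s0 ** hazard_ratio for s0 in baseline_survival]


# Hypothetical baseline disease-free survival at four follow-up times.
baseline = [1.0, 0.95, 0.85, 0.70]
# Hypothetical fitted coefficients; 0.0 marks a covariate shrunk to zero.
coefficients = [0.7, 0.0, -0.2]

reference_curve = personalized_survival(baseline, coefficients, [0, 0, 0])
higher_risk_curve = personalized_survival(baseline, coefficients, [1, 1, 0])

print(reference_curve)
print(higher_risk_curve)
```

An individual with all covariates at zero recovers the baseline curve, while a positive linear predictor pulls the whole curve downward. Unlike a single relative-risk estimate, the curve gives absolute disease-free survival at each time point.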

Genetic Epidemiology, volume 49, issue 1.
Citations: 0
Statistics to prioritize rare variants in family-based sequencing studies with disease subtypes
IF 1.7 Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2024-06-28 DOI: 10.1002/gepi.22579
Christina Nieuwoudt, Fabiha Binte Farooq, Angela Brooks-Wilson, Alexandre Bureau, Jinko Graham

Family-based sequencing studies are increasingly used to find rare genetic variants of high risk for disease traits with familial clustering. In some studies, families with multiple disease subtypes are collected and the exomes of affected relatives are sequenced for shared rare variants (RVs). Since different families can harbor different causal variants and each family harbors many RVs, tests to detect causal variants can have low power in this study design. Our goal is rather to prioritize shared variants for further investigation by, for example, pathway analyses or functional studies. The transmission-disequilibrium test prioritizes variants based on departures from Mendelian transmission in parent–child trios. Extending this idea to families, we propose methods to prioritize RVs shared in affected relatives with two disease subtypes, with one subtype more heritable than the other. Global approaches condition on a variant being observed in the study and assume a known probability of carrying a causal variant. In contrast, local approaches condition on a variant being observed in specific families to eliminate the carrier probability. Our simulation results indicate that global approaches are robust to misspecification of the carrier probability and prioritize more effectively than local approaches even when the carrier probability is misspecified.
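For readers unfamiliar with the transmission-disequilibrium test mentioned above, the classic trio version can be sketched in a few lines of Python. This is the standard textbook statistic, not the family-based prioritization statistics proposed in the paper, and the counts are invented. Among heterozygous parents, `transmitted` counts how often the variant allele was passed to an affected child and `untransmitted` how often it was not.

```python
import math


def tdt(transmitted, untransmitted):
    """Classic TDT: (b - c)^2 / (b + c), compared to chi-square with 1 df."""
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    statistic = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Invented counts: 30 transmissions versus 10 non-transmissions.
statistic, p_value = tdt(30, 10)
print(statistic, p_value)
```

Under Mendelian transmission the two counts are expected to be equal, so a large statistic signals a departure that can be used to rank variants.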

Genetic Epidemiology, volume 48, issue 7, pages 324-343. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22579
Citations: 0
Proteome-wide association study using cis and trans variants and applied to blood cell and lipid-related traits in the Women's Health Initiative study
IF 1.7 Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2024-06-28 DOI: 10.1002/gepi.22578
Brian D. Chen, Chanhwa Lee, Amanda L. Tapia, Alexander P. Reiner, Hua Tang, Charles Kooperberg, JoAnn E. Manson, Yun Li, Laura M. Raffield

In most Proteome-Wide Association Studies (PWAS), variants near the protein-coding gene (±1 Mb), also known as cis single nucleotide polymorphisms (SNPs), are used to predict protein levels, which are then tested for association with phenotypes. However, proteins can be regulated through variants outside of the cis region. An intermediate GWAS step to identify protein quantitative trait loci (pQTL) allows for the inclusion of trans SNPs outside the cis region in protein-level prediction models. Here, we assess the prediction of 540 proteins in 1002 individuals from the Women's Health Initiative (WHI), split equally into a GWAS set, an elastic net training set, and a testing set. We compared the testing r² between measured and predicted protein levels using this proposed approach, to the testing r² using only cis SNPs. The two methods usually resulted in similar testing r², but some proteins showed a significant increase in testing r² with our method. For example, for cartilage acidic protein 1, the testing r² increased from 0.101 to 0.351. We also demonstrate reproducible findings for predicted protein association with lipid and blood cell traits in WHI participants without proteomics data and in UK Biobank utilizing our PWAS weights.

Brian D. Chen, Chanhwa Lee, Amanda L. Tapia, Alexander P. Reiner, Hua Tang, Charles Kooperberg, JoAnn E. Manson, Yun Li, Laura M. Raffield. Proteome-wide association study using cis and trans variants and applied to blood cell and lipid-related traits in the Women's Health Initiative study. Genetic Epidemiology, volume 48, issue 7, pages 310-323. Published 2024-06-28. DOI: 10.1002/gepi.22578
Citations: 0
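
The PWAS abstract above describes splitting 1002 individuals equally into a GWAS set, an elastic net training set, and a testing set, then comparing testing r² between measured and predicted protein levels. The following is a minimal sketch of only those two bookkeeping steps. The function names `three_way_split` and `r_squared` are hypothetical, the shuffle seed is arbitrary, and treating r² as the squared Pearson correlation is an assumption, since the abstract does not define it.

```python
import random
from statistics import mean


def three_way_split(ids, seed=0):
    """Shuffle sample IDs and cut them into three equal-sized, disjoint sets.

    Returns (gwas_set, training_set, testing_set). Any remainder after
    integer division by three is dropped; 1002 divides evenly.
    """
    ids = list(ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    k = len(ids) // 3
    return ids[:k], ids[k:2 * k], ids[2 * k:3 * k]


def r_squared(measured, predicted):
    """Squared Pearson correlation (assumed definition of testing r²)."""
    mx, my = mean(measured), mean(predicted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, predicted))
    sxx = sum((x - mx) ** 2 for x in measured)
    syy = sum((y - my) ** 2 for y in predicted)
    return sxy * sxy / (sxx * syy)


gwas_set, training_set, testing_set = three_way_split(range(1002))
print(len(gwas_set), len(training_set), len(testing_set))  # 334 334 334
```

In a real analysis the pQTL GWAS would be run on the first set, the prediction model fitted on the second, and `r_squared` evaluated only on the third, so that variant selection and model fitting never see the testing samples.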
Hierarchical joint analysis of marginal summary statistics—Part II: High-dimensional instrumental analysis of omics data
IF 1.7 Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2024-06-17 DOI: 10.1002/gepi.22577
Lai Jiang, Jiayi Shen, Burcu F. Darst, Christopher A. Haiman, Nicholas Mancuso, David V. Conti

Instrumental variable (IV) analysis has been widely applied in epidemiology to infer causal relationships using observational data. Genetic variants can also be viewed as valid IVs in Mendelian randomization and transcriptome-wide association studies. However, most multivariate IV approaches cannot scale to high-throughput experimental data. Here, we leverage the flexibility of our previous work, a hierarchical model that jointly analyzes marginal summary statistics (hJAM), to a scalable framework (SHA-JAM) that can be applied to a large number of intermediates and a large number of correlated genetic variants—situations often encountered in modern experiments leveraging omic technologies. SHA-JAM aims to estimate the conditional effect for high-dimensional risk factors on an outcome by incorporating estimates from association analyses of single-nucleotide polymorphism (SNP)-intermediate or SNP-gene expression as prior information in a hierarchical model. Results from extensive simulation studies demonstrate that SHA-JAM yields a higher area under the receiver operating characteristics curve (AUC), a lower mean-squared error of the estimates, and a much faster computation speed, compared to an existing approach for similar analyses. In two applied examples for prostate cancer, we investigated metabolite and transcriptome associations, respectively, using summary statistics from a GWAS for prostate cancer with more than 140,000 men and high dimensional publicly available summary data for metabolites and transcriptomes.

Genetic Epidemiology, volume 48, issue 7, pages 291-309. Published 2024-06-17. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22577
Citations: 0
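
The SHA-JAM abstract above builds on instrumental variable analysis with marginal summary statistics. As background only, here is a minimal sketch of the much simpler inverse-variance weighted (IVW) estimator for a single exposure and independent variants. This is not SHA-JAM or hJAM, which handle correlated variants and many intermediates jointly. The function name `ivw_estimate` and the numbers in the usage lines are made up for illustration.

```python
from math import sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted causal estimate from summary statistics.

    beta_exposure: per-variant SNP-exposure effect estimates
    beta_outcome:  per-variant SNP-outcome effect estimates
    se_outcome:    standard errors of the SNP-outcome estimates
    Assumes independent variants and a fixed-effect model.
    Returns (estimate, standard_error).
    """
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    return num / den, sqrt(1.0 / den)


# Illustrative (invented) summary statistics where every outcome effect is
# exactly half the exposure effect, so the estimate should be close to 0.5.
estimate, std_err = ivw_estimate(
    beta_exposure=[0.1, 0.2, 0.3],
    beta_outcome=[0.05, 0.10, 0.15],
    se_outcome=[0.01, 0.02, 0.01],
)
print(estimate, std_err)
```

The estimator is a weighted regression of outcome effects on exposure effects through the origin; correlated variants or multiple exposures need the joint modelling that the abstract describes.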
Interpreting disease genome-wide association studies and polygenetic risk scores given eligibility and study design considerations
IF 1.7 Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2024-05-26 DOI: 10.1002/gepi.22567
Catherine Mary Schooling, Mary Beth Terry

Genome-wide association studies (GWAS) have been helpful in identifying genetic variants predicting cancer risk and providing new insights into cancer biology. Increasing use of genetically informed care, as well as genetically informed prevention and treatment strategies, have also drawn attention to some of the inherent limitations of cancer genetic data. Specifically, genetic endowment is lifelong. However, those recruited into cancer studies tend to be middle-aged or older people, meaning the exposure most likely starts before recruitment, as opposed to exposure and recruitment aligning, as in a trial or a target trial. Studies in survivors can be biased as a result of depletion of the susceptibles, here specifically due to genetic vulnerability and the cancer of interest or a competing risk. In addition, including prevalent cases in a case-control study will make the genetics of survival with cancer look harmful (Neyman bias). Here, we describe ways of designing GWAS to maximize explanatory power and predictive utility, by reducing selection bias due to only recruiting survivors and reducing Neyman bias due to including prevalent cases alongside using other techniques, such as selection diagrams, age-stratification, and Mendelian randomization, to facilitate GWAS interpretability and utility.

Genetic Epidemiology, volume 48, issue 8, pages 468-472. Published 2024-05-26.
Citations: 0
Identifying genes associated with disease outcomes using joint sparse canonical correlation analysis—An application in renal clear cell carcinoma
IF 1.7 Zone 4 (Medicine) Q3 GENETICS & HEREDITY Pub Date: 2024-05-15 DOI: 10.1002/gepi.22566
Diptavo Dutta, Ananda Sen, Jaya M. Satagopan

Somatic changes like copy number aberrations (CNAs) and epigenetic alterations like methylation have pivotal effects on disease outcomes and prognosis in cancer by regulating the gene expression that drives critical biological processes. To identify potential biomarkers and molecular targets and understand how they impact disease outcomes, it is important to identify key groups of CNAs, the associated methylation, and the gene expressions they impact, through a joint integrative analysis. Here, we propose a novel analysis pipeline, the joint sparse canonical correlation analysis (jsCCA), an extension of sCCA, to effectively identify an ensemble of CNAs, methylation sites and gene (expression) components in the context of disease endpoints, especially tumor characteristics. Our approach detects potentially orthogonal gene components that are highly correlated with sets of methylation sites, which in turn are correlated with sets of CNA sites. It then identifies the genes within these components that are associated with the outcome. Further, we aggregate the effect of each gene expression set on tumor stage by constructing “gene component scores” and test its interaction with traditional risk factors. Analyzing clinical and genomic data on 515 renal clear cell carcinoma (ccRCC) patients from the TCGA-KIRC, we found eight gene components to be associated with methylation sites, regulated by groups of proximally located CNA sites. Association analysis with tumor stage at diagnosis identified a novel association of expression of the ASAH1 gene, trans-regulated by methylation of several genes including SIX5 and by CNAs in the 10q25 region including TCF7L2. Further analysis to quantify the overall effect of gene sets on tumor stage revealed that two of the eight gene components have significant interaction with smoking in relation to tumor stage. These gene components represent distinct biological functions including immune function, inflammatory responses, and hypoxia-regulated pathways. Our findings suggest that jsCCA analysis can identify interpretable and important genes, regulatory structures, and clinically consequential pathways. Such methods are warranted for comprehensive analysis of multimodal data, especially in cancer genomics.

Genetic Epidemiology, volume 48, issue 8, pages 414-432. Published 2024-05-15. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22566
Citations: 0
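
The jsCCA abstract above mentions aggregating each gene expression set into a "gene component score" but does not say how the score is computed. One common way to summarize a sparse component is a weighted sum of expression values over the genes with nonzero loadings. The sketch below assumes that construction, which may differ from the authors' definition. The function name, the placeholder gene `GENE_B`, and all numeric values are hypothetical. Only `ASAH1` comes from the abstract.

```python
def gene_component_score(expression, loadings):
    """Weighted sum of one sample's expression over a sparse gene component.

    expression: dict mapping gene name -> expression value for one sample
    loadings:   dict mapping gene name -> component loading; genes absent
                from this dict are treated as having zero loading
    This weighted-sum form is an assumption, not the published definition.
    """
    return sum(w * expression[gene] for gene, w in loadings.items() if w != 0.0)


# Hypothetical loadings and expression values for a single sample.
loadings = {"ASAH1": 0.5, "GENE_B": -0.25}
sample = {"ASAH1": 2.0, "GENE_B": 1.0, "GENE_C": 3.0}
print(gene_component_score(sample, loadings))  # 0.75
```

A per-sample score of this kind could then enter a regression on tumor stage together with a risk factor such as smoking and their product term, which is one standard way to test an interaction.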