Data-adaptive and pathway-based tests for association studies between somatic mutations and germline variations in human cancers

IF 1.7 | Zone 4, Medicine | Q3, GENETICS & HEREDITY | Genetic Epidemiology | Pub Date: 2023-10-11 | DOI: 10.1002/gepi.22537
Zhongyuan Chen, Han Liang, Peng Wei
Genetic Epidemiology, volume 47, issue 8, pages 617-636. Full text: https://onlinelibrary.wiley.com/doi/10.1002/gepi.22537
Citations: 0

Abstract

Cancer is a disease driven by a combination of inherited genetic variants and somatic mutations. Recently available large-scale sequencing data of cancer genomes have provided an unprecedented opportunity to study the interactions between them. However, previous studies on this topic have been limited by simple tests with low statistical power, such as Fisher's exact test. In this paper, we design data-adaptive and pathway-based tests, built on the score statistic, for association studies between somatic mutations and germline variations. Previous research has shown that two single-nucleotide polymorphism (SNP)-set-based association tests, the adaptive sum of powered score (aSPU) test and the data-adaptive pathway-based (aSPUpath) test, increase power in genome-wide association studies (GWASs) with a single disease trait in a case–control study. We extend aSPU and aSPUpath to multiple traits, that is, somatic mutations of multiple genes in a cohort study, allowing extensive information aggregation at both the SNP and gene levels. $p$-values obtained under different parameter values, each assuming a different genetic architecture, are combined to yield data-adaptive tests for somatic mutations and germline variations. Extensive simulations show that, in comparison with some commonly used methods, our data-adaptive somatic mutation/germline variation tests can be applied to multiple germline SNPs, genes, or pathways, and generally have much higher statistical power while maintaining the appropriate type I error rate. The proposed tests are applied to a large-scale, real-world International Cancer Genome Consortium whole-genome sequencing data set of 2583 subjects, detecting more significant and biologically relevant associations than other existing methods at both the gene and pathway levels. Our study has systematically identified associations between various germline variations and somatic mutations across different cancer types, which is potentially useful for cancer risk prediction, prognosis, and therapeutics.
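
To make the idea of combining $p$-values across parameters concrete, the following is a minimal sketch of the single-trait aSPU permutation test as it is commonly described in the earlier literature. It is not the multi-trait extension proposed in this paper. The function name `aspu_test`, its arguments, the choice of powers in `gammas`, and the intercept-only null model are assumptions made for illustration only.

```python
import numpy as np


def aspu_test(X, y, gammas=(1, 2, 3, 4, 5, 6, 7, 8, np.inf), n_perm=1000, seed=0):
    """Illustrative single-trait aSPU test (hypothetical helper, not the authors' code).

    X: (n, p) genotype matrix; y: (n,) trait vector.
    Returns (aSPU p-value, {gamma: SPU(gamma) p-value}).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    def score(yv):
        # Score vector under an intercept-only null: U_j = sum_i x_ij * (y_i - mean(y)).
        return X.T @ (yv - yv.mean())

    def spu(U):
        # SPU(gamma) = sum_j U_j^gamma; SPU(inf) is taken as max_j |U_j|.
        return np.array(
            [np.max(np.abs(U)) if np.isinf(g) else np.sum(U ** g) for g in gammas]
        )

    # Observed and permutation statistics (absolute values, larger = more extreme).
    t_obs = np.abs(spu(score(y)))  # shape (G,)
    t_perm = np.abs(
        np.array([spu(score(rng.permutation(y))) for _ in range(n_perm)])
    )  # shape (B, G)

    # Per-gamma p-values of the observed statistics.
    exceed = (t_perm >= t_obs).sum(axis=0)
    p_obs = exceed / n_perm

    # Per-gamma p-value of each permutation replicate, ranked against the others.
    counts = (t_perm[None, :, :] >= t_perm[:, None, :]).sum(axis=1) - 1
    p_perm = counts / (n_perm - 1)

    # aSPU takes the minimum p-value over gamma and calibrates that minimum
    # with the same permutations, so no second permutation layer is needed.
    min_obs = p_obs.min()
    min_perm = p_perm.min(axis=1)
    p_aspu = (1 + np.sum(min_perm <= min_obs)) / (n_perm + 1)

    per_gamma = {g: (1 + e) / (n_perm + 1) for g, e in zip(gammas, exceed)}
    return p_aspu, per_gamma
```

Small powers such as `gamma = 1` behave like a burden test and favor dense signals with effects in the same direction, whereas large powers and `gamma = inf` favor sparse signals. Taking the minimum $p$-value across powers is what lets the test adapt to an unknown genetic architecture.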

Source journal
Genetic Epidemiology (Medicine: Public, Environmental & Occupational Health)
CiteScore: 4.40
Self-citation rate: 9.50%
Articles published: 49
Review time: 6-12 weeks
Journal description: Genetic Epidemiology is a peer-reviewed journal for discussion of research on the genetic causes of the distribution of human traits in families and populations. Emphasis is placed on the relative contribution of genetic and environmental factors to human disease as revealed by genetic, epidemiological, and biologic investigations. Genetic Epidemiology primarily publishes papers in statistical genetics, a research field concerned mainly with the development of statistical, bioinformatical, and computational models for analyzing genetic data. Incorporation of underlying biology and population genetics into conceptual models is favored. The Journal seeks original articles comprising either applied research or innovative statistical, mathematical, computational, or genomic methodologies that advance studies in genetic epidemiology. Other types of reports are encouraged, such as letters to the editor, topic reviews, and perspectives from other fields of research that will likely enrich the field of genetic epidemiology.