Clustering of predicted loss-of-function variants in genes linked with monogenic disease can explain incomplete penetrance

IF 10.4 · CAS Zone 1 (Biology) · JCR Q1 (GENETICS & HEREDITY) · Genome Medicine · Pub Date: 2024-04-26 · DOI: 10.1186/s13073-024-01333-4
Robin N. Beaumont, Gareth Hawkes, Adam C. Gunning, Caroline F. Wright
Citations: 0

Abstract

Genetic variants that severely alter protein products (e.g. nonsense, frameshift) are often associated with disease. For some genes, these predicted loss-of-function variants (pLoFs) are observed throughout the gene, whilst in others, they occur only at specific locations. We hypothesised that, for genes linked with monogenic diseases that display incomplete penetrance, pLoF variants present in apparently unaffected individuals may be limited to regions where pLoFs are tolerated. To test this, we investigated whether pLoF location could explain instances of incomplete penetrance of variants expected to be pathogenic for Mendelian conditions.

We used exome sequence data in 454,773 individuals in the UK Biobank (UKB) to investigate the locations of pLoFs in a population cohort. We counted numbers of unique pLoF, missense, and synonymous variants in UKB in each quintile of the coding sequence (CDS) of all protein-coding genes and clustered the variants using Gaussian mixture models. We limited the analyses to genes with ≥ 5 variants of each type (16,473 genes). We compared the locations of pLoFs in UKB with all theoretically possible pLoFs in a transcript, and pathogenic pLoFs from ClinVar, and performed simulations to estimate the false-positive rate of non-uniformly distributed variants.

For most genes, all variant classes fell into clusters representing broadly uniform variant distributions, but genes in which haploinsufficiency causes developmental disorders were less likely to have uniform pLoF distribution than other genes (P < 2.2 × 10⁻⁶). We identified a number of genes, including ARID1B and GATA6, where pLoF variants in the first quarter of the CDS were rescued by the presence of an alternative translation start site and should not be reported as pathogenic. For other genes, such as ODC1, pLoFs were located approximately uniformly across the gene, but pathogenic pLoFs were clustered only at the end, consistent with a gain-of-function disease mechanism.

Our results suggest the potential benefits of localised constraint metrics and that the location of pLoF variants should be considered when interpreting variants.
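The abstract describes the per-gene analysis only at a high level. As a minimal sketch, assuming variant positions are available as 1-based CDS coordinates, the Python below bins unique variants into CDS quintiles, applies the ≥ 5-variants-per-class filter, and estimates by simulation how often uniformly placed variants would be wrongly flagged as non-uniform. The uniformity test used here is a Pearson chi-square goodness-of-fit test, swapped in purely for illustration; the authors clustered genes with Gaussian mixture models, which this sketch does not reproduce. All function names, the input format, and the toy positions are hypothetical.

```python
import random

N_BINS = 5  # CDS quintiles, as described in the abstract
# Upper 5% point of the chi-square distribution with N_BINS - 1 = 4 degrees of freedom.
CHI2_CRIT_DF4_ALPHA05 = 9.487729


def quintile_counts(positions, cds_length, n_bins=N_BINS):
    """Count unique variant positions in each (near) equal-width bin of the CDS.

    Assumes positions are 1-based CDS coordinates (a hypothetical input format).
    """
    counts = [0] * n_bins
    for pos in set(positions):  # count each variant position once
        if not 1 <= pos <= cds_length:
            raise ValueError(f"position {pos} lies outside a CDS of length {cds_length}")
        # Map positions 1..cds_length onto bins 0..n_bins-1.
        counts[(pos - 1) * n_bins // cds_length] += 1
    return counts


def passes_filter(counts_by_class, min_per_class=5):
    """Keep a gene only if every variant class has at least min_per_class variants."""
    return all(sum(counts) >= min_per_class for counts in counts_by_class.values())


def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts in every bin."""
    total = sum(counts)
    if total == 0:
        return 0.0
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def simulated_false_positive_rate(n_variants, cds_length, n_sims=2000, seed=1):
    """Fraction of uniformly placed variant sets that the test flags as non-uniform."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(n_sims):
        # Null hypothesis: distinct positions drawn uniformly across the CDS.
        positions = rng.sample(range(1, cds_length + 1), n_variants)
        if chi_square_uniform(quintile_counts(positions, cds_length)) > CHI2_CRIT_DF4_ALPHA05:
            flagged += 1
    return flagged / n_sims


if __name__ == "__main__":
    # Toy gene with invented positions: pLoFs concentrated in the last fifth of a 1,000-bp CDS.
    cds_len = 1000
    gene = {
        "pLoF": quintile_counts([820, 850, 870, 901, 940, 990], cds_len),
        "missense": quintile_counts([15, 95, 230, 410, 660, 880], cds_len),
        "synonymous": quintile_counts([40, 300, 520, 700, 960], cds_len),
    }
    print(gene["pLoF"])  # [0, 0, 0, 0, 6]
    print(passes_filter(gene))  # True
    print(round(chi_square_uniform(gene["pLoF"]), 2))  # 24.0
    print(simulated_false_positive_rate(6, cds_len))
```

With only five or six variants per gene, the asymptotic chi-square approximation is unreliable, so the simulated rate can differ noticeably from the nominal 0.05. Estimating that rate empirically, rather than trusting the nominal significance level, is the motivation for a simulation step of this kind.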
Source journal
Genome Medicine (GENETICS & HEREDITY)
CiteScore: 20.80
Self-citation rate: 0.80%
Articles published: 128
Review time: 6-12 weeks
Journal description: Genome Medicine is an open access journal that publishes outstanding research applying genetics, genomics, and multi-omics to understand, diagnose, and treat disease. Bridging basic science and clinical research, it covers areas such as cancer genomics, immuno-oncology, immunogenomics, infectious disease, microbiome, neurogenomics, systems medicine, clinical genomics, gene therapies, precision medicine, and clinical trials. The journal publishes original research, methods, software, and reviews to serve authors and promote broad interest and importance in the field.