Colby Tubbs, Mary Lauren Benton, Evonne McArthur, John A. Capra, Douglas M. Ruderfer
{"title":"通过 CTCF 结合活性的增减识别有害的非编码变异","authors":"Colby Tubbs, Mary Lauren Benton, Evonne McArthur, John A. Capra, Douglas M. Ruderfer","doi":"10.1101/2024.09.04.609712","DOIUrl":null,"url":null,"abstract":"Noncoding single nucleotide variants are the predominant class of genetic variation in whole genome sequencing and are key drivers of phenotypic variation. However, their functional annotation remains challenging. To address this, we develop a hypothesis-driven functional annotation scheme for CTCF binding sites given CTCFs critical roles in gene regulation and extensive profiling in regulatory datasets. We synthesize CTCFs binding patterns at 1,063,879 genomic loci across 214 biological contexts into a summary metric, which we refer to as binding activity. We find that binding activity is significantly enriched for both conserved nucleotides (Pearson R = 0.31, p < 2.2 x 10-16) and sequences that contain high-quality CTCF binding motifs (Pearson R = 0.63, p = 2.9 x 10-12). We then integrate binding activity with high confidence change in precision weight matrix scores. By applying this framework to 1,253,330 SNVs in gnomAD, we explore signatures of selection acting against the disruption of CTCF binding. We find a strong, positive relationship between the mutability adjusted proportion of singletons (MAPS) metric and the loss of CTCF binding at loci with high in vitro activity (Pearson R = 0.67, p = 1.5 x 10-14). To contextualize these findings, we apply MAPS to other functional classes of variation and find that a subset of 198,149 loss of CTCF binding variants are observed as infrequently as missense variants. This work implicates these thousands of rare, noncoding variants that disrupt CTCF binding for further functional studies while providing a blueprint for the interpretable annotation of noncoding variants.","PeriodicalId":501246,"journal":{"name":"bioRxiv - Genetics","volume":"15 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Identifying deleterious noncoding variation through gain and loss of CTCF binding activity\",\"authors\":\"Colby Tubbs, Mary Lauren Benton, Evonne McArthur, John A. Capra, Douglas M. Ruderfer\",\"doi\":\"10.1101/2024.09.04.609712\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Noncoding single nucleotide variants are the predominant class of genetic variation in whole genome sequencing and are key drivers of phenotypic variation. However, their functional annotation remains challenging. To address this, we develop a hypothesis-driven functional annotation scheme for CTCF binding sites given CTCFs critical roles in gene regulation and extensive profiling in regulatory datasets. We synthesize CTCFs binding patterns at 1,063,879 genomic loci across 214 biological contexts into a summary metric, which we refer to as binding activity. We find that binding activity is significantly enriched for both conserved nucleotides (Pearson R = 0.31, p < 2.2 x 10-16) and sequences that contain high-quality CTCF binding motifs (Pearson R = 0.63, p = 2.9 x 10-12). We then integrate binding activity with high confidence change in precision weight matrix scores. By applying this framework to 1,253,330 SNVs in gnomAD, we explore signatures of selection acting against the disruption of CTCF binding. We find a strong, positive relationship between the mutability adjusted proportion of singletons (MAPS) metric and the loss of CTCF binding at loci with high in vitro activity (Pearson R = 0.67, p = 1.5 x 10-14). To contextualize these findings, we apply MAPS to other functional classes of variation and find that a subset of 198,149 loss of CTCF binding variants are observed as infrequently as missense variants. This work implicates these thousands of rare, noncoding variants that disrupt CTCF binding for further functional studies while providing a blueprint for the interpretable annotation of noncoding variants.\",\"PeriodicalId\":501246,\"journal\":{\"name\":\"bioRxiv - Genetics\",\"volume\":\"15 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"bioRxiv - Genetics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1101/2024.09.04.609712\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"bioRxiv - Genetics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1101/2024.09.04.609712","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
摘要
非编码单核苷酸变异是全基因组测序中最主要的一类遗传变异,也是表型变异的主要驱动因素。然而,对它们进行功能注释仍然具有挑战性。鉴于 CTCF 在基因调控中的关键作用以及调控数据集的广泛剖析,我们开发了一种假设驱动的 CTCF 结合位点功能注释方案。我们将 214 种生物背景下 1,063,879 个基因组位点上的 CTCFs 结合模式综合成一个总结性指标,我们称之为结合活性。我们发现,结合活性在保守核苷酸(Pearson R = 0.31,p < 2.2 x 10-16)和包含高质量 CTCF 结合图案的序列(Pearson R = 0.63,p = 2.9 x 10-12)中都有明显的富集。然后,我们将结合活性与精确度权重矩阵得分中的高置信度变化进行整合。通过将这一框架应用于 gnomAD 中的 1,253,330 个 SNVs,我们探索了针对 CTCF 结合破坏的选择特征。我们发现,在具有高体外活性的基因位点上,突变性调整单体比例(MAPS)指标与 CTCF 结合的丧失之间存在着强烈的正相关关系(Pearson R = 0.67,p = 1.5 x 10-14)。为了说明这些发现的背景,我们将 MAPS 应用于其他功能类变异,发现在 198,149 个 CTCF 结合丧失变异中,有一个子集与错义变异一样不常被观察到。这项工作将这数千个破坏 CTCF 结合的罕见非编码变异与进一步的功能研究联系起来,同时为非编码变异的可解释性注释提供了一个蓝图。
Identifying deleterious noncoding variation through gain and loss of CTCF binding activity
Noncoding single nucleotide variants are the predominant class of genetic variation in whole genome sequencing and are key drivers of phenotypic variation. However, their functional annotation remains challenging. To address this, we develop a hypothesis-driven functional annotation scheme for CTCF binding sites given CTCFs critical roles in gene regulation and extensive profiling in regulatory datasets. We synthesize CTCFs binding patterns at 1,063,879 genomic loci across 214 biological contexts into a summary metric, which we refer to as binding activity. We find that binding activity is significantly enriched for both conserved nucleotides (Pearson R = 0.31, p < 2.2 x 10-16) and sequences that contain high-quality CTCF binding motifs (Pearson R = 0.63, p = 2.9 x 10-12). We then integrate binding activity with high confidence change in precision weight matrix scores. By applying this framework to 1,253,330 SNVs in gnomAD, we explore signatures of selection acting against the disruption of CTCF binding. We find a strong, positive relationship between the mutability adjusted proportion of singletons (MAPS) metric and the loss of CTCF binding at loci with high in vitro activity (Pearson R = 0.67, p = 1.5 x 10-14). To contextualize these findings, we apply MAPS to other functional classes of variation and find that a subset of 198,149 loss of CTCF binding variants are observed as infrequently as missense variants. This work implicates these thousands of rare, noncoding variants that disrupt CTCF binding for further functional studies while providing a blueprint for the interpretable annotation of noncoding variants.