Identifying deleterious noncoding variation through gain and loss of CTCF binding activity

Colby Tubbs, Mary Lauren Benton, Evonne McArthur, John A. Capra, Douglas M. Ruderfer
bioRxiv - Genetics, published 2024-09-08
DOI: 10.1101/2024.09.04.609712 (https://doi.org/10.1101/2024.09.04.609712)

Abstract

Noncoding single-nucleotide variants (SNVs) are the predominant class of genetic variation in whole-genome sequencing and are key drivers of phenotypic variation. However, their functional annotation remains challenging. To address this, we develop a hypothesis-driven functional annotation scheme for CTCF binding sites, given CTCF's critical roles in gene regulation and its extensive profiling in regulatory datasets. We synthesize CTCF's binding patterns at 1,063,879 genomic loci across 214 biological contexts into a summary metric, which we refer to as binding activity. We find that binding activity is significantly enriched for both conserved nucleotides (Pearson R = 0.31, p < 2.2 × 10^-16) and sequences that contain high-quality CTCF binding motifs (Pearson R = 0.63, p = 2.9 × 10^-12). We then integrate binding activity with high-confidence changes in position weight matrix (PWM) scores. By applying this framework to 1,253,330 SNVs in gnomAD, we explore signatures of selection acting against the disruption of CTCF binding. We find a strong, positive relationship between the mutability-adjusted proportion of singletons (MAPS) metric and the loss of CTCF binding at loci with high in vitro activity (Pearson R = 0.67, p = 1.5 × 10^-14). To contextualize these findings, we apply MAPS to other functional classes of variation and find that a subset of 198,149 loss-of-CTCF-binding variants is observed as infrequently as missense variants. This work nominates these thousands of rare noncoding variants that disrupt CTCF binding for further functional study, while providing a blueprint for the interpretable annotation of noncoding variants.