Statistical framework for calling allelic imbalance in high-throughput sequencing data

Nature Communications (IF 15.7; CAS Zone 1, multidisciplinary journals; JCR Q1, MULTIDISCIPLINARY SCIENCES) Pub Date: 2025-02-18 DOI: 10.1038/s41467-024-55513-2
Andrey Buyan, Georgy Meshcheryakov, Viacheslav Safronov, Sergey Abramov, Alexandr Boytsov, Vladimir Nozdrin, Eugene F. Baulin, Semyon Kolmykov, Jeff Vierstra, Fedor Kolpakov, Vsevolod J. Makeev, Ivan V. Kulakovskiy

Abstract

High-throughput sequencing facilitates large-scale studies of gene regulation and allows tracing the associations of individual genomic variants with changes in gene regulation and expression. Compared to classic association studies, the assessment of an allelic imbalance at heterozygous variants captures functional variant effects with smaller sample sizes, higher sensitivity, and better resolution. Yet, identification of allele-specific variants from allelic read counts remains challenging due to data-dependent biases and overdispersion arising from technical and biological variability. We present MIXALIME, a novel computational framework for calling allele-specific variants in diverse omics data with a repertoire of statistical models accounting for read mapping bias and copy number variation. We benchmark MIXALIME with DNase-Seq, ATAC-Seq, and CAGE-Seq data, and we demonstrate that the allelic imbalance highlights causal variants in GWAS results. Finally, as a showcase of the large-scale practical application of MIXALIME, we present an atlas of variants exhibiting allele-specific chromatin accessibility, built from thousands of available datasets obtained from diverse cell types.
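
To illustrate why overdispersion matters when testing allelic read counts, here is a minimal sketch that compares a plain binomial test with a symmetric beta-binomial test at a single heterozygous site. This is not the MIXALIME model or API: the abstract does not specify the models, and the function names and the `concentration` parameter below are assumptions made purely for illustration. The sketch also ignores read mapping bias and copy number variation, both of which the framework is stated to account for.

```python
from math import exp, lgamma, log


def log_binom_coef(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def binom_logpmf(k, n, p=0.5):
    """Log-pmf of Binomial(n, p): no overdispersion."""
    return log_binom_coef(n, k) + k * log(p) + (n - k) * log(1 - p)


def betabinom_logpmf(k, n, a, b):
    """Log-pmf of BetaBinomial(n, a, b): binomial with a Beta(a, b) prior on p."""
    return log_binom_coef(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b)


def two_sided_pvalue(logpmf, k, n):
    """Sum the probabilities of all outcomes no more likely than the observed one."""
    observed = logpmf(k, n)
    total = 0.0
    for i in range(n + 1):
        lp = logpmf(i, n)
        if lp <= observed + 1e-9:  # small tolerance for floating-point ties
            total += exp(lp)
    return min(1.0, total)


def allelic_imbalance_pvalues(ref, alt, concentration=20.0):
    """Return (binomial p-value, beta-binomial p-value) for one site.

    `concentration` (hypothetical parameter) is a + b of a symmetric Beta prior;
    smaller values mean stronger overdispersion around the 0.5 allelic ratio.
    """
    n = ref + alt
    a = b = concentration / 2
    p_binom = two_sided_pvalue(lambda k, m: binom_logpmf(k, m), ref, n)
    p_bb = two_sided_pvalue(lambda k, m: betabinom_logpmf(k, m, a, b), ref, n)
    return p_binom, p_bb


if __name__ == "__main__":
    # Hypothetical counts: 30 reads support the reference allele, 10 the alternative.
    p_binom, p_bb = allelic_imbalance_pvalues(30, 10)
    print(f"binomial p = {p_binom:.4g}, beta-binomial p = {p_bb:.4g}")
```

With the same read counts, the beta-binomial test yields a larger p-value than the binomial test, because it attributes part of the deviation from a 1:1 ratio to technical and biological variability. Ignoring overdispersion would therefore tend to inflate the number of sites called as allele-specific.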
