Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation

Nature Genetics · IF 31.7 · Zone 1 (Biology) · JCR Q1, Genetics & Heredity · Pub Date: 2025-01-08 · DOI: 10.1038/s41588-024-02053-6
Johannes Linder, Divyanshi Srivastava, Han Yuan, Vikram Agarwal, David R. Kelley

Abstract

Sequence-based machine-learning models trained on genomics data improve genetic variant interpretation by providing functional predictions describing their impact on the cis-regulatory code. However, current tools do not predict RNA-seq expression profiles because of modeling challenges. Here, we introduce Borzoi, a model that learns to predict cell-type-specific and tissue-specific RNA-seq coverage from DNA sequence. Using statistics derived from Borzoi’s predicted coverage, we isolate and accurately score DNA variant effects across multiple layers of regulation, including transcription, splicing and polyadenylation. Evaluated on quantitative trait loci, Borzoi is competitive with and often outperforms state-of-the-art models trained on individual regulatory functions. By applying attribution methods to the derived statistics, we extract cis-regulatory motifs driving RNA expression and post-transcriptional regulation in normal tissues. The wide availability of RNA-seq data across species, conditions and assays profiling specific aspects of regulation emphasizes the potential of this approach to decipher the mapping from DNA sequence to regulatory function.
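The abstract refers to statistics derived from predicted coverage without defining them here. As a rough illustration only, one plausible statistic of this kind is the log2 fold change in summed coverage over a set of bins (for example, bins overlapping a gene's exons) between the reference and alternate alleles. The function name, the pseudocount, the bin indices and the toy coverage values below are all assumptions made for this sketch; it does not call Borzoi and is not taken from the paper.

```python
import math

def coverage_log2_fold_change(ref_cov, alt_cov, bins, pseudocount=1.0):
    """Hypothetical variant-effect statistic (an assumption, not the paper's code).

    ref_cov, alt_cov: predicted coverage per bin for the reference and
                      alternate allele (same length, same track).
    bins:             indices of the bins to aggregate, e.g. exon bins.
    pseudocount:      added to both sums to avoid division by zero.
    """
    if len(ref_cov) != len(alt_cov):
        raise ValueError("reference and alternate tracks must have equal length")
    ref_sum = sum(ref_cov[i] for i in bins)
    alt_sum = sum(alt_cov[i] for i in bins)
    return math.log2((alt_sum + pseudocount) / (ref_sum + pseudocount))

# Toy coverage tracks (made-up numbers standing in for model predictions).
ref = [0.0, 0.0, 4.0, 6.0, 5.0, 0.0, 0.0, 3.0]
alt = [0.0, 0.0, 8.0, 12.0, 10.0, 0.0, 0.0, 7.0]
exon_bins = [2, 3, 4, 7]  # hypothetical bins overlapping the gene's exons

score = coverage_log2_fold_change(ref, alt, exon_bins)
print(f"log2 fold change (alt vs ref): {score:.3f}")
```

A positive score means the alternate allele is predicted to increase coverage over the chosen bins, and a negative score means a decrease. Restricting the bins to different regions (exons, a splice junction, the 3′ end) is one way a single coverage track could, in principle, be read out for different layers of regulation.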

Source journal: Nature Genetics (Biology: Genetics)
CiteScore: 43.00
Self-citation rate: 2.60%
Articles published: 241
Review time: 3 months
Journal description: Nature Genetics publishes the very highest quality research in genetics. It encompasses genetic and functional genomic studies on human and plant traits and on other model organisms. Current emphasis is on the genetic basis for common and complex diseases and on the functional mechanism, architecture and evolution of gene networks, studied by experimental perturbation. Integrative genetic topics comprise, but are not limited to:
-Genes in the pathology of human disease
-Molecular analysis of simple and complex genetic traits
-Cancer genetics
-Agricultural genomics
-Developmental genetics
-Regulatory variation in gene expression
-Strategies and technologies for extracting function from genomic data
-Pharmacological genomics
-Genome evolution