DNA 呼吸与深度学习基础模型的整合推进了人类转录因子的全基因组结合预测

IF 16.6 2区 生物学 Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Nucleic Acids Research Pub Date : 2024-09-14 DOI:10.1093/nar/gkae783
Anowarul Kabir, Manish Bhattarai, Selma Peterson, Yonatan Najman-Licht, Kim Ø Rasmussen, Amarda Shehu, Alan R Bishop, Boian Alexandrov, Anny Usheva
{"title":"DNA 呼吸与深度学习基础模型的整合推进了人类转录因子的全基因组结合预测","authors":"Anowarul Kabir, Manish Bhattarai, Selma Peterson, Yonatan Najman-Licht, Kim Ø Rasmussen, Amarda Shehu, Alan R Bishop, Boian Alexandrov, Anny Usheva","doi":"10.1093/nar/gkae783","DOIUrl":null,"url":null,"abstract":"It was previously shown that DNA breathing, thermodynamic stability, as well as transcriptional activity and transcription factor (TF) bindings are functionally correlated. To ascertain the precise relationship between TF binding and DNA breathing, we developed the multi-modal deep learning model EPBDxDNABERT-2, which is based on the Extended Peyrard-Bishop-Dauxois (EPBD) nonlinear DNA dynamics model. To train our EPBDxDNABERT-2, we used chromatin immunoprecipitation sequencing (ChIP-Seq) data comprising 690 ChIP-seq experimental results encompassing 161 distinct TFs and 91 human cell types. EPBDxDNABERT-2 significantly improves the prediction of over 660 TF-DNA, with an increase in the area under the receiver operating characteristic (AUROC) metric of up to 9.6% when compared to the baseline model that does not leverage DNA biophysical properties. We expanded our analysis to in vitro high-throughput Systematic Evolution of Ligands by Exponential enrichment (HT-SELEX) dataset of 215 TFs from 27 families, comparing EPBD with established frameworks. The integration of the DNA breathing features with DNABERT-2 foundational model, greatly enhanced TF-binding predictions. Notably, EPBDxDNABERT-2, trained on a large-scale multi-species genomes, with a cross-attention mechanism, improved predictive power shedding light on the mechanisms underlying disease-related non-coding variants discovered in genome-wide association studies.","PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":null,"pages":null},"PeriodicalIF":16.6000,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DNA breathing integration with deep learning foundational model advances genome-wide binding prediction of human transcription factors\",\"authors\":\"Anowarul Kabir, Manish Bhattarai, Selma Peterson, Yonatan Najman-Licht, Kim Ø Rasmussen, Amarda Shehu, Alan R Bishop, Boian Alexandrov, Anny Usheva\",\"doi\":\"10.1093/nar/gkae783\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"It was previously shown that DNA breathing, thermodynamic stability, as well as transcriptional activity and transcription factor (TF) bindings are functionally correlated. To ascertain the precise relationship between TF binding and DNA breathing, we developed the multi-modal deep learning model EPBDxDNABERT-2, which is based on the Extended Peyrard-Bishop-Dauxois (EPBD) nonlinear DNA dynamics model. To train our EPBDxDNABERT-2, we used chromatin immunoprecipitation sequencing (ChIP-Seq) data comprising 690 ChIP-seq experimental results encompassing 161 distinct TFs and 91 human cell types. EPBDxDNABERT-2 significantly improves the prediction of over 660 TF-DNA, with an increase in the area under the receiver operating characteristic (AUROC) metric of up to 9.6% when compared to the baseline model that does not leverage DNA biophysical properties. We expanded our analysis to in vitro high-throughput Systematic Evolution of Ligands by Exponential enrichment (HT-SELEX) dataset of 215 TFs from 27 families, comparing EPBD with established frameworks. The integration of the DNA breathing features with DNABERT-2 foundational model, greatly enhanced TF-binding predictions. Notably, EPBDxDNABERT-2, trained on a large-scale multi-species genomes, with a cross-attention mechanism, improved predictive power shedding light on the mechanisms underlying disease-related non-coding variants discovered in genome-wide association studies.\",\"PeriodicalId\":19471,\"journal\":{\"name\":\"Nucleic Acids Research\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":16.6000,\"publicationDate\":\"2024-09-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Nucleic Acids Research\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1093/nar/gkae783\",\"RegionNum\":2,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"BIOCHEMISTRY & MOLECULAR BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nucleic Acids Research","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/nar/gkae783","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
引用次数: 0

摘要

以前的研究表明,DNA呼吸、热力学稳定性以及转录活性和转录因子(TF)结合在功能上是相关的。为了确定TF结合与DNA呼吸之间的精确关系,我们开发了多模态深度学习模型EPBDxDNABERT-2,该模型基于扩展的Peyard-Bishop-Dauxois(EPBD)非线性DNA动力学模型。为了训练 EPBDxDNABERT-2,我们使用了染色质免疫沉淀测序(ChIP-Seq)数据,其中包括 690 个 ChIP-seq 实验结果,涵盖 161 种不同的 TF 和 91 种人类细胞类型。EPBDxDNABERT-2 显著提高了对超过 660 个 TF-DNA 的预测能力,与未利用 DNA 生物物理特性的基线模型相比,接收者操作特征下面积 (AUROC) 指标提高了 9.6%。我们将分析扩展到了体外高通量配体指数富集系统进化(HT-SELEX)数据集,该数据集包含 27 个家族的 215 个 TFs,我们将 EPBD 与已建立的框架进行了比较。DNA呼吸特征与DNABERT-2基础模型的整合大大增强了TF结合预测的能力。值得注意的是,EPBDxDNABERT-2是在大规模多物种基因组上训练的,具有交叉关注机制,提高了预测能力,揭示了全基因组关联研究中发现的疾病相关非编码变异的机制。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
DNA breathing integration with deep learning foundational model advances genome-wide binding prediction of human transcription factors
It was previously shown that DNA breathing, thermodynamic stability, as well as transcriptional activity and transcription factor (TF) bindings are functionally correlated. To ascertain the precise relationship between TF binding and DNA breathing, we developed the multi-modal deep learning model EPBDxDNABERT-2, which is based on the Extended Peyrard-Bishop-Dauxois (EPBD) nonlinear DNA dynamics model. To train our EPBDxDNABERT-2, we used chromatin immunoprecipitation sequencing (ChIP-Seq) data comprising 690 ChIP-seq experimental results encompassing 161 distinct TFs and 91 human cell types. EPBDxDNABERT-2 significantly improves the prediction of over 660 TF-DNA, with an increase in the area under the receiver operating characteristic (AUROC) metric of up to 9.6% when compared to the baseline model that does not leverage DNA biophysical properties. We expanded our analysis to in vitro high-throughput Systematic Evolution of Ligands by Exponential enrichment (HT-SELEX) dataset of 215 TFs from 27 families, comparing EPBD with established frameworks. The integration of the DNA breathing features with DNABERT-2 foundational model, greatly enhanced TF-binding predictions. Notably, EPBDxDNABERT-2, trained on a large-scale multi-species genomes, with a cross-attention mechanism, improved predictive power shedding light on the mechanisms underlying disease-related non-coding variants discovered in genome-wide association studies.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Nucleic Acids Research
Nucleic Acids Research 生物-生化与分子生物学
CiteScore
27.10
自引率
4.70%
发文量
1057
审稿时长
2 months
期刊介绍: Nucleic Acids Research (NAR) is a scientific journal that publishes research on various aspects of nucleic acids and proteins involved in nucleic acid metabolism and interactions. It covers areas such as chemistry and synthetic biology, computational biology, gene regulation, chromatin and epigenetics, genome integrity, repair and replication, genomics, molecular biology, nucleic acid enzymes, RNA, and structural biology. The journal also includes a Survey and Summary section for brief reviews. Additionally, each year, the first issue is dedicated to biological databases, and an issue in July focuses on web-based software resources for the biological community. Nucleic Acids Research is indexed by several services including Abstracts on Hygiene and Communicable Diseases, Animal Breeding Abstracts, Agricultural Engineering Abstracts, Agbiotech News and Information, BIOSIS Previews, CAB Abstracts, and EMBASE.
期刊最新文献
Direct testing of natural twister ribozymes from over a thousand organisms reveals a broad tolerance for structural imperfections. EXPRESSO: a multi-omics database to explore multi-layered 3D genomic organization. GCM and gcType in 2024: comprehensive resources for microbial strains and genomic data. Genomes OnLine Database (GOLD) v.10: new features and updates. RBPWorld for exploring functions and disease associations of RNA-binding proteins across species.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1