Learning the regulatory grammar of DNA for gene expression engineering

Jan Zrimec, Aleksej Zelezniak
Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, published 2020-09-21
DOI: https://doi.org/10.1145/3388440.3414922
Abstract

The DNA regulatory code of gene expression is encoded in the gene regulatory structure spanning the coding and adjacent non-coding regulatory DNA regions. Deciphering this regulatory code, and how the whole gene structure interacts to produce mRNA transcripts and regulate mRNA abundance, can greatly improve our capabilities for controlling gene expression. Here, we consider that natural systems offer the most accurate information on gene expression regulation and apply deep learning to over 20,000 mRNA datasets to learn the DNA-encoded regulatory code across a variety of model organisms, from bacteria to humans [1]. We find that up to 82% of the variation in gene expression is encoded in the gene regulatory structure across all model organisms. Coding and regulatory regions carry both overlapping and new, orthogonal information, and contribute additively to gene expression prediction. By mining the gene expression models for the relevant DNA regulatory motifs, we uncover that motif interactions across the whole gene regulatory structure define over 3 orders of magnitude of gene expression levels. Finally, we experimentally verify the usefulness of our AI-guided approach for protein expression engineering. Our results suggest that single motifs or regulatory regions might not be solely responsible for regulating gene expression levels. Instead, the whole gene regulatory structure, which contains the DNA regulatory grammar of interacting DNA motifs across the protein-coding and non-coding regulatory regions, forms a coevolved transcriptional regulatory unit. This provides a solution by which whole gene systems with pre-specified expression patterns can be designed.
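
A claim such as "up to 82% of variation of gene expression" is commonly quantified with the coefficient of determination, R² = 1 - SS_res / SS_tot, computed between measured and model-predicted expression levels. The abstract does not name the metric the authors used, so the sketch below is only an illustration under that assumption. The function name `r_squared` and the expression values are hypothetical and are not taken from the paper.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: "variation explained" is read as R^2; the abstract
    does not state which metric the authors actually report.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Total variation of the measured values around their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    # Variation left unexplained by the predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical log-scale expression levels for five genes (made-up numbers).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 1.5, 3.0, 4.5, 4.5]

score = r_squared(observed, predicted)
print(f"R^2 = {score:.2f}")
```

Under this reading, a value of 0.82 would mean the sequence-based predictions leave 18% of the measured variation unexplained.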