解读用于预测翻译率的深度神经网络。

IF 3.5 2区生物学 Q2 BIOTECHNOLOGY & APPLIED MICROBIOLOGY BMC Genomics Pub Date : 2024-11-09 DOI:10.1186/s12864-024-10925-8

Frederick Korbel, Ekaterina Eroshok, Uwe Ohler

{"title":"解读用于预测翻译率的深度神经网络。","authors":"Frederick Korbel, Ekaterina Eroshok, Uwe Ohler","doi":"10.1186/s12864-024-10925-8","DOIUrl":null,"url":null,"abstract":"Background: The 5' untranslated region of mRNA strongly impacts the rate of translation initiation. A recent convolutional neural network (CNN) model accurately quantifies the relationship between massively parallel synthetic 5' untranslated regions (5'UTRs) and translation levels. However, the underlying biological features, which drive model predictions, remain elusive. Uncovering sequence determinants predictive of translation output may allow us to develop a more detailed understanding of translation regulation at the 5'UTR.Results: Applying model interpretation, we extract representations of regulatory logic from CNNs trained on synthetic and human 5'UTR reporter data. We reveal a complex interplay of regulatory sequence elements, such as initiation context and upstream open reading frames (uORFs) to influence model predictions. We show that models trained on synthetic data alone do not sufficiently explain translation regulation via the 5'UTR due to differences in the frequency of regulatory motifs compared to natural 5'UTRs.Conclusions: Our study demonstrates the significance of model interpretation in understanding model behavior, properties of experimental data and ultimately mRNA translation. By combining synthetic and human 5'UTR reporter data, we develop a model (OptMRL) which better captures the characteristics of human translation regulation. This approach provides a general strategy for building more successful sequence-based models of gene regulation, as it combines global sampling of random sequences with the subspace of naturally occurring sequences. Ultimately, this will enhance our understanding of 5'UTR sequences in disease and our ability to engineer translation output.","PeriodicalId":9030,"journal":{"name":"BMC Genomics","volume":"25 1","pages":"1061"},"PeriodicalIF":3.5000,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11549864/pdf/","citationCount":"0","resultStr":"{\"title\":\"Interpreting deep neural networks for the prediction of translation rates.\",\"authors\":\"Frederick Korbel, Ekaterina Eroshok, Uwe Ohler\",\"doi\":\"10.1186/s12864-024-10925-8\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Background: The 5' untranslated region of mRNA strongly impacts the rate of translation initiation. A recent convolutional neural network (CNN) model accurately quantifies the relationship between massively parallel synthetic 5' untranslated regions (5'UTRs) and translation levels. However, the underlying biological features, which drive model predictions, remain elusive. Uncovering sequence determinants predictive of translation output may allow us to develop a more detailed understanding of translation regulation at the 5'UTR.Results: Applying model interpretation, we extract representations of regulatory logic from CNNs trained on synthetic and human 5'UTR reporter data. We reveal a complex interplay of regulatory sequence elements, such as initiation context and upstream open reading frames (uORFs) to influence model predictions. We show that models trained on synthetic data alone do not sufficiently explain translation regulation via the 5'UTR due to differences in the frequency of regulatory motifs compared to natural 5'UTRs.Conclusions: Our study demonstrates the significance of model interpretation in understanding model behavior, properties of experimental data and ultimately mRNA translation. By combining synthetic and human 5'UTR reporter data, we develop a model (OptMRL) which better captures the characteristics of human translation regulation. This approach provides a general strategy for building more successful sequence-based models of gene regulation, as it combines global sampling of random sequences with the subspace of naturally occurring sequences. Ultimately, this will enhance our understanding of 5'UTR sequences in disease and our ability to engineer translation output.\",\"PeriodicalId\":9030,\"journal\":{\"name\":\"BMC Genomics\",\"volume\":\"25 1\",\"pages\":\"1061\"},\"PeriodicalIF\":3.5000,\"publicationDate\":\"2024-11-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11549864/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BMC Genomics\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1186/s12864-024-10925-8\",\"RegionNum\":2,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"BIOTECHNOLOGY & APPLIED MICROBIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Genomics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s12864-024-10925-8","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOTECHNOLOGY & APPLIED MICROBIOLOGY","Score":null,"Total":0}

引用次数: 0

摘要

背景：mRNA 的 5' 非翻译区对翻译启动的速度有很大影响。最近的一个卷积神经网络（CNN）模型准确量化了大规模平行合成的 5' 非翻译区（5'UTR）与翻译水平之间的关系。然而，驱动模型预测的潜在生物特征仍然难以捉摸。揭示预测翻译输出的序列决定因素可以让我们更详细地了解 5'UTR 的翻译调控：应用模型解释，我们从合成和人类 5'UTR 报告数据训练的 CNN 中提取了调控逻辑的表征。我们揭示了起始上下文和上游开放阅读框架（uORFs）等调控序列元素之间复杂的相互作用对模型预测的影响。我们发现，与天然 5'UTR 相比，由于调控基序的频率不同，仅在合成数据上训练的模型并不能充分解释通过 5'UTR 进行的翻译调控：我们的研究证明了模型解释在理解模型行为、实验数据属性以及最终 mRNA 翻译方面的重要性。通过结合合成和人类 5'UTR 报告数据，我们建立了一个能更好捕捉人类翻译调控特征的模型（OptMRL）。这种方法将随机序列的全局采样与自然发生序列的子空间相结合，为建立更成功的基于序列的基因调控模型提供了通用策略。最终，这将增强我们对疾病中 5'UTR 序列的理解，并提高我们设计翻译输出的能力。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Interpreting deep neural networks for the prediction of translation rates.

Background: The 5' untranslated region of mRNA strongly impacts the rate of translation initiation. A recent convolutional neural network (CNN) model accurately quantifies the relationship between massively parallel synthetic 5' untranslated regions (5'UTRs) and translation levels. However, the underlying biological features, which drive model predictions, remain elusive. Uncovering sequence determinants predictive of translation output may allow us to develop a more detailed understanding of translation regulation at the 5'UTR.

Results: Applying model interpretation, we extract representations of regulatory logic from CNNs trained on synthetic and human 5'UTR reporter data. We reveal a complex interplay of regulatory sequence elements, such as initiation context and upstream open reading frames (uORFs) to influence model predictions. We show that models trained on synthetic data alone do not sufficiently explain translation regulation via the 5'UTR due to differences in the frequency of regulatory motifs compared to natural 5'UTRs.

Conclusions: Our study demonstrates the significance of model interpretation in understanding model behavior, properties of experimental data and ultimately mRNA translation. By combining synthetic and human 5'UTR reporter data, we develop a model (OptMRL) which better captures the characteristics of human translation regulation. This approach provides a general strategy for building more successful sequence-based models of gene regulation, as it combines global sampling of random sequences with the subspace of naturally occurring sequences. Ultimately, this will enhance our understanding of 5'UTR sequences in disease and our ability to engineer translation output.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

BMC Genomics 生物-生物工程与应用微生物

CiteScore

7.40

自引率

4.50%

发文量

769

审稿时长

6.4 months

期刊介绍： BMC Genomics is an open access, peer-reviewed journal that considers articles on all aspects of genome-scale analysis, functional genomics, and proteomics. BMC Genomics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.