Improving RNA secondary structure prediction via state inference with deep recurrent neural networks

Computational and Mathematical Biophysics (JCR Q2, Mathematics) | Pub Date: 2019-06-26 | DOI: 10.1515/cmb-2020-0002

Devin Willmott, D. Murrugarra, Q. Ye

Computational and Mathematical Biophysics, volume 8, issue 1, pages 36-50. https://doi.org/10.1515/cmb-2020-0002

Citations: 12


Improving RNA secondary structure prediction via state inference with deep recurrent neural networks

Abstract: The problem of determining which nucleotides of an RNA sequence are paired or unpaired in the secondary structure of an RNA, which we call RNA state inference, can be studied by different machine learning techniques. Successful state inference of RNA sequences can be used to generate auxiliary information for data-directed RNA secondary structure prediction. Typical tools for state inference, such as hidden Markov models, perform poorly on RNA state inference, owing in part to their inability to recognize nonlocal dependencies. Bidirectional long short-term memory (LSTM) neural networks have emerged as a powerful tool that can model global nonlinear sequence dependencies and have achieved state-of-the-art performance on many different classification problems. This paper presents a practical approach to RNA secondary structure inference centered around a deep learning method for state inference. State predictions from a deep bidirectional LSTM are used to generate synthetic SHAPE data that can be incorporated into RNA secondary structure prediction via the Nearest Neighbor Thermodynamic Model (NNTM). This method produces predicted secondary structures for a diverse test set of 16S ribosomal RNA that are, on average, 25 percentage points more accurate than undirected minimum free energy (MFE) structures. Accuracy is highly dependent on the success of our state inference method, and investigating the global features of our state predictions reveals that the accuracy of both our state inference and structure inference methods is highly dependent on the similarity of the sequence's pairing patterns to those in the training dataset. Availability of a large training dataset is critical to the success of this approach. Code is available at https://github.com/dwillmott/rna-state-inf.
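The step from state predictions to synthetic SHAPE data can be illustrated with a minimal Python sketch. It is not the authors' implementation (their code is in the repository linked above). It assumes that a state-inference model has already produced a per-nucleotide probability of being unpaired, and it maps those probabilities to SHAPE-like reactivities by a simple linear interpolation. The function names and the two reactivity levels are hypothetical choices made for illustration. The pseudo-free-energy form m * ln(reactivity + 1) + b is the one commonly used to fold SHAPE data into NNTM-based prediction, and the default values m = 2.6 and b = -0.8 kcal/mol are the commonly cited parameters, assumed here rather than taken from the abstract.

```python
import math

# Hypothetical reactivity levels for fully paired / fully unpaired nucleotides.
# These values are illustrative assumptions, not taken from the paper.
PAIRED_REACTIVITY = 0.1
UNPAIRED_REACTIVITY = 1.0


def synthetic_shape(p_unpaired):
    """Map per-nucleotide unpaired probabilities to SHAPE-like reactivities.

    Linear interpolation between the two assumed reactivity levels;
    the paper may generate synthetic SHAPE data differently.
    """
    span = UNPAIRED_REACTIVITY - PAIRED_REACTIVITY
    return [PAIRED_REACTIVITY + p * span for p in p_unpaired]


def shape_pseudo_energy(reactivity, m=2.6, b=-0.8):
    """Pseudo-free-energy term (kcal/mol) of the form m * ln(reactivity + 1) + b.

    Low reactivity gives a negative (stabilizing) bonus for pairing,
    high reactivity gives a positive penalty. m and b are assumed defaults.
    """
    return m * math.log(reactivity + 1.0) + b


if __name__ == "__main__":
    # Hypothetical model output for a five-nucleotide fragment.
    probabilities = [0.05, 0.10, 0.90, 0.95, 0.50]
    for p, r in zip(probabilities, synthetic_shape(probabilities)):
        print(f"P(unpaired)={p:.2f}  reactivity={r:.3f}  "
              f"dG_SHAPE={shape_pseudo_energy(r):+.3f} kcal/mol")
```

In a real pipeline the reactivities would be written to a SHAPE constraint file and passed to an NNTM folding program; that step is omitted here because the abstract does not specify which program or file format is used.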
