Improving RNA secondary structure prediction via state inference with deep recurrent neural networks

Devin Willmott, D. Murrugarra, Q. Ye
Computational and Mathematical Biophysics, Volume 8, Issue 1, pp. 36-50. Published 2019-06-26. DOI: 10.1515/cmb-2020-0002 (https://doi.org/10.1515/cmb-2020-0002)
Citations: 12

Abstract

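The problem of determining which nucleotides of an RNA sequence are paired or unpaired in the RNA's secondary structure, which we call RNA state inference, can be studied with a range of machine learning techniques. Successful state inference can be used to generate auxiliary information for data-directed RNA secondary structure prediction. Typical tools for state inference, such as hidden Markov models, perform poorly on RNA, owing in part to their inability to recognize nonlocal dependencies. Bidirectional long short-term memory (LSTM) neural networks have emerged as a powerful tool that can model global nonlinear sequence dependencies and have achieved state-of-the-art performance on many different classification problems.

This paper presents a practical approach to RNA secondary structure inference centered around a deep learning method for state inference. State predictions from a deep bidirectional LSTM are used to generate synthetic SHAPE data that can be incorporated into RNA secondary structure prediction via the Nearest Neighbor Thermodynamic Model (NNTM). This method produces predicted secondary structures for a diverse test set of 16S ribosomal RNA that are, on average, 25 percentage points more accurate than undirected minimum free energy (MFE) structures. Accuracy depends heavily on the success of the state inference method, and investigating the global features of the state predictions reveals that the accuracy of both state inference and structure inference is highly dependent on how similar the sequence's pairing patterns are to those in the training dataset. Availability of a large training dataset is critical to the success of this approach.

Code available at https://github.com/dwillmott/rna-state-inf.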
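The pipeline described in the abstract (per-nucleotide paired/unpaired states, synthetic SHAPE values, then a SHAPE-directed NNTM fold) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names are hypothetical, the linear mapping in `synthetic_shape` is an assumed stand-in for however the paper generates synthetic SHAPE data, and the pseudo-energy term uses the widely cited form m * ln(reactivity + 1) + b with the commonly quoted defaults m = 2.6 and b = -0.8 kcal/mol, which may differ from the values used in the paper.

```python
import math


def states_from_dot_bracket(structure):
    """Label each nucleotide: 1 if paired, 0 if unpaired.

    Any character other than '.' is treated as a paired position.
    """
    return [0 if ch == "." else 1 for ch in structure]


def state_accuracy(predicted, reference):
    """Fraction of positions whose predicted state matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("state sequences must have the same length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def synthetic_shape(p_paired, low=0.1, high=1.0):
    """ASSUMPTION: hypothetical linear mapping, not the paper's procedure.

    Confidently paired positions (p close to 1) get a reactivity near `low`;
    confidently unpaired positions (p close to 0) get a reactivity near `high`.
    """
    return [low + (high - low) * (1.0 - p) for p in p_paired]


def shape_pseudo_energy(reactivity, m=2.6, b=-0.8):
    """Pseudo-free-energy term m * ln(reactivity + 1) + b, in kcal/mol.

    ASSUMPTION: m and b are commonly quoted defaults, not taken from the paper.
    """
    return m * math.log(reactivity + 1.0) + b


if __name__ == "__main__":
    reference = states_from_dot_bracket("((..)).")
    predicted = [1, 1, 0, 1, 1, 1, 0]  # made-up network output, one error
    print(reference)
    print(state_accuracy(predicted, reference))

    # Made-up paired probabilities for three nucleotides.
    for r in synthetic_shape([0.95, 0.50, 0.05]):
        print(round(r, 3), round(shape_pseudo_energy(r), 3))
```

Under these assumptions, low synthetic reactivities yield a negative pseudo-energy that rewards pairing a nucleotide, while high reactivities yield a positive penalty, which is how state predictions can steer a thermodynamic fold.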
Source journal: Computational and Mathematical Biophysics (Mathematics, Mathematical Physics). CiteScore: 2.50. Self-citation rate: 0.00%. Articles published: 8. Review time: 30 weeks.