RiRPSSP: A unified deep learning method for prediction of regular and irregular protein secondary structures.

Journal of Bioinformatics and Computational Biology (IF 0.7; Zone 4, Biology; JCR Q4, Mathematical & Computational Biology) | Pub Date: 2023-02-01 | DOI: 10.1142/S0219720023500014

Mukhtar Ahmad Sofi, M Arif Wani

Volume 21, Issue 1, Article 2350001.

Abstract

Protein secondary structure prediction (PSSP) is an important and challenging task in protein bioinformatics. Protein secondary structures (SSs) are categorized into regular and irregular structure classes. Regular SSs, representing nearly 50% of amino acids, consist of helices and sheets, whereas the remaining amino acids represent irregular SSs. β-turns and γ-turns are the most abundant irregular SSs present in proteins. Existing methods are well developed for separate prediction of regular and irregular SSs. However, for more comprehensive PSSP, it is essential to develop a unified model to predict all types of SSs simultaneously. In this work, using a novel dataset comprising Dictionary of Secondary Structure of Proteins (DSSP)-based SSs and PROMOTIF-based β-turns and γ-turns, we propose a unified deep learning model consisting of convolutional neural networks (CNNs) and long short-term memory networks (LSTMs) for simultaneous prediction of regular and irregular SSs. To the best of our knowledge, this is the first study in PSSP covering both regular and irregular structures. The protein sequences in our constructed datasets, RiR6069 and RiR513, have been taken from the benchmark CB6133 and CB513 datasets, respectively. The results are indicative of increased PSSP accuracy.
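
To make the regular/irregular split concrete, the following is a minimal Python sketch, not the authors' code. It assumes the common convention that DSSP helix codes (H, G, I) and strand codes (E, B) count as regular, and that every other code counts as irregular; the abstract does not state the exact mapping used for RiR6069 and RiR513. The function names and the example string are hypothetical.

```python
# Assumption: DSSP helix codes (H, G, I) and strand codes (E, B) are "regular";
# all other codes (T, S, '-', ' ', 'C', ...) are treated as "irregular".
REGULAR_CODES = frozenset("HGIEB")


def regular_mask(dssp: str) -> list[bool]:
    """Return one flag per residue: True if its DSSP code is regular."""
    return [code in REGULAR_CODES for code in dssp]


def regular_fraction(dssp: str) -> float:
    """Fraction of residues whose DSSP code falls in the regular class."""
    if not dssp:
        raise ValueError("DSSP string must not be empty")
    mask = regular_mask(dssp)
    return sum(mask) / len(mask)


# Hypothetical 20-residue DSSP string, for illustration only.
example = "--HHHHHHTT-EEEEE-SS-"
print(f"regular fraction: {regular_fraction(example):.2f}")  # regular fraction: 0.55
```

In this toy string, 11 of 20 residues are helix or strand, which is in line with the roughly 50% figure quoted in the abstract. The irregular remainder is where PROMOTIF-style β-turn and γ-turn labels would apply.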

Source journal

Journal of Bioinformatics and Computational Biology (Mathematical & Computational Biology)

CiteScore

2.10

Self-citation rate

0.00%

Articles published

Journal description: The Journal of Bioinformatics and Computational Biology aims to publish high quality, original research articles, expository tutorial papers and review papers as well as short, critical comments on technical issues associated with the analysis of cellular information. The research papers will be technical presentations of new assertions, discoveries and tools, intended for a narrower specialist community. The tutorials, reviews and critical commentary will be targeted at a broader readership of biologists who are interested in using computers but are not knowledgeable about scientific computing, and equally, computer scientists who have an interest in biology but are not familiar with current thrusts nor the language of biology. Such carefully chosen tutorials and articles should greatly accelerate the rate of entry of these new creative scientists into the field.