Classification of twilight zone proteins using a structure-based phylogenetic approach

2018 IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE). Pub Date: 2018-07-05. DOI: 10.1109/ISCAIE.2018.8405437

Siti Aisyah Mohd Taha, Y. Zakaria



Abstract

Emerging knowledge in drug discovery has heightened the need to classify proteins in order to understand their structure, function and evolutionary relationships. Because protein sequences change readily over the course of evolution, it is difficult to detect homology between distantly related proteins from sequence alone. Such proteins are nevertheless known to be structurally homologous, so a structural approach is the more suitable method. This study focused on methods for classifying twilight zone proteins using a structure-based phylogenetic approach. Since protein homology plays a major role in protein classification, finding the best alignment tool is the most crucial step. The classification was constructed by clustering proteins from 15 folds at the superfamily level. These proteins belonged to four main SCOPe classes: all alpha proteins (Class A), all beta proteins (Class B), wound alpha beta proteins (Class C) and mixed alpha beta proteins (Class D). Protein homology was identified using the structural alignment tools FATCAT-F and FATCAT-R, while sequence alignment was conducted using T-COFFEE. The classification tree was constructed using the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), and the clusters were validated using the Adjusted Rand Index (ARi), a pseudo-jackknife confidence interval and manual inspection of the clusters. The results show that the structural approach classified proteins better than the sequence-based method, producing clusters that more closely resembled SCOPe for three of the four main SCOPe classes (Class A, Class C and Class D). Moreover, FATCAT-R clustered proteins more accurately than FATCAT-F, with higher ARi values for the majority of protein folds. T-COFFEE, on the other hand, clustered Class B proteins more accurately than either FATCAT-F or FATCAT-R.
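The clustering and validation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how alignment output was turned into distances, so the distance matrix below is invented toy data, the helper names `upgma_clusters` and `adjusted_rand_index` are hypothetical, and cutting the tree at a fixed number of clusters is an assumption made for illustration. The sketch performs average-linkage (UPGMA) agglomeration and then scores the resulting clusters against reference labels with the Adjusted Rand Index.

```python
from collections import Counter
from itertools import combinations
from math import comb


def upgma_clusters(dist, k):
    """Average-linkage (UPGMA) agglomeration, stopped when k clusters remain.

    dist is a symmetric list-of-lists matrix of pairwise distances.
    Returns one integer cluster label per item.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # UPGMA distance: mean over all cross-cluster item pairs.
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a (a < b)
        del clusters[b]
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def adjusted_rand_index(truth, pred):
    """Adjusted Rand Index between two labelings of at least two items."""
    n = len(truth)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivial
    return (sum_cells - expected) / (max_index - expected)


# Toy data (invented): six proteins, two hypothetical superfamilies X and Y.
toy_dist = [
    [0.0, 0.1, 0.2, 0.9, 0.8, 0.9],
    [0.1, 0.0, 0.2, 0.8, 0.9, 0.8],
    [0.2, 0.2, 0.0, 0.9, 0.9, 0.8],
    [0.9, 0.8, 0.9, 0.0, 0.1, 0.2],
    [0.8, 0.9, 0.9, 0.1, 0.0, 0.1],
    [0.9, 0.8, 0.8, 0.2, 0.1, 0.0],
]
reference = ["X", "X", "X", "Y", "Y", "Y"]

predicted = upgma_clusters(toy_dist, k=2)
print(predicted, adjusted_rand_index(reference, predicted))
```

An ARI of 1 means the clusters match the reference classification exactly, values near 0 are what random labeling would give, and negative values indicate agreement worse than chance. For real data, established library routines for average-linkage clustering and ARI would normally be preferred over hand-written loops.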


