Classification of twilight zone proteins using a structure-based phylogenetic approach
Siti Aisyah Mohd Taha, Y. Zakaria
2018 IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE), published 2018-07-05
DOI: 10.1109/ISCAIE.2018.8405437 (https://doi.org/10.1109/ISCAIE.2018.8405437)
Abstract
The growing body of knowledge in drug discovery has heightened the need to classify proteins in order to understand their structure, function and evolutionary relationships. Because protein sequences change readily over the course of evolution, it is difficult to detect homology between distantly related proteins from sequence alone. Such proteins are nevertheless known to be structurally homologous, so a structural approach is more suitable. This study focuses on methods for classifying twilight zone proteins using a structure-based phylogenetic approach. Because homology detection plays a major role in protein classification, finding the best alignment tool is the most crucial step. The classification was constructed by clustering 15 folds at the superfamily level. The proteins belong to four main SCOPe classes: all alpha proteins (Class A), all beta proteins (Class B), wound alpha/beta proteins (Class C) and mixed alpha+beta proteins (Class D). Structural homology was identified using the structural alignment tools FATCAT-F and FATCAT-R, while sequence alignment was performed using T-COFFEE. Classification trees were constructed using the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), and the clusters were validated using the Adjusted Rand Index (ARi), pseudo-jackknife confidence intervals and manual inspection of the clusters. The results show that the structural approach produced better classification than the sequence-based method, yielding clusters that more closely resemble SCOPe for three of the four main SCOPe classes (Class A, Class C and Class D). Moreover, FATCAT-R clustered proteins more accurately than FATCAT-F, with higher ARi values for a majority of protein folds. For Class B proteins, however, T-COFFEE clustered more accurately than both FATCAT-F and FATCAT-R.
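
The clustering and validation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. The abstract does not say how alignment scores were converted to distances, so the distance matrix `toy_dist`, the superfamily labels `sf1`/`sf2`, and the function names `upgma_clusters` and `adjusted_rand_index` are all hypothetical. The sketch also cuts the UPGMA agglomeration at a fixed number of clusters instead of building and inspecting a full tree.

```python
from collections import Counter
from itertools import combinations
from math import comb


def upgma_clusters(dist, k):
    """Average-linkage (UPGMA) agglomeration, stopped when k clusters remain.

    dist is a symmetric list-of-lists of pairwise distances (hypothetical
    input; any alignment-derived dissimilarity could be used here).
    Returns one integer cluster label per item.
    """
    clusters = [[i] for i in range(len(dist))]

    def avg_dist(a, b):
        # UPGMA distance: unweighted mean over all cross-cluster pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Find and merge the closest pair of clusters.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg_dist(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[x] + clusters[y]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (x, y)]
        clusters.append(merged)

    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def adjusted_rand_index(truth, pred):
    """Adjusted Rand Index between two labelings (needs at least 2 items)."""
    n = len(truth)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case (e.g. both labelings are a single cluster).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical data: six proteins from two reference superfamilies.
toy_dist = [
    [0.00, 0.20, 0.30, 0.90, 0.80, 0.90],
    [0.20, 0.00, 0.25, 0.85, 0.90, 0.80],
    [0.30, 0.25, 0.00, 0.90, 0.95, 0.85],
    [0.90, 0.85, 0.90, 0.00, 0.10, 0.20],
    [0.80, 0.90, 0.95, 0.10, 0.00, 0.15],
    [0.90, 0.80, 0.85, 0.20, 0.15, 0.00],
]
reference = ["sf1", "sf1", "sf1", "sf2", "sf2", "sf2"]

predicted = upgma_clusters(toy_dist, k=2)
score = adjusted_rand_index(reference, predicted)
print(predicted, score)
```

An index of 1 means the clusters match the reference classification exactly, values near 0 indicate agreement no better than chance, and negative values indicate agreement worse than chance. Comparing the index obtained from each alignment tool's distances against the same reference labels is one way to rank the tools, in the spirit of the comparison the abstract reports.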