Classification of twilight zone proteins using a structure-based phylogenetic approach

Siti Aisyah Mohd Taha, Y. Zakaria
DOI: 10.1109/ISCAIE.2018.8405437
Published in: 2018 IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE)
Publication date: 2018-07-05
URL: https://doi.org/10.1109/ISCAIE.2018.8405437

Abstract

The growth of knowledge in drug discovery has heightened the need to classify proteins in order to understand their structure, function and evolutionary relationships. Because protein sequences change readily over the course of evolution, it is difficult to identify homology between distantly related proteins from sequence alone. Such proteins are nevertheless known to be structurally homologous, so a structural approach is more suitable. This study focuses on methods for classifying twilight zone proteins using a structure-based phylogenetic approach. Since protein homology plays a major role in protein classification, finding the best alignment tool is the most crucial step. The classification was constructed by clustering 15 folds at the superfamily level. The proteins belong to four main SCOPe classes: all alpha proteins (Class A), all beta proteins (Class B), wound alpha beta proteins (Class C) and mixed alpha beta proteins (Class D). Protein homology was identified using the structural alignment tools FATCAT-F and FATCAT-R, while sequence alignment was conducted using T-COFFEE. The classification tree was constructed using the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), and the clusters were validated using the Adjusted Rand Index (ARi), pseudo-jackknife confidence intervals and manual inspection of the clusters. The results show that the structural approach produced a better classification than the sequence-based method, yielding clusters that more closely resemble SCOPe for three of the four main SCOPe classes (Class A, Class C and Class D). Moreover, FATCAT-R clustered proteins more accurately than FATCAT-F, with higher ARi values for the majority of protein folds. T-COFFEE, on the other hand, clustered Class B proteins more accurately than both FATCAT-F and FATCAT-R.
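The clustering and validation steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the distance matrix is made-up toy data (in the study, pairwise distances would come from FATCAT or T-COFFEE alignments), the function names `upgma_clusters` and `adjusted_rand_index` are hypothetical, and cutting the tree at a fixed number of clusters `k` is an assumption made here for illustration. UPGMA is implemented as plain average-linkage agglomerative clustering, and the Adjusted Rand Index follows its standard pair-counting definition.

```python
from collections import Counter
from itertools import combinations
from math import comb


def upgma_clusters(dist, k):
    """Average-linkage (UPGMA) agglomerative clustering, cut at k clusters.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    Returns one integer cluster label per item.
    """
    clusters = [[i] for i in range(len(dist))]

    def avg_dist(a, b):
        # UPGMA distance: unweighted mean over all cross-cluster pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Find the closest pair of clusters and merge them.
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_dist(clusters[p[0]], clusters[p[1]]))
        merged = clusters[x] + clusters[y]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (x, y)]
        clusters.append(merged)

    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def adjusted_rand_index(truth, pred):
    """Adjusted Rand Index between two labelings of the same items."""
    n = len(truth)
    if n < 2:  # no pairs to compare
        return 1.0
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy data (made up): six proteins from two reference superfamilies.
toy_dist = [
    [0.0, 0.1, 0.2, 0.9, 0.8, 0.9],
    [0.1, 0.0, 0.2, 0.8, 0.9, 0.9],
    [0.2, 0.2, 0.0, 0.9, 0.9, 0.8],
    [0.9, 0.8, 0.9, 0.0, 0.1, 0.2],
    [0.8, 0.9, 0.9, 0.1, 0.0, 0.2],
    [0.9, 0.9, 0.8, 0.2, 0.2, 0.0],
]
reference = ["sf1", "sf1", "sf1", "sf2", "sf2", "sf2"]
predicted = upgma_clusters(toy_dist, k=2)
print(adjusted_rand_index(reference, predicted))
```

An index of 1 means the clustering reproduces the reference grouping exactly, while values near 0 indicate agreement no better than chance. Comparing the index obtained from each alignment tool's distance matrix against the SCOPe grouping is the kind of comparison the abstract reports.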