Impact of the Predicted Protein Structural Content on Prediction of Structural Classes for the Twilight Zone Proteins

Lukasz Kurgan, M. Rahbari, L. Homaeian
Published in: 2006 5th International Conference on Machine Learning and Applications (ICMLA'06), 2006-12-01. DOI: 10.1109/ICMLA.2006.27 (https://doi.org/10.1109/ICMLA.2006.27)
Citation count: 4

Abstract

This paper addresses in silico prediction of protein structural classes as defined in the SCOP database. SCOP defines a total of 11 classes, although the majority of proteins fall into four of them: all-alpha, all-beta, alpha/beta, and alpha+beta. The main goals of this paper are to experimentally evaluate the impact of predicted protein secondary structure content on structural class prediction and to develop a novel protein sequence representation. The experiments apply three protein sequence representations and four classifiers to the prediction of both 4 and 11 structural classes. The predictions are performed on a large dataset of low-homology (twilight zone) sequences. The proposed sequence representation combines the predicted structural content, which provides the strongest contribution to classification, with composition and composition moment vectors, hydrophobic autocorrelations, chemical group composition, and the molecular weight of the protein. On average, the predicted content values improve prediction accuracy by 3.3% and 4.2% for the 4 and 11 classes, respectively, compared with a sequence representation that does not use this information. Finally, we propose a very compact, 20-dimensional sequence representation that is shown to improve prediction accuracy by 5.1-8.5% compared with recently published results.
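
As a rough illustration of two of the feature types named in the abstract, the sketch below computes an amino acid composition vector and a hydrophobic autocorrelation for a single sequence. The function names, the use of the Kyte-Doolittle hydropathy scale, and the normalization by the number of residue pairs are all assumptions made for this example; the abstract does not give the paper's exact definitions, which may differ.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order for the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values.
# Assumption: chosen only for illustration; the paper may use another index.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition_vector(seq):
    """Return the fraction of each standard amino acid in seq (20 values)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def hydrophobic_autocorrelation(seq, lag):
    """Average product of hydropathy values for residues `lag` positions apart."""
    if lag < 1 or lag >= len(seq):
        raise ValueError("lag must satisfy 1 <= lag < len(seq)")
    h = [HYDROPATHY[aa] for aa in seq]  # raises KeyError on non-standard residues
    pairs = len(h) - lag
    return sum(h[i] * h[i + lag] for i in range(pairs)) / pairs


if __name__ == "__main__":
    example = "MKVLAAGIVGL"  # hypothetical sequence, not taken from the paper
    print(composition_vector(example))
    print(hydrophobic_autocorrelation(example, lag=1))
```

Features of this kind would be concatenated with the predicted secondary structure content and the other descriptors before being passed to a classifier; the sketch covers only the per-sequence feature computation.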