通过挖掘距离矩阵中视觉内容的关联预测SCOP排序域

Pin-Hao Chi, C. Shyu
{"title":"通过挖掘距离矩阵中视觉内容的关联预测SCOP排序域","authors":"Pin-Hao Chi, C. Shyu","doi":"10.1142/9781860947292_0008","DOIUrl":null,"url":null,"abstract":"Protein tertiary structures are known to have significant correlations with their biological functions. To understand the information of the protein structures, Structural Classification of Protein (SCOP) Database, which is manually constructed by human experts, classifies similar protein folds in the same domain hierarchy. Even though this approach is believed to be more reliable than applying traditional alignment methods in structural classifications, it is labor intensive. In this paper, we build a non-parametric classifier to predict possible SCOP domains for unknown protein structures. With supervised learning, the algorithm first maps tertiary structures of training proteins into two-dimensional distance matrices, and then extracts signatures from visual contents of matrices. A knowledge discovery and data mining (KDD) process further discovers relevant patterns in training signatures of each SCOP domain by mining association rules. Finally, the quantity of rules whose patterns match signatures of unknown proteins determines predicted domains in a ranked order. We select 7,702 protein chains from 150 domains of SCOP database 1.67 release as labelled data using 10 fold cross validation. Experimental results show that the prediction accuracy is 91.27% for the top ranked domain and 99.22% for the top 5 ranked domains. The average response time takes 6.34 seconds, exhibiting reasonably high prediction accuracy and efficiency.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"23 1","pages":"49-58"},"PeriodicalIF":0.0000,"publicationDate":"2005-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Predicting Ranked SCOP Domains by Mining Associations of Visual Contents in Distance Matrices\",\"authors\":\"Pin-Hao Chi, C. Shyu\",\"doi\":\"10.1142/9781860947292_0008\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Protein tertiary structures are known to have significant correlations with their biological functions. To understand the information of the protein structures, Structural Classification of Protein (SCOP) Database, which is manually constructed by human experts, classifies similar protein folds in the same domain hierarchy. Even though this approach is believed to be more reliable than applying traditional alignment methods in structural classifications, it is labor intensive. In this paper, we build a non-parametric classifier to predict possible SCOP domains for unknown protein structures. With supervised learning, the algorithm first maps tertiary structures of training proteins into two-dimensional distance matrices, and then extracts signatures from visual contents of matrices. A knowledge discovery and data mining (KDD) process further discovers relevant patterns in training signatures of each SCOP domain by mining association rules. Finally, the quantity of rules whose patterns match signatures of unknown proteins determines predicted domains in a ranked order. We select 7,702 protein chains from 150 domains of SCOP database 1.67 release as labelled data using 10 fold cross validation. Experimental results show that the prediction accuracy is 91.27% for the top ranked domain and 99.22% for the top 5 ranked domains. The average response time takes 6.34 seconds, exhibiting reasonably high prediction accuracy and efficiency.\",\"PeriodicalId\":74513,\"journal\":{\"name\":\"Proceedings of the ... Asia-Pacific bioinformatics conference\",\"volume\":\"23 1\",\"pages\":\"49-58\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2005-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ... Asia-Pacific bioinformatics conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1142/9781860947292_0008\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... Asia-Pacific bioinformatics conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/9781860947292_0008","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

已知蛋白质三级结构与其生物学功能有显著的相关性。为了了解蛋白质的结构信息,由人类专家手工构建的蛋白质结构分类数据库(SCOP)将相似的蛋白质折叠在同一域层次中进行分类。尽管这种方法被认为比在结构分类中应用传统的对齐方法更可靠,但它是劳动密集型的。在本文中,我们建立了一个非参数分类器来预测未知蛋白质结构可能的SCOP结构域。通过监督学习,该算法首先将训练蛋白的三级结构映射到二维距离矩阵中,然后从矩阵的视觉内容中提取特征。知识发现和数据挖掘(KDD)过程通过挖掘关联规则进一步发现每个SCOP域训练签名中的相关模式。最后,其模式与未知蛋白质的特征相匹配的规则的数量决定了预测结构域的排序顺序。我们从SCOP数据库1.67版本的150个结构域中选择7702个蛋白链作为标记数据,使用10倍交叉验证。实验结果表明,对排名前1位的域的预测准确率为91.27%,对排名前5位的域的预测准确率为99.22%。平均响应时间为6.34秒,具有较高的预测精度和效率。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Predicting Ranked SCOP Domains by Mining Associations of Visual Contents in Distance Matrices
Protein tertiary structures are known to have significant correlations with their biological functions. To understand the information of the protein structures, Structural Classification of Protein (SCOP) Database, which is manually constructed by human experts, classifies similar protein folds in the same domain hierarchy. Even though this approach is believed to be more reliable than applying traditional alignment methods in structural classifications, it is labor intensive. In this paper, we build a non-parametric classifier to predict possible SCOP domains for unknown protein structures. With supervised learning, the algorithm first maps tertiary structures of training proteins into two-dimensional distance matrices, and then extracts signatures from visual contents of matrices. A knowledge discovery and data mining (KDD) process further discovers relevant patterns in training signatures of each SCOP domain by mining association rules. Finally, the quantity of rules whose patterns match signatures of unknown proteins determines predicted domains in a ranked order. We select 7,702 protein chains from 150 domains of SCOP database 1.67 release as labelled data using 10 fold cross validation. Experimental results show that the prediction accuracy is 91.27% for the top ranked domain and 99.22% for the top 5 ranked domains. The average response time takes 6.34 seconds, exhibiting reasonably high prediction accuracy and efficiency.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Tuning Privacy-Utility Tradeoff in Genomic Studies Using Selective SNP Hiding. The Future of Bioinformatics CHEMICAL COMPOUND CLASSIFICATION WITH AUTOMATICALLY MINED STRUCTURE PATTERNS. Predicting Nucleolar Proteins Using Support-Vector Machines Proceedings of the 6th Asia-Pacific Bioinformatics Conference, APBC 2008, 14-17 January 2008, Kyoto, Japan
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1