{"title":"通过挖掘距离矩阵中视觉内容的关联预测SCOP排序域","authors":"Pin-Hao Chi, C. Shyu","doi":"10.1142/9781860947292_0008","DOIUrl":null,"url":null,"abstract":"Protein tertiary structures are known to have significant correlations with their biological functions. To understand the information of the protein structures, Structural Classification of Protein (SCOP) Database, which is manually constructed by human experts, classifies similar protein folds in the same domain hierarchy. Even though this approach is believed to be more reliable than applying traditional alignment methods in structural classifications, it is labor intensive. In this paper, we build a non-parametric classifier to predict possible SCOP domains for unknown protein structures. With supervised learning, the algorithm first maps tertiary structures of training proteins into two-dimensional distance matrices, and then extracts signatures from visual contents of matrices. A knowledge discovery and data mining (KDD) process further discovers relevant patterns in training signatures of each SCOP domain by mining association rules. Finally, the quantity of rules whose patterns match signatures of unknown proteins determines predicted domains in a ranked order. We select 7,702 protein chains from 150 domains of SCOP database 1.67 release as labelled data using 10 fold cross validation. Experimental results show that the prediction accuracy is 91.27% for the top ranked domain and 99.22% for the top 5 ranked domains. The average response time takes 6.34 seconds, exhibiting reasonably high prediction accuracy and efficiency.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"23 1","pages":"49-58"},"PeriodicalIF":0.0000,"publicationDate":"2005-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Predicting Ranked SCOP Domains by Mining Associations of Visual Contents in Distance Matrices\",\"authors\":\"Pin-Hao Chi, C. Shyu\",\"doi\":\"10.1142/9781860947292_0008\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Protein tertiary structures are known to have significant correlations with their biological functions. To understand the information of the protein structures, Structural Classification of Protein (SCOP) Database, which is manually constructed by human experts, classifies similar protein folds in the same domain hierarchy. Even though this approach is believed to be more reliable than applying traditional alignment methods in structural classifications, it is labor intensive. In this paper, we build a non-parametric classifier to predict possible SCOP domains for unknown protein structures. With supervised learning, the algorithm first maps tertiary structures of training proteins into two-dimensional distance matrices, and then extracts signatures from visual contents of matrices. A knowledge discovery and data mining (KDD) process further discovers relevant patterns in training signatures of each SCOP domain by mining association rules. Finally, the quantity of rules whose patterns match signatures of unknown proteins determines predicted domains in a ranked order. We select 7,702 protein chains from 150 domains of SCOP database 1.67 release as labelled data using 10 fold cross validation. Experimental results show that the prediction accuracy is 91.27% for the top ranked domain and 99.22% for the top 5 ranked domains. The average response time takes 6.34 seconds, exhibiting reasonably high prediction accuracy and efficiency.\",\"PeriodicalId\":74513,\"journal\":{\"name\":\"Proceedings of the ... Asia-Pacific bioinformatics conference\",\"volume\":\"23 1\",\"pages\":\"49-58\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2005-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ... Asia-Pacific bioinformatics conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1142/9781860947292_0008\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... Asia-Pacific bioinformatics conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/9781860947292_0008","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Predicting Ranked SCOP Domains by Mining Associations of Visual Contents in Distance Matrices
Protein tertiary structures are known to have significant correlations with their biological functions. To understand the information of the protein structures, Structural Classification of Protein (SCOP) Database, which is manually constructed by human experts, classifies similar protein folds in the same domain hierarchy. Even though this approach is believed to be more reliable than applying traditional alignment methods in structural classifications, it is labor intensive. In this paper, we build a non-parametric classifier to predict possible SCOP domains for unknown protein structures. With supervised learning, the algorithm first maps tertiary structures of training proteins into two-dimensional distance matrices, and then extracts signatures from visual contents of matrices. A knowledge discovery and data mining (KDD) process further discovers relevant patterns in training signatures of each SCOP domain by mining association rules. Finally, the quantity of rules whose patterns match signatures of unknown proteins determines predicted domains in a ranked order. We select 7,702 protein chains from 150 domains of SCOP database 1.67 release as labelled data using 10 fold cross validation. Experimental results show that the prediction accuracy is 91.27% for the top ranked domain and 99.22% for the top 5 ranked domains. The average response time takes 6.34 seconds, exhibiting reasonably high prediction accuracy and efficiency.