Protein Function Prediction: Combining Statistical Features with Deep Learning
Deepa Kumari, Ashish Ranjan, A. Deepak
Materials Processing & Manufacturing eJournal, published 2019-02-08
DOI: 10.2139/ssrn.3349575 (https://doi.org/10.2139/ssrn.3349575)
Functional annotation of proteins from their sequences, which aims to narrow the gap between the number of available proteins and the number with known functional annotations, is a challenging task. It requires transforming protein sequences into feature vectors that machine learning algorithms can analyze efficiently. Such a transformation is difficult, however, because of the high diversity among protein sequences from the same family. Most existing sequence features perform poorly when annotating proteins across a large number of functional classes. In this paper, three sequence features (PseAAC, AAID, and SGT) are combined with deep learning techniques to improve performance. Evaluation scores are better when the features are combined with a deep CNN: the F1-score for PseAAC + CNN improves by 9.5% compared to PseAAC + DNN, and the corresponding improvements for AAID + CNN and SGT + CNN are 3.22% and 2.33%, respectively.
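
To make the idea of turning a protein sequence into a fixed-length feature vector concrete, the following is a minimal sketch that computes plain amino acid composition, i.e. the relative frequency of each of the 20 standard residues. This is not one of the three features evaluated in the paper; it is only the simplest relative of PseAAC, which extends composition with sequence-order terms. The function name `amino_acid_composition` and the toy sequence are assumptions made for illustration.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector has the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return a 20-dimensional vector of relative residue frequencies.

    Illustrative only: characters outside the 20 standard residues are ignored,
    and an empty (or fully non-standard) sequence maps to the zero vector.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the paper's dataset.
    vector = amino_acid_composition("MKTAYIAKQR")
    print(len(vector))
```

Because the vector length does not depend on the sequence length, sequences of different sizes can be fed to the same classifier, which is the property any such sequence feature needs before it can be paired with a DNN or CNN.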