Fan Zhang, Chaoyang Liu, Binjie Wang, Yiru He, Xinhong Zhang
Journal of Chemometrics, vol. 38, no. 8. Published 2024-04-25. DOI: 10.1002/cem.3553 (https://onlinelibrary.wiley.com/doi/10.1002/cem.3553)
A prediction model of nonclassical secreted protein based on deep learning
Most current methods for predicting nonclassical secreted proteins rely on manual feature engineering, for example constructing sample features from the physicochemical properties of proteins and from position-specific scoring matrices (PSSMs). Building such features requires researchers to carry out tedious searches to obtain the physicochemical properties of proteins. This paper proposes DeepNCSPP, an end-to-end deep learning model that takes protein sequence information and sequence statistics information as input and predicts whether a protein is a nonclassical secreted protein. Features are extracted from the sequence information by a bidirectional long short-term memory (BiLSTM) network and from the sequence statistics information by a convolutional neural network (CNN). On the independent test dataset, DeepNCSPP achieved an accuracy of 88.24%, a Matthews correlation coefficient (MCC) of 77.01%, and an F1-score of 87.50%. Results on the independent test dataset and from 10-fold cross-validation show that DeepNCSPP performs competitively with state-of-the-art methods and can serve as a reliable predictor of nonclassical secreted proteins. A web server has been built for the convenience of researchers and is available at https://www.deepncspp.top/. The source code of DeepNCSPP is hosted on GitHub (https://github.com/xiaoliu166370/DEEPNCSPP).
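DeepNCSPP feeds two inputs to two branches: the protein sequence to a BiLSTM and sequence statistics to a CNN. The abstract does not say how either input is encoded, so the following is a minimal sketch under explicit assumptions: residues are integer-encoded and zero-padded to a hypothetical `max_len`, and the "statistics" are taken to be amino-acid composition (20 values) plus dipeptide composition (400 values), two common sequence-statistics features. The names `encode_sequence` and `composition_features` are hypothetical; the actual preprocessing should be checked against the GitHub repository.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# Index 0 is reserved for padding and for non-standard residues (e.g. X, U)
RESIDUE_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def encode_sequence(seq: str, max_len: int = 1000) -> list[int]:
    """Integer-encode a sequence (assumed BiLSTM input), truncating or zero-padding to max_len."""
    ids = [RESIDUE_INDEX.get(aa, 0) for aa in seq.upper()[:max_len]]
    return ids + [0] * (max_len - len(ids))


def composition_features(seq: str) -> list[float]:
    """Assumed CNN input: amino-acid (20-d) + dipeptide (400-d) composition, each summing to 1."""
    seq = seq.upper()
    residues = [aa for aa in seq if aa in RESIDUE_INDEX]
    aac = [residues.count(aa) / len(residues) if residues else 0.0 for aa in AMINO_ACIDS]
    # Overlapping dipeptides; any pair containing a non-standard residue is skipped
    pairs = [
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in RESIDUE_INDEX and seq[i + 1] in RESIDUE_INDEX
    ]
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for pair in pairs:
        counts[pair] += 1
    dpc = [counts[dp] / len(pairs) if pairs else 0.0 for dp in DIPEPTIDES]
    return aac + dpc


protein = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
print(encode_sequence(protein, max_len=30)[:8])
print(len(composition_features(protein)))
```

The padded integer sequence would typically pass through an embedding layer before the BiLSTM, while the fixed-length composition vector suits a CNN because its length does not depend on the protein.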
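The three reported metrics have standard definitions in terms of confusion-matrix counts. Note that MCC ranges from -1 to 1, so the reported 77.01% corresponds to an MCC of 0.7701. The sketch below computes all three; the counts it uses are hypothetical, since the abstract does not report them. They assume a balanced 34-protein test set (17 positives, 17 negatives) and were chosen only because they reproduce the three reported percentages, which illustrates how the figures relate rather than stating the paper's actual results.

```python
import math


def confusion_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Accuracy, Matthews correlation coefficient, and F1-score from binary counts."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    # MCC: correlation between predicted and true labels, in [-1, 1]
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    # F1: harmonic mean of precision and recall, equivalently 2TP / (2TP + FP + FN)
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    return {"accuracy": accuracy, "mcc": mcc, "f1": f1}


# Hypothetical counts, NOT from the paper: chosen to reproduce the reported percentages
metrics = confusion_metrics(tp=14, fn=3, tn=16, fp=1)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

MCC is often preferred alongside accuracy because it uses all four cells of the confusion matrix and stays informative when the classes are imbalanced.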
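The abstract reports 10-fold cross-validation but does not describe how the folds were formed. As a minimal sketch, assuming plain shuffled k-fold splitting with a fixed seed, the hypothetical helper `k_fold_indices` below holds out each sample exactly once and keeps fold sizes within one of each other. A stratified split, which preserves the positive/negative ratio in every fold (as scikit-learn's `StratifiedKFold` does), would be an equally plausible choice.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed gives reproducible folds
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(n_samples=25, k=10))
for i, (train, test) in enumerate(folds):
    print(f"fold {i}: train={len(train)} test={len(test)}")
```

In use, a fresh model would be trained on each `train_idx`, evaluated on the matching `test_idx`, and the per-fold metrics averaged.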
About the journal:
The Journal of Chemometrics is devoted to the rapid publication of original scientific papers, reviews and short communications on fundamental and applied aspects of chemometrics. It also provides a forum for the exchange of information on meetings and other news relevant to the growing community of scientists who are interested in chemometrics and its applications. Short, critical review papers are a particularly important feature of the journal, in view of the multidisciplinary readership at which it is aimed.