Ivan Erjavac, Daniela Kalafatovic, Goran Mauša
Volume 2, Article 100034, published 2022-12-01
DOI: 10.1016/j.ailsci.2022.100034
https://www.sciencedirect.com/science/article/pii/S2667318522000058
Coupled encoding methods for antimicrobial peptide prediction: How sensitive is a highly accurate model?
Current applications of machine learning in antimicrobial peptide discovery call for a reduction in the false positive predictions produced by classification models. Because high-confidence positive predictions drive modern experimental design, the model’s sensitivity is crucial for reducing the number of unnecessary in vitro tests. Furthermore, because expert-based design approaches employ random mutations on confirmed sequences, machine learning models must distinguish subtle differences among shuffled sequences. With the goal of reducing the false positive rate and improving sensitivity, we propose a hybrid approach to antimicrobial peptide prediction that combines encoding models. To this end, we implement models that employ both physico-chemical features and sequence ordering information, to stress the importance of using both representations. We also investigate binary encoding for peptide representation, a method that is underrepresented in related research and that proved to be a viable low-dimensional alternative to one-hot encoding. Our results, supported by Cochran and McNemar statistical tests and Spearman correlation analysis, indicate that sequence-based encodings complement the physico-chemical features and that their synergistic effect improves every evaluation metric. Finally, the proposed hybrid approach, which combines physico-chemical features and binary encoding using logical conjunction, was shown to be superior to the single models by a factor of 2.96 in terms of fall-out and by up to 6.1% in terms of precision.
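The encoding and coupling ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the exact binary scheme, so the sketch assumes each residue is written as the 5-bit binary form of its index in an alphabetically ordered 20-letter amino acid alphabet. The function names, the alphabet ordering, and the toy predictions are all hypothetical, chosen only to show why binary encoding is lower-dimensional than one-hot encoding and how a logical conjunction of two classifiers affects fall-out and precision.

```python
from math import ceil, log2

# Assumption: the 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# 5 bits are enough to index 20 residues (2**5 = 32 >= 20).
BITS = ceil(log2(len(AMINO_ACIDS)))


def one_hot_encode(peptide: str) -> list:
    """Encode each residue as a 20-dimensional indicator vector."""
    encoded = []
    for residue in peptide:
        vector = [0] * len(AMINO_ACIDS)
        vector[AMINO_ACIDS.index(residue)] = 1
        encoded.extend(vector)
    return encoded


def binary_encode(peptide: str) -> list:
    """Encode each residue as the 5-bit binary form of its alphabet index."""
    encoded = []
    for residue in peptide:
        index = AMINO_ACIDS.index(residue)
        encoded.extend(int(bit) for bit in format(index, f"0{BITS}b"))
    return encoded


def conjunction(pred_a: list, pred_b: list) -> list:
    """Predict positive only when both models predict positive (logical AND)."""
    return [a & b for a, b in zip(pred_a, pred_b)]


def fall_out(y_true: list, y_pred: list) -> float:
    """False positive rate: FP / (FP + TN)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if fp + tn else 0.0


def precision(y_true: list, y_pred: list) -> float:
    """Fraction of positive predictions that are correct: TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


# Toy data (made up for illustration, not taken from the paper).
y_true = [1, 0, 0, 0, 1]
pred_physchem = [1, 1, 0, 0, 1]  # hypothetical physico-chemical model output
pred_binary = [1, 0, 1, 0, 1]    # hypothetical binary-encoding model output
pred_hybrid = conjunction(pred_physchem, pred_binary)

print(len(one_hot_encode("ACD")), len(binary_encode("ACD")))  # 60 vs 15 features
print(fall_out(y_true, pred_physchem), fall_out(y_true, pred_hybrid))
print(precision(y_true, pred_physchem), precision(y_true, pred_hybrid))
```

In this toy case each single model raises one false alarm on a different negative, so requiring agreement removes both. A conjunction can never add positive predictions, so it cannot increase fall-out, although it may lower sensitivity when the two models disagree on true positives.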
Artificial Intelligence in the Life Sciences
Pharmacology, Biochemistry, Genetics and Molecular Biology (General), Computer Science Applications, Health Informatics, Drug Discovery, Veterinary Science and Veterinary Medicine (General)