A Systematic Implementation of Machine Learning Algorithms for Multifaceted Antimicrobial Screening of Lead Compounds
Justin Shen, Davesh Valagolam
ECA 2022, published 2022-06-16
DOI: https://doi.org/10.3390/eca2022-12751
Abstract
This study employed machine learning algorithms to identify lead compounds that inhibit the antibiotic targets DNA gyrase and dihydrofolate reductase in Escherichia coli, and identified new, multifaceted antimicrobial compounds. This study used three separate datasets: 1) 326 Escherichia coli DNA gyrase inhibitors and 132 non-inhibitors, 2) 346 Escherichia coli dihydrofolate reductase inhibitors and 176 non-inhibitors, and 3) 18387 non-specific drug-like chemicals. All datasets were encoded as ECFP-4 fingerprints and divided according to a 70-15-15 train-test-validation split. We explored six different classification algorithms, all tuned with Bayesian optimization. Our results indicate that the Gradient Boosting Classifier (GBC) performed best at identifying a compound's efficacy against DNA gyrase, with an accuracy, precision, recall, F1-score, and AUC of 0.91, 0.92, 0.86, 0.88, and 0.933, respectively. The Random Forest Classifier (RFC) performed best at identifying a compound's effectiveness against dihydrofolate reductase, with an accuracy, precision, recall, F1-score, and AUC of 0.86, 0.83, 0.85, 0.84, and 0.944, respectively. The GBC and RFC were therefore used to search for compounds that inhibit both DNA gyrase and dihydrofolate reductase. Out of 18387 compounds, we identified 5 novel compounds with a predicted probability greater than 95% of inhibiting both DNA gyrase and dihydrofolate reductase, suggesting high antimicrobial potential. The models evaluated in this study, particularly the GBC and RFC models, hold considerable promise for computationally screening large libraries of compounds for antimicrobial potential.
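The three mechanical steps named in the abstract (a 70-15-15 split, the reported classification metrics, and the dual-target filter at a predicted probability above 95%) can be sketched in plain Python. This is an illustrative sketch only, not the authors' code: the function names `split_70_15_15`, `classification_metrics`, and `dual_target_hits` are hypothetical, the split is assumed to be a simple seeded random shuffle (the abstract does not say whether it was stratified), and the probabilities are assumed to come from two already-trained classifiers.

```python
import random


def split_70_15_15(items, seed=0):
    """Shuffle a copy of `items` and cut it into train/test/validation parts.

    Assumption: plain random split; the abstract does not state the method.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(0.70 * len(shuffled))
    n_test = round(0.15 * len(shuffled))
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # whatever remains (about 15%)
    return train, test, validation


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = inhibitor)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def dual_target_hits(p_gyrase, p_dhfr, threshold=0.95):
    """Indices of compounds whose predicted probability exceeds `threshold`
    for BOTH targets (strictly greater, matching "greater than 95%")."""
    return [
        i for i, (a, b) in enumerate(zip(p_gyrase, p_dhfr))
        if a > threshold and b > threshold
    ]
```

In a screening run, `p_gyrase` and `p_dhfr` would be the per-compound inhibitor probabilities produced by the DNA gyrase and dihydrofolate reductase models over the same compound list, so the returned indices point back into that list.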