Machine learning-assisted high-throughput screening for Anti-MRSA compounds
Fadi Shehadeh, LewisOscar Felix, Markos Kalligeros, Adnan Shehadeh, Beth Burgwyn Fuchs, Frederick M Ausubel, Paul P Sotiriadis, Eleftherios Mylonakis
IEEE/ACM Transactions on Computational Biology and Bioinformatics, published 2024-07-26. DOI: 10.1109/TCBB.2024.3434340
Abstract
Background: Antimicrobial resistance is a major public health threat, and new agents are needed. Computational approaches have been proposed to reduce the cost and time needed for compound screening.
Aims: To develop a machine learning (ML) model for the in silico screening of low-molecular-weight compounds.
Methods: We used the results of a high-throughput liquid infection assay, in which Caenorhabditis elegans was infected with methicillin-resistant Staphylococcus aureus (MRSA), to develop ML models for two tasks: compound prioritization and quality control.
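The abstract does not state the model type or the molecular features used, so the following is only a minimal sketch of how per-compound scores from an already-trained model could serve the two purposes named in the Methods (compound prioritization and quality control). It assumes each compound has a numeric score where higher means more likely active. The helper names `prioritize` and `flag_discordant`, the dictionary inputs, and the threshold value are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def prioritize(scores: Dict[str, float], fraction: float) -> List[str]:
    """Hypothetical helper: IDs of the top-scoring `fraction` of compounds to screen physically."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Rank compound IDs from highest to lowest model score.
    ranked = sorted(scores, key=lambda cid: scores[cid], reverse=True)
    n_keep = max(1, round(fraction * len(ranked)))
    return ranked[:n_keep]


def flag_discordant(scores: Dict[str, float], hts_hit: Dict[str, bool],
                    threshold: float) -> List[str]:
    """Hypothetical helper: compounds the model calls active but the HTS called non-hits."""
    return sorted(cid for cid, s in scores.items()
                  if s >= threshold and not hts_hit[cid])


# Toy data (invented for illustration only).
scores = {"c1": 0.9, "c2": 0.2, "c3": 0.7, "c4": 0.1, "c5": 0.6}
hts = {"c1": True, "c2": False, "c3": False, "c4": False, "c5": True}
print(prioritize(scores, 0.4))          # ['c1', 'c3']
print(flag_discordant(scores, hts, 0.5))  # ['c3']
```

Ranking and keeping a fixed fraction mirrors the idea of screening only a model-selected subset; the discordance check is one simple way to surface candidate missed hits for re-testing.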
Results: The compound prioritization model achieved an AUC of 0.795, with a sensitivity of 81% and a specificity of 70%. On a validation set of 22,768 compounds, the model selected only 30.6% of the compounds, yet this subset contained 81% of the active compounds found by high-throughput screening (HTS), a 2.67-fold increase in hit rate. When we retrained the model on all compounds in the HTS dataset, it identified a further 45 discordant molecules that the HTS had classified as non-hits; 42 of these 45 (93%) have known antimicrobial activity.
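The reported metrics can be computed from labels and scores in a few lines. The sketch below is illustrative, not the authors' evaluation code, and assumes that "AUC" means the area under the ROC curve. It computes sensitivity and specificity from binary calls, ROC AUC as the probability that a randomly chosen active outscores a randomly chosen inactive (ties counted as one half), and the hit-rate enrichment. If a fraction f of a library of N compounds is screened and that subset recovers a fraction r of all H actives, the subset's hit rate is rH/(fN) versus H/N for the whole library, so the enrichment is r/f. With the rounded figures in the abstract, 0.81/0.306 ≈ 2.65, close to the reported 2.67-fold; the small gap presumably reflects rounding of the published percentages. All function names are hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP). Labels: 1 = active, 0 = inactive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(active score > inactive score), ties counted as 0.5.

    This pairwise form is O(P*N); a rank-based formula scales better to a full library.
    """
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def hit_rate_enrichment(recall: float, fraction_screened: float) -> float:
    """(hits in subset / subset size) / (all hits / library size) simplifies to recall / fraction."""
    return recall / fraction_screened


print(sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))  # sensitivity 2/3, specificity 0.5
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
print(round(hit_rate_enrichment(0.81, 0.306), 2))  # 2.65
```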
Conclusion: Our ML approach can increase HTS efficiency by reducing the number of compounds that must be physically screened and by identifying potentially missed hits, thereby making HTS more accessible and lowering barriers to entry.