The Chemical Space Spanned by Manually Curated Datasets of Natural and Synthetic Compounds with Activities against SARS-CoV-2
Jude Y Betow, Gemma Turon, Clovis S Metuge, Simeon Akame, Vanessa A Shu, Oyere T Ebob, Miquel Duran-Frigola, Fidele Ntie-Kang
Molecular Informatics, e202400293, published 2024-11-23. DOI: 10.1002/minf.202400293
Abstract
Diseases caused by viruses are challenging to contain because their outbreak and spread can be very sudden and are compounded by rapid mutation. As a result, developing drugs and vaccines is a continuing endeavour that requires fast discovery and preparedness. Targeting viral infections with small molecules remains one of the treatment options for reducing transmission and disease burden. One lesson from the recent coronavirus disease 2019 (COVID-19) pandemic is the value of assembling ready-to-screen small-molecule libraries in preparation for the next viral outbreak, so that a clinical candidate might be found before the outbreak becomes a pandemic. Public availability of diverse compound libraries, well annotated in terms of chemical structures and scaffolds, modes of action, and bioactivities, is therefore crucial to ensure the participation of academic laboratories in these screening efforts. This is especially true in resource-limited settings where synthesis, testing, and computing capacity are scarce. Here, we demonstrate a low-resource approach to populating the chemical space of naturally occurring and synthetic small molecules that have shown in vitro and/or in vivo activity against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and its target proteins. We manually curated two datasets of small molecules (naturally occurring and synthetically derived) by reading the published literature and collecting the reported compounds by hand. The literature shows that most of the reported SARS-CoV-2 compounds act by inhibiting the main protease, whereas 25% of the compounds currently have no known target. Scaffold analysis and principal component analysis revealed that the most common scaffolds differ markedly between the two datasets. We then expanded the initial hand-curated dataset of over 1,200 compounds through an ultra-large-scale 2D and 3D similarity search, obtaining a collection of over 150k purchasable compounds. The chemical space spanned by this collection extends substantially beyond that of a commercially available coronavirus library of more than 20k small molecules. Given its manageable size and proximity to the hand-curated compounds, the collection constitutes a good starting point for virtual screening campaigns.
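The similarity-based expansion step described in the abstract (growing roughly 1,200 hand-curated compounds into over 150k purchasable analogues) can be illustrated with a minimal 2D similarity-search sketch. It assumes each molecule has already been encoded as a binary fingerprint, represented here as a set of "on" bit indices. In practice these would be generated by a cheminformatics toolkit. Candidates are scored with the Tanimoto coefficient, a common choice for 2D fingerprint similarity. The function names, compound identifiers, toy fingerprints, and threshold are all hypothetical. This is not the authors' pipeline, and it does not cover the 3D (shape-based) part of the search.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: indices of the "on" bits in a 2D fingerprint.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared bits / (bits in a + bits in b - shared bits)."""
    if not a and not b:
        return 0.0  # two empty fingerprints carry no similarity information
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def expand_by_similarity(
    queries: Dict[str, Fingerprint],
    library: Dict[str, Fingerprint],
    threshold: float,
) -> List[Tuple[str, str, float]]:
    """Return (library_id, closest_query_id, similarity) for every library
    compound whose best similarity to any query compound meets the threshold.

    Brute-force illustration only: each library compound is compared with
    every query compound.
    """
    if not queries:
        return []
    hits = []
    for lib_id, lib_fp in library.items():
        # Find the most similar hand-curated (query) compound.
        best_id, best_sim = max(
            ((q_id, tanimoto(q_fp, lib_fp)) for q_id, q_fp in queries.items()),
            key=lambda pair: pair[1],
        )
        if best_sim >= threshold:
            hits.append((lib_id, best_id, best_sim))
    # Most similar candidates first.
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits


if __name__ == "__main__":
    # Toy, made-up fingerprints for illustration only.
    curated = {
        "curated_1": frozenset({1, 2, 3, 4}),
        "curated_2": frozenset({10, 11, 12}),
    }
    vendor = {
        "vendor_A": frozenset({1, 2, 3, 5}),
        "vendor_B": frozenset({10, 11, 12, 13}),
        "vendor_C": frozenset({20, 21}),
    }
    for hit in expand_by_similarity(curated, vendor, threshold=0.5):
        print(hit)
    # ('vendor_B', 'curated_2', 0.75)
    # ('vendor_A', 'curated_1', 0.6)
```

A brute-force loop like this does not scale to ultra-large libraries. Searches at that scale usually rely on dedicated indexed or approximate similarity-search tools. The sketch only shows the scoring and filtering logic.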
About the journal:
Molecular Informatics is a peer-reviewed, international forum for publication of high-quality, interdisciplinary research on all molecular aspects of bio/cheminformatics and computer-assisted molecular design. Molecular Informatics succeeded QSAR & Combinatorial Science in 2010.
Molecular Informatics presents methodological innovations that lead to a deeper understanding of ligand-receptor interactions, macromolecular complexes, and molecular networks. It also presents design concepts and processes that demonstrate how ideas lead to molecules with a desired structure or function, preferably including experimental validation.
The journal's scope includes but is not limited to the fields of drug discovery and chemical biology, protein and nucleic acid engineering and design, the design of nanomolecular structures, strategies for modeling of macromolecular assemblies, molecular networks and systems, pharmaco- and chemogenomics, computer-assisted screening strategies, as well as novel technologies for the de novo design of biologically active molecules. As a unique feature, Molecular Informatics publishes so-called "Methods Corner" review-type articles, which feature important technological concepts and advances within the scope of the journal.