The Chemical Space Spanned by Manually Curated Datasets of Natural and Synthetic Compounds with Activities against SARS-CoV-2.

IF 2.8 · Zone 4 (Medicine) · JCR Q3 (Chemistry, Medicinal) · Molecular Informatics · Pub Date: 2024-11-23 · DOI: 10.1002/minf.202400293
Jude Y Betow, Gemma Turon, Clovis S Metuge, Simeon Akame, Vanessa A Shu, Oyere T Ebob, Miquel Duran-Frigola, Fidele Ntie-Kang
Citations: 0

Abstract

Diseases caused by viruses are challenging to contain: outbreaks and spread can be very sudden and are compounded by rapid mutations, making the development of drugs and vaccines a continued endeavour that requires fast discovery and preparedness. Targeting viral infections with small molecules remains one of the treatment options to reduce transmission and the disease burden. A lesson learned from the recent coronavirus disease (COVID-19) is to collect ready-to-screen small-molecule libraries in preparation for the next viral outbreak, and potentially to find a clinical candidate before it becomes a pandemic. Public availability of diverse compound libraries, well annotated in terms of chemical structures and scaffolds, modes of action, and bioactivities, is therefore crucial to ensure the participation of academic laboratories in these screening efforts, especially in resource-limited settings where synthesis, testing and computing capacity are scarce. Here, we demonstrate a low-resource approach to populate the chemical space of naturally occurring and synthetic small molecules that have shown in vitro and/or in vivo activities against the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and its target proteins. We have manually curated two datasets of small molecules (naturally occurring and synthetically derived) by collecting compounds from the published literature. Information from the literature reveals that a majority of the reported anti-SARS-CoV-2 compounds act by inhibiting the main protease, while 25% of the compounds currently have no known target. Scaffold analysis and principal component analysis revealed that the most common scaffolds in the datasets are quite distinct. We then expanded the initial manually curated dataset of over 1200 compounds via an ultra-large-scale 2D and 3D similarity search, obtaining an expanded collection of over 150k purchasable compounds. The spanned chemical space extends significantly beyond that of a commercially available coronavirus library of more than 20k small molecules and constitutes a good starting collection for virtual screening campaigns, given its manageable size and proximity to hand-curated compounds.
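
As a rough illustration of the 2D similarity-based expansion mentioned in the abstract, the sketch below keeps every library compound whose fingerprint is close enough to at least one curated seed compound. It is not the authors' pipeline: the abstract does not name the similarity metric, fingerprint type, or cut-off, so the Tanimoto coefficient, the 0.7 threshold, the function names, and the toy on-bit sets used here are all assumptions made for the example.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# A 2D fingerprint is modelled as the set of its "on" bit indices (assumption).
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) coefficient between two on-bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def expand_by_similarity(
    seeds: Dict[str, Fingerprint],
    library: Dict[str, Fingerprint],
    threshold: float = 0.7,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[str, Optional[str], float]]:
    """Return (library_id, nearest_seed_id, score) for library compounds
    whose best Tanimoto score against any seed reaches the threshold."""
    hits = []
    for lib_id, lib_fp in library.items():
        # Find the most similar curated seed for this library compound.
        best_seed, best_score = max(
            ((seed_id, tanimoto(seed_fp, lib_fp)) for seed_id, seed_fp in seeds.items()),
            key=lambda pair: pair[1],
            default=(None, 0.0),
        )
        if best_score >= threshold:
            hits.append((lib_id, best_seed, best_score))
    # Most similar compounds first.
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Toy data: invented bit sets standing in for real fingerprints.
seeds = {
    "seed_A": frozenset({1, 2, 3, 4}),
    "seed_B": frozenset({10, 11, 12}),
}
library = {
    "lib_1": frozenset({1, 2, 3, 4, 5}),
    "lib_2": frozenset({10, 11, 20, 21}),
    "lib_3": frozenset({1, 2, 3, 4}),
}

hits = expand_by_similarity(seeds, library, threshold=0.7)
for lib_id, seed_id, score in hits:
    print(f"{lib_id} ~ {seed_id}: {score:.2f}")
```

In practice the fingerprints would come from a cheminformatics toolkit and the library would hold millions of purchasable compounds, so the exhaustive pairwise loop shown here would be replaced by an indexed or vectorised search.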

Source journal
Molecular Informatics (Chemistry, Medicinal; Mathematical & Computational Biology)
CiteScore: 7.30
Self-citation rate: 2.80%
Articles published: 70
Review time: 3 months
About the journal: Molecular Informatics is a peer-reviewed, international forum for publication of high-quality, interdisciplinary research on all molecular aspects of bio/cheminformatics and computer-assisted molecular design. Molecular Informatics succeeded QSAR & Combinatorial Science in 2010. Molecular Informatics presents methodological innovations that will lead to a deeper understanding of ligand-receptor interactions, macromolecular complexes, molecular networks, design concepts and processes that demonstrate how ideas and design concepts lead to molecules with a desired structure or function, preferably including experimental validation. The journal's scope includes but is not limited to the fields of drug discovery and chemical biology, protein and nucleic acid engineering and design, the design of nanomolecular structures, strategies for modeling of macromolecular assemblies, molecular networks and systems, pharmaco- and chemogenomics, computer-assisted screening strategies, as well as novel technologies for the de novo design of biologically active molecules. As a unique feature, Molecular Informatics publishes so-called "Methods Corner" review-type articles, which feature important technological concepts and advances within the scope of the journal.
Latest articles from this journal
The Chemical Space Spanned by Manually Curated Datasets of Natural and Synthetic Compounds with Activities against SARS-CoV-2.
Extended Activity Cliffs-Driven Approaches on Data Splitting for the Study of Bioactivity Machine Learning Predictions.
BIOMX-DB: A web application for the BIOFACQUIM natural product database.
Chemoinformatics for corrosion science: Data-driven modeling of corrosion inhibition by organic molecules.
My 50 Years with Chemoinformatics.