An Open-Source Implementation of the Scaffold Identification and Naming System (SCINS) and Example Applications

Journal of Chemical Information and Modeling (IF 5.6; Zone 2, Chemistry; Q1, CHEMISTRY, MEDICINAL) Pub Date: 2024-10-15 DOI: 10.1021/acs.jcim.4c01314
Kamen P. Petrov, Andreas Bender

Abstract

Organizing and partitioning sets of chemical structures is of considerable practical significance, e.g., in compound library analysis and the postprocessing of screening hit lists. Approaches such as unsupervised clustering are computationally demanding and dataset-dependent; on the other hand, rule-based methods, such as those based on Murcko scaffolds, have linear time complexity but are often too fine-grained, leading to a large number of singletons or sparsely populated classes. An alternative rule-based method that seeks to achieve an optimal balance when grouping compounds into sets is the ‘Scaffold Identification and Naming System’ (SCINS). To facilitate public use of this previously published method, here, we provide an open-source Python implementation of SCINS, dependent only on RDKit. We show that SCINS can be useful in identifying sparsely and densely populated regions in chemical space in large databases, here exemplified with Enamine REAL Diverse and ChEMBL. We find that Enamine REAL Diverse covers a much smaller SCINS space relative to ChEMBL, whereas the opposite is true when Murcko and generic Murcko scaffolds are considered. Additionally, we show that SCINS can result in chemically intuitive grouping of medium-sized sets of bioactive compounds, which can be useful in compound selection from virtual screening campaigns as well as postprocessing of experimental hit lists. Hence, in this work, we provide both an open-source implementation of SCINS and its characterization with relevant use cases.
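The contrast drawn in the abstract, between scaffold-based labels that run in linear time and the singleton-heavy partitions they can produce, can be illustrated with a minimal sketch. It is not the authors' SCINS implementation, whose API is not shown in this abstract. The function names `partition_stats` and `murcko_labels` are hypothetical and introduced only for illustration; the RDKit calls (`MurckoScaffold.GetScaffoldForMol`, `MurckoScaffold.MakeScaffoldGeneric`) are standard RDKit functions. The generic-scaffold step is simplified and may differ in detail from the procedure used in the paper.

```python
from collections import Counter
from typing import Hashable, Iterable


def partition_stats(labels: Iterable[Hashable]) -> dict:
    """Summarize how a labelling splits a compound set into classes.

    `labels` holds one class label per compound (for example a scaffold
    SMILES). The function is hypothetical and not part of any library.
    """
    counts = Counter(labels)
    n_singletons = sum(1 for size in counts.values() if size == 1)
    return {
        "n_compounds": sum(counts.values()),
        "n_classes": len(counts),
        "n_singletons": n_singletons,
        # Fraction of classes that contain exactly one compound.
        "singleton_fraction": n_singletons / len(counts) if counts else 0.0,
    }


def murcko_labels(smiles_list, generic=False):
    """Label each SMILES with its Murcko (or generic Murcko) scaffold SMILES.

    RDKit is imported lazily so partition_stats stays usable without it.
    SMILES that RDKit cannot parse are skipped. Each molecule is processed
    once, so the cost grows linearly with the number of compounds.
    """
    from rdkit import Chem
    from rdkit.Chem.Scaffolds import MurckoScaffold

    labels = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue
        scaffold = MurckoScaffold.GetScaffoldForMol(mol)
        if generic:
            # Assumption: a single generic conversion, with no further pruning.
            scaffold = MurckoScaffold.MakeScaffoldGeneric(scaffold)
        labels.append(Chem.MolToSmiles(scaffold))
    return labels
```

Calling `partition_stats(murcko_labels(smiles))` and `partition_stats(murcko_labels(smiles, generic=True))` on the same SMILES list gives a quick view of how fine-grained each labelling is. Labels produced by any other rule-based scheme, SCINS included, could be passed to `partition_stats` in the same way.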
