Hilbert-curve assisted structure embedding method

Journal of Cheminformatics (IF 7.1; CAS Zone 2, Chemistry; JCR Q1, Chemistry, Multidisciplinary) Pub Date: 2024-07-29 DOI: 10.1186/s13321-024-00850-z
Gergely Zahoránszky-Kőhalmi, Kanny K. Wan, Alexander G. Godfrey
Citation count: 0

Abstract

Motivation

Chemical space embedding methods are widely used for dimensionality reduction, clustering and visualization. The maps generated by the embedding process can give medicinal chemists valuable insight into the relationships between the structural, physicochemical and biological properties of compounds. However, these maps are known to be difficult to interpret, and the “landscape” on the map is prone to “rearrangement” when different sets of compounds are embedded.

Results

In this study we present the Hilbert-Curve Assisted Space Embedding (HCASE) method, which was designed to create maps by organizing structures according to a logic familiar to medicinal chemists. First, a chemical space is created with the help of a set of “reference scaffolds”. These scaffolds are sorted according to the previously published, medicinal-chemistry-inspired Scaffold-Key algorithm. Next, the ordered scaffolds are mapped to a line, which is folded into a higher-dimensional (here: 2D) space. The intricately folded line is referred to as a pseudo-Hilbert-Curve. A compound is embedded by locating its most similar reference scaffold on the pseudo-Hilbert-Curve and assigning the compound that scaffold's position. Through a series of experiments, we demonstrate the properties of the maps generated by the HCASE method. The embedded compounds came from the DrugBank and CANVASS libraries, and the chemical spaces were defined by scaffolds extracted from the ChEMBL database.
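The folding and look-up steps described above can be illustrated with a minimal Python sketch. It is not the HCASE implementation from the repository: the function names (`hilbert_d2xy`, `tanimoto`, `embed`), the use of plain integer sets as stand-in fingerprints, the Tanimoto similarity, and the linear scaling of scaffold rank to curve index are all assumptions made for illustration. The scaffolds are assumed to be already sorted (for example by Scaffold-Key order), which this sketch does not compute.

```python
def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2**order x 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant so consecutive indices stay adjacent.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def tanimoto(a, b):
    """Tanimoto similarity of two sets of 'on' bits (toy stand-in fingerprints)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def embed(compound_fp, sorted_scaffold_fps, order):
    """Place a compound at the curve position of its most similar reference scaffold.

    sorted_scaffold_fps: non-empty list of fingerprint sets, already ordered.
    The rank-to-index scaling below is an assumption, not the published rule.
    """
    best_rank = max(
        range(len(sorted_scaffold_fps)),
        key=lambda i: tanimoto(compound_fp, sorted_scaffold_fps[i]),
    )
    n_cells = 4 ** order
    d = best_rank * n_cells // len(sorted_scaffold_fps)
    return hilbert_d2xy(order, d)


# Hypothetical data: four ordered reference scaffolds on a 2 x 2 grid (order 1).
scaffolds = [{1, 2}, {2, 3}, {3, 4}, {4, 5}]
position = embed({3, 4, 9}, scaffolds, order=1)
```

Because consecutive indices on a Hilbert curve always land in neighbouring grid cells, scaffolds that are close in the sorted order stay close on the 2D map, which is the property the method relies on.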

Scientific contribution

The novelty of the HCASE method lies in generating robust and intuitive chemical space embeddings that reflect a medicinal chemist's reasoning, and in the precedent-setting use of a space-filling (Hilbert) curve in the process.

Availability

https://github.com/ncats/hcase
