Improving route development using convergent retrosynthesis planning

Journal of Cheminformatics (IF 5.7; Zone 2, Chemistry; JCR Q1, Chemistry, Multidisciplinary) Pub Date: 2025-02-27 DOI: 10.1186/s13321-025-00953-1
Paula Torren-Peraire, Jonas Verhoeven, Dorota Herman, Hugo Ceulemans, Igor V. Tetko, Jörg K. Wegner

Abstract

Retrosynthesis consists of recursively breaking down a target molecule to produce a synthesis route composed of readily accessible building blocks. In recent years, computer-aided synthesis planning methods have allowed broader exploration of potential synthesis routes by combining state-of-the-art machine-learning methods with chemical knowledge. However, these methods are generally developed to produce individual routes from a single product to a set of proposed building blocks and are not designed to leverage potential shared paths between targets. These methods do not necessarily cover real-world use cases in medicinal chemistry, where one seeks to synthesize sets of target compounds in a library mode, looking for maximal convergence into a shared retrosynthetic path that passes through advanced key intermediate compounds. Using a graph-based processing pipeline, we explore Johnson & Johnson Electronic Laboratory Notebooks (J&J ELN) and publicly available datasets to identify complex routes in which multiple target molecules share common intermediates, producing convergent synthesis routes. We find that over 70% of all reactions are involved in convergent synthesis, covering over 80% of all projects in the case of J&J ELN data.
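
The idea of finding common intermediates in a reaction graph can be illustrated with a minimal sketch. It is not the authors' pipeline: the reaction records, the molecule names, and the function `shared_intermediates` are all hypothetical, and a "target" is assumed here to be any product that no recorded reaction consumes.

```python
from collections import defaultdict

# Toy reaction records as (reactants, product). The names are placeholders,
# not data from J&J ELN or any public dataset.
REACTIONS = [
    (("A", "B"), "I1"),
    (("I1", "C"), "T1"),
    (("I1", "D"), "T2"),
    (("E", "F"), "T3"),
]


def shared_intermediates(reactions):
    """Return {intermediate: targets} for intermediates that lead to 2+ targets."""
    # Forward edges: each reactant points to the products made from it.
    children = defaultdict(set)
    for reactants, product in reactions:
        for reactant in reactants:
            children[reactant].add(product)

    products = {product for _, product in reactions}
    consumed = {r for reactants, _ in reactions for r in reactants}
    targets = products - consumed        # assumption: made but never consumed
    intermediates = products & consumed  # made and then used again

    def reachable_targets(start):
        # Depth-first walk along forward edges, collecting targets on the way.
        seen, stack, found = set(), [start], set()
        while stack:
            mol = stack.pop()
            if mol in seen:
                continue
            seen.add(mol)
            if mol in targets:
                found.add(mol)
            stack.extend(children.get(mol, ()))
        return found

    result = {}
    for mol in intermediates:
        found = reachable_targets(mol)
        if len(found) >= 2:  # shared by at least two targets -> convergent
            result[mol] = found
    return result


print(shared_intermediates(REACTIONS))
```

In this toy data only `I1` is reported, because it feeds both `T1` and `T2`; the route to `T3` shares nothing with the others and would count as a non-convergent route.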

Scientific contribution

We introduce a novel planning approach to develop convergent synthesis routes, which can search multiple products and intermediates simultaneously, guided by state-of-the-art machine-learning single-step retrosynthesis models, enhancing the overall efficiency and practical applicability of retrosynthetic planning. We evaluate the multi-step synthesis planning approach using the extracted convergent routes and observe that solvability is generally high across those routes: the approach identifies a convergent route for over 80% of the test routes and shows an individual compound solvability of over 90%. We find that by using a convergent search approach, we can synthesize almost 30% more compounds simultaneously for J&J ELN than with an individual search, while making greater use of common intermediates.
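
The following toy sketch suggests why searching several targets in one shared graph encourages reuse of intermediates. It is a greedy illustration under strong assumptions, not the search algorithm described in the paper: `TOY_MODEL` is a hypothetical lookup table standing in for a learned single-step retrosynthesis model, `STOCK` is an invented set of building blocks, and the first proposed disconnection is always taken.

```python
from collections import deque

# Hypothetical stand-in for a single-step model: product -> candidate reactant sets.
TOY_MODEL = {
    "T1": [("I1", "C")],
    "T2": [("I1", "D")],
    "I1": [("A", "B")],
}
STOCK = {"A", "B", "C", "D"}  # assumed purchasable building blocks


def convergent_search(targets, model, stock, max_expansions=100):
    """Expand all targets in one shared graph; each molecule is expanded once."""
    chosen = {}  # molecule -> reactants selected for it
    queue = deque(targets)
    expansions = 0
    while queue and expansions < max_expansions:
        mol = queue.popleft()
        if mol in stock or mol in chosen:
            continue  # already available, or already expanded for another target
        options = model.get(mol, [])
        if not options:
            continue  # the toy model has no suggestion; molecule stays unsolved
        chosen[mol] = options[0]  # greedy choice; a real search would rank options
        queue.extend(options[0])
        expansions += 1
    return chosen


def is_solved(mol, chosen, stock, path=frozenset()):
    """A molecule is solved if it is in stock or all its chosen reactants are solved."""
    if mol in stock:
        return True
    if mol not in chosen or mol in path:  # no route found, or a cycle
        return False
    return all(is_solved(r, chosen, stock, path | {mol}) for r in chosen[mol])


route = convergent_search(["T1", "T2"], TOY_MODEL, STOCK)
solved = [t for t in ["T1", "T2"] if is_solved(t, route, STOCK)]
```

Here `I1` is expanded once and serves both targets, so the shared graph contains three reactions, whereas two independent searches would each have to rediscover the step that makes `I1`.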

Source journal
Journal of Cheminformatics (Chemistry, Multidisciplinary; Computer Science, Information Systems)
CiteScore: 14.10
Self-citation rate: 7.00%
Articles published: 82
Review time: 3 months
About the journal: Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software and databases, and molecular modelling, chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases, computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques.