Data-driven approaches for identifying hyperparameters in multi-step retrosynthesis.

IF 2.8 · Zone 4 (Medicine) · Q3 Chemistry, Medicinal · Molecular Informatics · Pub Date: 2023-11-01 · Epub Date: 2023-09-27 · DOI: 10.1002/minf.202300128
Annie M Westerlund, Bente Barge, Lewis Mervin, Samuel Genheden
Citation count: 0

Abstract

The multi-step retrosynthesis problem can be solved by a search algorithm, such as Monte Carlo tree search (MCTS). The performance of multi-step retrosynthesis, as measured by a trade-off between search time and route solvability, therefore depends on the hyperparameters of the search algorithm. In this paper, we demonstrated the effect of three MCTS hyperparameters (number of iterations, tree depth, and tree width) on metrics such as the linear integrated speed-accuracy score (LISAS) and the inverse efficiency score, which consider both route solvability and search time. This exploration was conducted with three data-driven approaches: a systematic grid search, Bayesian optimization over an ensemble of molecules to obtain static MCTS hyperparameters, and a machine learning approach that dynamically predicts optimal MCTS hyperparameters for a given input target molecule. With the results obtained on the internal dataset, we demonstrated that it is possible to identify a hyperparameter set that outperforms the current AiZynthFinder default setting. It appeared optimal across a variety of target input molecules, on both proprietary and public datasets. The settings identified with the in-house dataset reached a solvability of 93 % and a median search time of 151 s for the in-house dataset, and a solvability of 74 % and a median search time of 114 s for the ChEMBL dataset. These numbers can be compared to the current default settings, which solved 85 % and 73 % with median search times of 110 s and 84 s for the in-house and ChEMBL datasets, respectively.
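
The two combined metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulations, so the definitions below are assumptions taken from the general speed-accuracy literature: the inverse efficiency score as mean time divided by the fraction solved, and LISAS as mean time plus the error rate rescaled by the ratio of the time spread to the error spread. The function names, the `toy_runs` values, and the configuration tuples are made up for illustration and are not results or settings from the paper.

```python
from statistics import mean, pstdev


def inverse_efficiency(times, solved):
    """Mean search time divided by the fraction of targets solved (lower is better)."""
    frac_solved = sum(1 for s in solved if s) / len(solved)
    if frac_solved == 0:
        return float("inf")  # nothing solved: worst possible score
    return mean(times) / frac_solved


def lisas(times, solved):
    """Mean time plus the error rate, rescaled by the ratio of the two spreads."""
    errors = [0.0 if s else 1.0 for s in solved]
    spread_err = pstdev(errors)
    if spread_err == 0:
        return mean(times)  # all solved or none solved: no error spread to rescale by
    return mean(times) + (pstdev(times) / spread_err) * mean(errors)


# Toy, made-up per-target results keyed by a hypothetical
# (iterations, tree depth, tree width) configuration.
toy_runs = {
    (100, 6, 50): ([100.0, 200.0], [True, False]),
    (200, 6, 50): ([150.0, 250.0], [True, True]),
}

# A grid search in this spirit scores every configuration and keeps the best one.
best = min(toy_runs, key=lambda cfg: inverse_efficiency(*toy_runs[cfg]))
```

Both scores are lower-is-better, so a grid search or a Bayesian optimizer would minimize them; the sketch uses mean times, whereas the abstract reports median search times.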

Source journal
Molecular Informatics
Subject categories: Chemistry, Medicinal; Mathematical & Computational Biology
CiteScore: 7.30
Self-citation rate: 2.80%
Articles published: 70
Review time: 3 months
Journal description: Molecular Informatics is a peer-reviewed, international forum for publication of high-quality, interdisciplinary research on all molecular aspects of bio/cheminformatics and computer-assisted molecular design. Molecular Informatics succeeded QSAR & Combinatorial Science in 2010. Molecular Informatics presents methodological innovations that will lead to a deeper understanding of ligand-receptor interactions, macromolecular complexes, molecular networks, design concepts and processes that demonstrate how ideas and design concepts lead to molecules with a desired structure or function, preferably including experimental validation. The journal's scope includes but is not limited to the fields of drug discovery and chemical biology, protein and nucleic acid engineering and design, the design of nanomolecular structures, strategies for modeling of macromolecular assemblies, molecular networks and systems, pharmaco- and chemogenomics, computer-assisted screening strategies, as well as novel technologies for the de novo design of biologically active molecules. As a unique feature, Molecular Informatics publishes so-called "Methods Corner" review-type articles, which feature important technological concepts and advances within the scope of the journal.