Reaction Impurity Prediction using a Data Mining Approach

Chemistry methods : new approaches to solving problems in chemistry (IF 3.6, Q1, Chemistry, Multidisciplinary) | Pub Date: 2023-03-27 | DOI: 10.1002/cmtd.202200062

Adarsh Arun, Dr. Zhen Guo, Dr. Simon Sung, Prof. Alexei A. Lapkin


Citations: 1

Abstract

Automated prediction of reaction impurities is useful in early-stage reaction development, synthesis planning and optimization. Existing reaction predictors are geared towards main product prediction and are often black-box models, which makes erroneous outcomes difficult to troubleshoot. This work presents an automated, interpretable impurity prediction workflow based on data mining of large chemical reaction databases. A 14-step workflow was implemented in Python and RDKit using Reaxys® data. Potential chemical reactions between functional groups that share a reaction environment in the user-supplied query species can be evaluated accurately by directly mining the Reaxys® database for similar or ‘analogue’ reactions involving these functional groups. Reaction templates are then extracted from the analogue reactions and applied to the relevant species in the original query to return impurities and transformations of interest. Three proof-of-concept case studies (paracetamol, agomelatine and lersivirine) were conducted, and in each the workflow correctly suggested impurities within its top two outcomes. At every stage, a suggested impurity can be traced back to the originating template and analogue reaction in the literature, allowing closer inspection and user validation. Because it is interpretable rather than purely black-box, this work could also serve as a benchmark for more sophisticated algorithms or models.
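The template-application idea described in the abstract can be illustrated with a minimal RDKit sketch. It is not the authors' 14-step workflow: the functional-group patterns and the two reaction templates below are hand-written assumptions standing in for templates that the workflow would mine from analogue reactions in Reaxys®, and the names `FUNCTIONAL_GROUPS`, `TEMPLATES`, `detect_groups` and `apply_templates` are hypothetical. The query (4-aminophenol with acetic anhydride) is chosen only as a simple example of two functional groups competing in the same reaction environment.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Hypothetical functional-group patterns (SMARTS); illustrative only.
FUNCTIONAL_GROUPS = {
    "aromatic amine": "[NX3;H2]c",
    "phenol": "[OX2;H1]c",
    "anhydride": "C(=O)OC(=O)",
}

# Hand-written reaction templates (assumptions). In the described workflow
# these would be extracted from analogue reactions found in the database.
TEMPLATES = {
    "N-acylation": "[NX3;H2;$(Nc):1].[C:2](=[O:3])O[C](=O)[#6]>>[N:1][C:2]=[O:3]",
    "O-acylation": "[OX2;H1;$(Oc):1].[C:2](=[O:3])O[C](=O)[#6]>>[O:1][C:2]=[O:3]",
}


def detect_groups(smiles):
    """Return the names of the functional groups found in one species."""
    mol = Chem.MolFromSmiles(smiles)
    return {
        name
        for name, pattern in FUNCTIONAL_GROUPS.items()
        if mol.HasSubstructMatch(Chem.MolFromSmarts(pattern))
    }


def apply_templates(reactant_smiles):
    """Apply every template to the query species.

    Returns a dict mapping each product SMILES to the set of template
    names that produced it, so every suggestion stays traceable.
    """
    reactants = tuple(Chem.MolFromSmiles(s) for s in reactant_smiles)
    outcomes = {}
    for name, smarts in TEMPLATES.items():
        rxn = AllChem.ReactionFromSmarts(smarts)
        for product_set in rxn.RunReactants(reactants):
            for product in product_set:
                try:
                    Chem.SanitizeMol(product)
                except Exception:
                    continue  # skip chemically invalid products
                smiles = Chem.MolToSmiles(product)
                outcomes.setdefault(smiles, set()).add(name)
    return outcomes


if __name__ == "__main__":
    query = ["Nc1ccc(O)cc1", "CC(=O)OC(C)=O"]  # 4-aminophenol, acetic anhydride
    for species in query:
        print(species, sorted(detect_groups(species)))
    for product, sources in apply_templates(query).items():
        print(product, "from", sorted(sources))
```

Keeping the template name alongside each product mirrors, in a very reduced form, the traceability the abstract emphasises: a user can see which transformation produced a suggested species and inspect it. Ranking of outcomes and retrieval of the supporting literature reactions are not modelled here.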



