Evaluating Predictive Accuracy in Asymmetric Catalysis: A Machine Learning Perspective on Local Reaction Space

ACS Catalysis (IF 13.1; Zone 1, Chemistry; JCR Q1, Chemistry, Physical) Pub Date: 2025-03-31 DOI: 10.1021/acscatal.5c01051
Isaiah O. Betinol, Aleksandra Demchenko, Jolene P. Reid
Citations: 0

Abstract

Machine learning (ML) models are increasingly being employed in asymmetric catalysis to predict reaction outcomes and optimize enantioselective processes. Despite the trend of expanding data set sizes to improve model performance, asymmetric catalysis presents unique challenges, including the difficulty of acquiring high-quality experimental data and the often-limited availability of structurally diverse examples. Consequently, rational data set design requires the practitioner to choose whether to collect data that maximizes diversity in the training set or data that maximizes representation around a target prediction. A key challenge in these studies is understanding the role of local reaction space: specifically, how much predictive accuracy is driven by nearest neighbors (structurally and electronically similar data points) and by the next-nearest neighbors. This study investigates the predictive power of ML models trained with varying levels of local representation in the reaction space. We provide a framework, a radius-based random forest (RaRF) algorithm, to systematically probe the effects of including diverse reactions dissimilar to a target prediction. We show that when the training set is representative of the target reaction, the gains from increasing data set diversity are modest, typically less than 0.1 kcal/mol in predictive error, and rise to only 0.5 kcal/mol for extrapolative tests, highlighting the need for targeted data set design. Furthermore, these findings hold even for complex architectures and features. Finally, we demonstrate that a targeted, neighborhood-oriented strategy greatly accelerates the identification of predictive models compared to diversity-driven approaches.
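
The abstract does not say how RaRF defines its radius or distance metric, so the following is only a minimal sketch of the general idea of radius-based training-set selection, not the authors' implementation. It assumes Euclidean distance in a descriptor space. The names `select_within_radius`, `radius_predict`, and `mean_model` are hypothetical, and the toy data are made up. The placeholder regressor (a neighborhood mean) stands in for the random forest so that the sketch runs without third-party libraries; any regressor with the same call shape could be passed in instead.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def select_within_radius(
    X: List[Vector], y: List[float], target: Vector, radius: float
) -> Tuple[List[Vector], List[float]]:
    """Keep only training reactions whose distance to the target is <= radius.

    Euclidean distance is an assumption; the abstract does not name a metric.
    """
    kept = [(x, yi) for x, yi in zip(X, y) if math.dist(x, target) <= radius]
    return [x for x, _ in kept], [yi for _, yi in kept]


def mean_model(train_X: List[Vector], train_y: List[float], target: Vector) -> float:
    """Placeholder regressor: the mean of the selected labels.

    A random forest fitted on (train_X, train_y) would take this role.
    """
    return sum(train_y) / len(train_y)


def radius_predict(
    X: List[Vector],
    y: List[float],
    target: Vector,
    radius: float,
    model: Callable[[List[Vector], List[float], Vector], float] = mean_model,
) -> Optional[float]:
    """Fit `model` on the local neighborhood only; return None if it is empty."""
    local_X, local_y = select_within_radius(X, y, target, radius)
    if not local_y:
        return None  # no training reaction lies within the radius
    return model(local_X, local_y, target)


# Toy data: 2-D descriptors with selectivities in kcal/mol (made-up values).
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (3.0, 3.0), (3.2, 2.9)]
y = [1.0, 1.2, 1.1, 0.2, 0.3]
target = (0.05, 0.05)

local = radius_predict(X, y, target, radius=0.5)   # uses the 3 nearby reactions
broad = radius_predict(X, y, target, radius=10.0)  # uses all 5 reactions
```

Sweeping `radius` from small to large moves the training set from nearest neighbors only toward the full, more diverse data set, which is the kind of comparison the abstract describes.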

Source journal: ACS Catalysis (Chemistry, Physical)
CiteScore: 20.80
Self-citation rate: 6.20%
Articles published: 1253
Review time: 1.5 months
Journal description: ACS Catalysis publishes original research in heterogeneous catalysis, molecular catalysis, and biocatalysis. It offers broad coverage across diverse areas such as life sciences, organometallics and synthesis, photochemistry and electrochemistry, drug discovery and synthesis, materials science, environmental protection, polymer discovery and synthesis, and energy and fuels. The journal aims to showcase innovative work in various aspects of catalysis. This includes new reactions and novel synthetic approaches utilizing known catalysts, the discovery or modification of new catalysts, elucidation of catalytic mechanisms through cutting-edge investigations, practical enhancements of existing processes, as well as conceptual advances in the field. Contributions to ACS Catalysis can encompass both experimental and theoretical research focused on catalytic molecules, macromolecules, and materials that exhibit catalytic turnover.