Designing Target-specific Data Sets for Regioselectivity Predictions on Complex Substrates

IF 15.6; Zone 1 (Chemistry); JCR Q1 (CHEMISTRY, MULTIDISCIPLINARY); Journal of the American Chemical Society; Pub Date: 2025-02-21; DOI: 10.1021/jacs.4c15902
Jules Schleinitz, Alba Carretero-Cerdán, Anjali Gurajapu, Yonatan Harnik, Gina Lee, Amitesh Pandey, Anat Milo, Sarah E. Reisman
Citations: 0

Abstract
The development of machine learning models to predict the regioselectivity of C(sp3)–H functionalization reactions is reported. A data set for dioxirane oxidations was curated from the literature and used to generate a model to predict the regioselectivity of C–H oxidation. To assess whether smaller, intentionally designed data sets could provide accuracy on complex targets, a series of acquisition functions were developed to select the most informative molecules for the specific target. Active learning-based acquisition functions that leverage predicted reactivity and model uncertainty were found to outperform those based on molecular and site similarity alone. The use of acquisition functions for data set elaboration significantly reduced the number of data points needed to perform accurate prediction, and it was found that smaller, machine-designed data sets can give accurate predictions when larger, randomly selected data sets fail. Finally, the workflow was experimentally validated on five complex substrates and shown to be applicable to predicting the regioselectivity of arene C–H radical borylation. These studies provide a quantitative alternative to the intuitive extrapolation from “model substrates” that is frequently used to estimate reactivity on complex molecules.
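The contrast the abstract draws between similarity-based and active-learning-based acquisition functions can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the set-based fingerprints, the use of the spread across an ensemble of predictions as the uncertainty estimate, and all numbers in the toy pool are assumptions made for illustration only.

```python
from statistics import mean, pstdev


def tanimoto(a, b):
    """Tanimoto similarity between two sets of fingerprint bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_acquisition(candidate, target_fp):
    """Baseline: score a candidate by its similarity to the target alone."""
    return tanimoto(candidate["fp"], target_fp)


def uncertainty_acquisition(candidate, weight=1.0):
    """Active-learning style: mean predicted reactivity plus ensemble spread."""
    preds = candidate["preds"]
    return mean(preds) + weight * pstdev(preds)


def select_batch(pool, score, k):
    """Return the names of the k highest-scoring candidates in the pool."""
    return sorted(pool, key=lambda name: score(pool[name]), reverse=True)[:k]


# Toy pool (made-up values): fingerprint bits and per-model reactivity predictions.
pool = {
    "A": {"fp": {1, 2, 3}, "preds": [0.2, 0.2, 0.2]},
    "B": {"fp": {1, 2, 9}, "preds": [0.1, 0.5, 0.9]},
    "C": {"fp": {7, 8}, "preds": [0.6, 0.6, 0.6]},
}
target_fp = {1, 2, 3, 4}  # hypothetical fingerprint of the complex target

print(select_batch(pool, lambda c: similarity_acquisition(c, target_fp), 2))
print(select_batch(pool, uncertainty_acquisition, 2))
```

On this toy pool the two scoring rules choose different molecules to add to the data set: the similarity baseline favors the candidates that most resemble the target, while the uncertainty-aware score favors candidates whose predicted reactivity is high or on which the ensemble disagrees.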
Source journal
CiteScore: 24.40
Self-citation rate: 6.00%
Articles published: 2398
Review time: 1.6 months
About the journal: The Journal of the American Chemical Society (JACS), the flagship journal of the American Chemical Society, was established in 1879 and holds a preeminent position in chemistry and related interdisciplinary sciences. It publishes cutting-edge research across a wide range of topics, amounting to approximately 19,000 pages of Articles, Communications, and Perspectives annually, and appears weekly.