Revealing the Relationship between Publication Bias and Chemical Reactivity with Contrastive Learning

IF 15.6 | Region 1 (Chemistry) | Q1 CHEMISTRY, MULTIDISCIPLINARY | Journal of the American Chemical Society | Pub Date: 2025-03-02 | DOI: 10.1021/jacs.5c01120
Wenhao Gao, Priyanka Raghavan, Ron Shprints, Connor W. Coley
Citation count: 0

Abstract

A synthetic method’s substrate tolerance and generality are often showcased in a “substrate scope” table. However, substrate selection exhibits a frequently discussed publication bias: unsuccessful experiments or low-yielding results are rarely reported. In this work, we explore more deeply the relationship between such a publication bias and chemical reactivity beyond the simple analysis of yield distributions using a novel neural network training strategy, substrate scope contrastive learning. By treating reported substrates as positive samples and nonreported substrates as negative samples, our contrastive learning strategy teaches a model to group molecules within a numerical embedding space, based on historical trends in published substrate scope tables. Training on 20,798 aryl halides in the CAS Content Collection™, spanning thousands of publications from 2010 to 2015, we demonstrate that the learned embeddings exhibit a correlation with physical organic reactivity descriptors through both intuitive visualizations and quantitative regression analyses. Additionally, these embeddings are applicable to various reaction modeling tasks like yield prediction and regioselectivity prediction, underscoring the potential to use historical reaction data as a pretraining task. This work not only presents a chemistry-specific machine learning training strategy to learn from literature data in a new way but also represents a unique approach to uncover trends in chemical reactivity reflected by trends in substrate selection in publications.
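
The positive/negative framing in the abstract can be illustrated with a generic contrastive objective. The sketch below is only an illustration and not the authors' implementation: the abstract does not state the loss function, so an InfoNCE-style loss is assumed here, and the function name `info_nce_loss`, the temperature value, and the random toy embeddings are all hypothetical.

```python
import numpy as np


def info_nce_loss(anchor, positives, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor embedding (assumed loss, not from the paper).

    anchor:    (d,)   embedding of a substrate reported in a scope table
    positives: (p, d) embeddings of other substrates reported in the same table
    negatives: (n, d) embeddings of substrates not reported in that table
    """
    def normalize(x):
        # Unit-normalize so that dot products are cosine similarities.
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a = normalize(anchor)
    pos = normalize(positives) @ a / temperature  # shape (p,)
    neg = normalize(negatives) @ a / temperature  # shape (n,)

    # For each positive i: -log(exp(pos_i) / (exp(pos_i) + sum_j exp(neg_j))),
    # computed with a shared shift m for numerical stability.
    m = max(pos.max(), neg.max())
    neg_sum = np.exp(neg - m).sum()
    losses = -(pos - m) + np.log(np.exp(pos - m) + neg_sum)
    return float(losses.mean())


# Toy data: random vectors stand in for learned molecular embeddings.
rng = np.random.default_rng(0)
center = rng.normal(size=8)
same_scope = center + 0.05 * rng.normal(size=(4, 8))  # clustered near the anchor
unrelated = rng.normal(size=(16, 8))                  # scattered elsewhere

# Low loss when reported substrates sit close together in embedding space ...
loss_grouped = info_nce_loss(center, same_scope, unrelated)
# ... and high loss when the roles of the two sets are swapped.
loss_scattered = info_nce_loss(center, unrelated[:4], same_scope)

print(f"grouped: {loss_grouped:.3f}  scattered: {loss_scattered:.3f}")
```

Minimizing a loss of this kind pulls substrates that appear together in a scope table toward each other and pushes nonreported substrates away, which is the grouping behavior the abstract describes. How negatives are sampled and how molecules are encoded are design choices the abstract leaves unspecified.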

Source journal
CiteScore: 24.40
Self-citation rate: 6.00%
Articles published: 2398
Review time: 1.6 months
Journal description: The Journal of the American Chemical Society (JACS), the flagship journal of the American Chemical Society, was established in 1879 and holds a preeminent position in chemistry and related interdisciplinary sciences. It publishes cutting-edge research across a wide range of topics, amounting to approximately 19,000 pages of Articles, Communications, and Perspectives annually, and appears weekly.
Latest articles from this journal
Asymmetric Total Synthesis and Structure Revision of (+)-Mangicol D.
Enzyme Catalysis Induced Nanocluster Assembly into Micrometer-Size Monolayered Nanosheets with Enhanced Near-Infrared Region II Emission.
Exceptionally Fast Heterogeneous Catalysis Using Perovskite Nanocrystals for Hydrocarbon Aromatization.
Gas-Solid Reaction to Rapidly Construct Chiral MOFs for Efficient Enantioselective Sensing.
Decoupling Cyanide Activation from C-C Bond Formation in Ni-Catalyzed Cyanation of Strained Ketones Using Benzonitriles.