Improving Bond Dissociations of Reactive Machine Learning Potentials through Physics-Constrained Data Augmentation.

IF 5.3; Zone 2 (Chemistry); Q1 (CHEMISTRY, MEDICINAL); Journal of Chemical Information and Modeling; Pub Date: 2025-01-28; DOI: 10.1021/acs.jcim.4c01847
Luan G F Dos Santos,Benjamin T Nebgen,Alice E A Allen,Brenden W Hamilton,Sakib Matin,Justin S Smith,Richard A Messerly

Abstract

In the field of computational chemistry, predicting bond dissociation energies (BDEs) presents well-known challenges, particularly due to the multireference character of reactive systems. Many chemical reactions involve configurations where single-reference methods fall short, as the electronic structure can significantly change during bond breaking. As generating training data for partially broken bonds is a challenging task, even state-of-the-art reactive machine learning interatomic potentials (MLIPs) often fail to predict reliable BDEs and smooth dissociation curves. By contrast, simple and inexpensive physics-based models, such as the well-established Morse potential, do not suffer from any such limitations. This work leverages the Morse potential to improve reactive MLIPs by augmenting the training data set with inexpensive Morse data along the dissociation pathways. This physics-constrained data augmentation (PCDA) approach results in MLIPs with smooth bond dissociation curves as well as near coupled-cluster level BDEs, all without requiring any expensive multireference quantum mechanical calculations. A case study for methane combustion demonstrates how the PCDA approach can improve an existing reactive MLIP, namely, ANI-1xnr. Not only are the BDEs and bond dissociation curves for all radicals and molecules significantly improved compared to ANI-1xnr but the PCDA-trained MLIP retains the reliability of ANI-1xnr when performing reactive molecular dynamics simulations.
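The abstract describes augmenting the training set with inexpensive Morse data along dissociation pathways. As a rough illustration of what such data could look like, the sketch below evaluates the standard Morse form E(r) = D_e (1 - exp(-a (r - r_e)))^2 on a grid of stretched bond lengths. This is only a minimal sketch, not the authors' implementation: the function names, the grid choice, and the parameter values (well depth, width, equilibrium distance) are all placeholders chosen for illustration, and the abstract does not say how the Morse parameters are fitted or how the points are merged with the reference data.

```python
import math


def morse_energy(r, d_e, a, r_e):
    """Morse potential, shifted so E(r_e) = 0 and E -> d_e as r -> infinity."""
    return d_e * (1.0 - math.exp(-a * (r - r_e))) ** 2


def morse_augmentation_points(d_e, a, r_e, r_max, n_points):
    """Return evenly spaced (r, E) pairs from r_e out to r_max (hypothetical helper)."""
    step = (r_max - r_e) / (n_points - 1)
    return [
        (r_e + i * step, morse_energy(r_e + i * step, d_e, a, r_e))
        for i in range(n_points)
    ]


# Placeholder parameters (eV and angstrom), NOT taken from the paper.
points = morse_augmentation_points(d_e=4.5, a=1.9, r_e=1.09, r_max=5.0, n_points=20)
for r, e in points[:3]:
    print(f"r = {r:.3f}  E = {e:.4f}")
```

By construction the energies rise smoothly from zero at the equilibrium distance toward the well depth `d_e`, which is the qualitative behavior (a smooth curve with a well-defined dissociation limit) that the abstract attributes to the Morse potential.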
Source journal metrics: CiteScore 9.80; self-citation rate 10.70%; articles published 529; review time 1.4 months
About the journal: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.