ArcaNN: automated enhanced sampling generation of training sets for chemically reactive machine learning interatomic potentials.

Digital Discovery (IF 6.2; Q1; Chemistry, Multidisciplinary). Pub Date: 2024-10-30. DOI: 10.1039/d4dd00209a
Rolf David, Miguel de la Puente, Axel Gomez, Olaia Anton, Guillaume Stirnemann, Damien Laage
Citations: 0

Abstract

The emergence of artificial intelligence is profoundly impacting computational chemistry, particularly through machine-learning interatomic potentials (MLIPs). Unlike traditional potential energy surface representations, MLIPs overcome the conventional computational scaling limitations by offering an effective combination of accuracy and efficiency for calculating atomic energies and forces to be used in molecular simulations. These MLIPs have significantly enhanced molecular simulations across various applications, including large-scale simulations of materials, interfaces, chemical reactions, and beyond. Despite these advances, the construction of training datasets, a critical component for the accuracy of MLIPs, has not received proportional attention, especially in the context of chemical reactivity, which depends on rare barrier-crossing events that are not easily included in the datasets. Here we address this gap by introducing ArcaNN, a comprehensive framework designed for generating training datasets for reactive MLIPs. ArcaNN employs a concurrent learning approach combined with advanced sampling techniques to ensure an accurate representation of high-energy geometries. The framework integrates automated processes for iterative training, exploration, new configuration selection, and energy and force labeling, all while ensuring reproducibility and documentation. We demonstrate ArcaNN's capabilities through two paradigm reactions: a nucleophilic substitution and a Diels-Alder reaction. These examples showcase its effectiveness, the uniformly low error of the resulting MLIP everywhere along the chemical reaction coordinate, and its potential for broad applications in reactive molecular dynamics. Finally, we provide guidelines for assessing the quality of MLIPs in reactive systems.
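
The iterative cycle named in the abstract (training, exploration, new configuration selection, labeling) can be illustrated with a toy sketch. This is not ArcaNN's code or API. Everything in it is an assumption made for illustration: a one-dimensional double-well function stands in for the reference calculation, nearest-neighbour lookups stand in for neural-network potentials, uniform random draws along the coordinate stand in for enhanced sampling, and selection by committee disagreement is one common choice in concurrent learning that the abstract itself does not specify.

```python
import random
import statistics


def reference_energy(x):
    """Hypothetical stand-in for the expensive reference labeling step."""
    return (x * x - 1.0) ** 2  # double well: minima at x = -1 and x = 1, barrier at x = 0


def train_committee(data, n_members):
    """Toy 'training': each member keeps a different subset of the labeled data."""
    return [
        [point for i, point in enumerate(data) if i % n_members != j]
        for j in range(n_members)
    ]


def predict(member, x):
    """Toy model: return the energy of the nearest labeled configuration."""
    return min(member, key=lambda point: abs(point[0] - x))[1]


def committee_deviation(committee, x):
    """Spread of the members' predictions, used here as an uncertainty proxy."""
    return statistics.pstdev([predict(member, x) for member in committee])


def concurrent_learning(n_iterations=5, n_members=3, n_candidates=200,
                        max_new=10, threshold=0.01, seed=0):
    rng = random.Random(seed)
    # Initial dataset: only the two wells are covered, the barrier is missing.
    data = [(x, reference_energy(x)) for x in (-1.1, -1.0, -0.9, 0.9, 1.0, 1.1)]
    history = []
    for _ in range(n_iterations):
        # 1. Training: build a committee on the current dataset.
        committee = train_committee(data, n_members)
        # 2. Exploration: uniform draws along the coordinate stand in for
        #    enhanced sampling, which makes the barrier region reachable.
        candidates = [rng.uniform(-1.5, 1.5) for _ in range(n_candidates)]
        # 3. Selection: keep the candidates the committee disagrees on most.
        scored = [(committee_deviation(committee, x), x) for x in candidates]
        uncertain = [pair for pair in scored if pair[0] > threshold]
        selected = sorted(uncertain, reverse=True)[:max_new]
        # 4. Labeling: compute reference energies and extend the dataset.
        data.extend((x, reference_energy(x)) for _, x in selected)
        history.append(len(selected))
    return data, history


if __name__ == "__main__":
    data, history = concurrent_learning()
    print("configurations added per iteration:", history)
    print("final dataset size:", len(data))
```

In this sketch the starting dataset contains only geometries near the two minima, so new configurations can only come from candidates that the exploration step proposes away from the wells. That mirrors the abstract's point that rare barrier-crossing geometries do not enter the dataset unless sampling is designed to reach them.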
