Automatic Feature Selection for Atom-Centered Neural Network Potentials Using a Gradient Boosting Decision Algorithm.

Journal of Chemical Theory and Computation (IF 5.7; CAS Zone 1, Chemistry; JCR Q2, Chemistry, Physical). Published: 2024-11-18. DOI: 10.1021/acs.jctc.4c01176
Renzhe Li, Jiaqi Wang, Akksay Singh, Bai Li, Zichen Song, Chuan Zhou, Lei Li

Abstract

Atom-centered neural network (ANN) potentials have shown high accuracy and computational efficiency in modeling atomic systems. A crucial step in developing reliable ANN potentials is the proper selection of atom-centered symmetry functions (ACSFs), also known as atomic features, to describe atomic environments. Inappropriate selection of ACSFs can lead to poor-quality ANN potentials. Here, we propose a gradient boosting decision tree (GBDT)-based framework for the automatic selection of optimal ACSFs. The framework takes uniformly distributed sets of ACSFs as input and evaluates their relative importance. The ACSFs with the highest average importance scores are selected and used to train an ANN potential. We applied this method to germanium (Ge), obtaining an ANN potential that uses only 18 ACSFs and achieves root-mean-square errors (RMSEs) of 10.2 meV/atom for energies and 84.8 meV/Å for forces, balancing accuracy and computational efficiency. We validated the framework against a grid search, which shows that the ACSFs it selects lie in the optimal region. We also compared our method with commonly used feature selection algorithms; it outperforms them in both effectiveness and accuracy. This study highlights how strongly ACSF parameters affect ANN performance and presents a promising method for automatic ACSF selection that facilitates the development of machine learning potentials.
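To make the selection step concrete, here is a minimal, self-contained Python sketch of the general idea. It is not the authors' implementation. The G2 radial function with a cosine cutoff is the standard Behler-Parrinello form, but several pieces are assumptions chosen for illustration: the "uniform" parameter grid (`uniform_g2_grid`), the toy pair-energy target standing in for DFT reference energies (`toy_dataset`), the depth-1 (stump) boosting learner with gain-based importance, and the averaging over bootstrap resamples (`select_features`). A real workflow would typically use a full GBDT library such as XGBoost or LightGBM, deeper trees, and angular as well as radial ACSFs.

```python
import math
import random


def cutoff(r, rc):
    """Cosine cutoff f_c(r) = 0.5 * (cos(pi * r / rc) + 1) for r < rc, else 0."""
    return 0.5 * (math.cos(math.pi * r / rc) + 1.0) if r < rc else 0.0


def g2(distances, eta, rs, rc):
    """Radial symmetry function G2 = sum_j exp(-eta (r_ij - Rs)^2) f_c(r_ij)."""
    return sum(math.exp(-eta * (r - rs) ** 2) * cutoff(r, rc) for r in distances)


def uniform_g2_grid(n, r_min, rc):
    """Hypothetical uniform parameter set: n Gaussians with centres Rs evenly
    spaced in [r_min, rc) and a common width tied to the spacing."""
    step = (rc - r_min) / n
    eta = 1.0 / (2.0 * step ** 2)
    return [(eta, r_min + i * step) for i in range(n)]


def best_stump(X, res):
    """Depth-1 regression tree: the split with the largest squared-error reduction."""
    n, d = len(X), len(X[0])
    total = sum(res)
    best = None  # (gain, feature, threshold, left_value, right_value)
    for j in range(d):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += res[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # no threshold separates equal values
            right_sum = total - left_sum
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, j, 0.5 * (lo + hi), left_sum / k, right_sum / (n - k))
    return best


def gbdt_importance(X, y, n_rounds=50, lr=0.1):
    """Boost stumps on squared loss; return normalized gain importance per feature."""
    pred = [sum(y) / len(y)] * len(y)
    importance = [0.0] * len(X[0])
    for _ in range(n_rounds):
        res = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        stump = best_stump(X, res)
        if stump is None:
            break
        gain, j, thr, left, right = stump
        importance[j] += gain  # credit the split feature with its error reduction
        pred = [p + lr * (left if x[j] <= thr else right) for p, x in zip(pred, X)]
    s = sum(importance)
    return [v / s for v in importance] if s > 0 else importance


def select_features(X, y, k, n_repeats=5, seed=0):
    """Average importance over bootstrap resamples and keep the top-k features."""
    rng = random.Random(seed)
    avg = [0.0] * len(X[0])
    for _ in range(n_repeats):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        imp = gbdt_importance([X[i] for i in idx], [y[i] for i in idx])
        avg = [a + v / n_repeats for a, v in zip(avg, imp)]
    ranked = sorted(range(len(avg)), key=lambda j: avg[j], reverse=True)
    return ranked[:k], avg


def toy_dataset(n_atoms=60, rc=6.0, seed=1):
    """Hypothetical data: random neighbour shells and a toy Lennard-Jones-style
    atomic energy standing in for DFT reference energies."""
    rng = random.Random(seed)
    grid = uniform_g2_grid(8, 1.5, rc)
    X, y = [], []
    for _ in range(n_atoms):
        dists = [rng.uniform(2.2, rc) for _ in range(rng.randint(4, 12))]
        X.append([g2(dists, eta, rs, rc) for eta, rs in grid])
        y.append(sum(4.0 * ((2.4 / r) ** 12 - (2.4 / r) ** 6) for r in dists))
    return grid, X, y


if __name__ == "__main__":
    grid, X, y = toy_dataset()
    chosen, scores = select_features(X, y, k=3)
    for j in chosen:
        eta, rs = grid[j]
        print(f"G2 #{j}: eta={eta:.3f}, Rs={rs:.3f}, importance={scores[j]:.3f}")
```

Gain-based importance credits each feature with the squared-error reduction of every split made on it. Averaging over resamples damps the run-to-run variability of a single boosted model before the top-k features are kept (k = 18 in the Ge study; k = 3 here only because the toy grid is small).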
