{"title":"高保真图深度学习原子间位势的数据高效构建","authors":"Tsz Wai Ko, Shyue Ping Ong","doi":"arxiv-2409.00957","DOIUrl":null,"url":null,"abstract":"Machine learning potentials (MLPs) have become an indispensable tool in\nlarge-scale atomistic simulations because of their ability to reproduce ab\ninitio potential energy surfaces (PESs) very accurately at a fraction of\ncomputational cost. For computational efficiency, the training data for most\nMLPs today are computed using relatively cheap density functional theory (DFT)\nmethods such as the Perdew-Burke-Ernzerhof (PBE) generalized gradient\napproximation (GGA) functional. Meta-GGAs such as the recently developed\nstrongly constrained and appropriately normed (SCAN) functional have been shown\nto yield significantly improved descriptions of atomic interactions for\ndiversely bonded systems, but their higher computational cost remains an\nimpediment to their use in MLP development. In this work, we outline a\ndata-efficient multi-fidelity approach to constructing Materials 3-body Graph\nNetwork (M3GNet) interatomic potentials that integrate different levels of\ntheory within a single model. Using silicon and water as examples, we show that\na multi-fidelity M3GNet model trained on a combined dataset of low-fidelity GGA\ncalculations with 10% of high-fidelity SCAN calculations can achieve accuracies\ncomparable to a single-fidelity M3GNet model trained on a dataset comprising 8x\nthe number of SCAN calculations. 
This work paves the way for the development of\nhigh-fidelity MLPs in a cost-effective manner by leveraging existing\nlow-fidelity datasets.","PeriodicalId":501369,"journal":{"name":"arXiv - PHYS - Computational Physics","volume":"9 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Data-Efficient Construction of High-Fidelity Graph Deep Learning Interatomic Potentials\",\"authors\":\"Tsz Wai Ko, Shyue Ping Ong\",\"doi\":\"arxiv-2409.00957\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Machine learning potentials (MLPs) have become an indispensable tool in\\nlarge-scale atomistic simulations because of their ability to reproduce ab\\ninitio potential energy surfaces (PESs) very accurately at a fraction of\\ncomputational cost. For computational efficiency, the training data for most\\nMLPs today are computed using relatively cheap density functional theory (DFT)\\nmethods such as the Perdew-Burke-Ernzerhof (PBE) generalized gradient\\napproximation (GGA) functional. Meta-GGAs such as the recently developed\\nstrongly constrained and appropriately normed (SCAN) functional have been shown\\nto yield significantly improved descriptions of atomic interactions for\\ndiversely bonded systems, but their higher computational cost remains an\\nimpediment to their use in MLP development. In this work, we outline a\\ndata-efficient multi-fidelity approach to constructing Materials 3-body Graph\\nNetwork (M3GNet) interatomic potentials that integrate different levels of\\ntheory within a single model. 
Using silicon and water as examples, we show that\\na multi-fidelity M3GNet model trained on a combined dataset of low-fidelity GGA\\ncalculations with 10% of high-fidelity SCAN calculations can achieve accuracies\\ncomparable to a single-fidelity M3GNet model trained on a dataset comprising 8x\\nthe number of SCAN calculations. This work paves the way for the development of\\nhigh-fidelity MLPs in a cost-effective manner by leveraging existing\\nlow-fidelity datasets.\",\"PeriodicalId\":501369,\"journal\":{\"name\":\"arXiv - PHYS - Computational Physics\",\"volume\":\"9 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - PHYS - Computational Physics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.00957\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - PHYS - Computational Physics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.00957","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Data-Efficient Construction of High-Fidelity Graph Deep Learning Interatomic Potentials
Machine learning potentials (MLPs) have become an indispensable tool in
large-scale atomistic simulations because of their ability to reproduce ab
initio potential energy surfaces (PESs) very accurately at a fraction of the
computational cost. For computational efficiency, the training data for most
MLPs today are computed using relatively cheap density functional theory (DFT)
methods such as the Perdew-Burke-Ernzerhof (PBE) generalized gradient
approximation (GGA) functional. Meta-GGAs such as the recently developed
strongly constrained and appropriately normed (SCAN) functional have been shown
to yield significantly improved descriptions of atomic interactions for
diversely bonded systems, but their higher computational cost remains an
impediment to their use in MLP development. In this work, we outline a
data-efficient multi-fidelity approach to constructing Materials 3-body Graph
Network (M3GNet) interatomic potentials that integrate different levels of
theory within a single model. Using silicon and water as examples, we show that
a multi-fidelity M3GNet model trained on a combined dataset of low-fidelity GGA
calculations with 10% of high-fidelity SCAN calculations can achieve accuracies
comparable to a single-fidelity M3GNet model trained on a dataset comprising 8x
as many SCAN calculations. This work paves the way for the development of
high-fidelity MLPs in a cost-effective manner by leveraging existing
low-fidelity datasets.
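The core idea of the abstract, that a single model trained on abundant low-fidelity labels plus a small fraction of high-fidelity labels can reach high-fidelity accuracy, can be illustrated with a minimal toy sketch. The model below is a hypothetical stand-in (a polynomial regressor with a learned per-fidelity offset), not the actual M3GNet architecture, and the "energies" are a synthetic 1-D function; the low-fidelity labels carry a systematic bias standing in for the GGA/SCAN difference.

```python
# Toy sketch of multi-fidelity training: shared model parameters plus a
# learned per-fidelity offset, fitted jointly on both datasets.
# All functions and data here are illustrative assumptions, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)

def true_energy(x):
    # stand-in for the high-fidelity (SCAN-level) potential energy surface
    return np.sin(x)

# Plentiful low-fidelity data with a systematic shift; scarce (~10%) high-fidelity data.
x_lo = rng.uniform(0.0, 6.0, 200)
y_lo = true_energy(x_lo) + 0.3
x_hi = rng.uniform(0.0, 6.0, 20)
y_hi = true_energy(x_hi)

def features(x, is_hi):
    # shared polynomial basis (x^1..x^5) + one-hot fidelity columns acting
    # as per-fidelity intercepts
    return np.column_stack([x**k for k in range(1, 6)] + [is_hi, 1.0 - is_hi])

X = np.vstack([features(x_lo, np.zeros_like(x_lo)),
               features(x_hi, np.ones_like(x_hi))])
y = np.concatenate([y_lo, y_hi])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predict at the high-fidelity level on held-out points and measure the error.
x_test = np.linspace(0.5, 5.5, 50)
pred_hi = features(x_test, np.ones_like(x_test)) @ coef
rmse = np.sqrt(np.mean((pred_hi - true_energy(x_test))**2))
offset_gap = coef[6] - coef[5]  # learned low-minus-high offset, ~0.3 by construction
print(f"high-fidelity RMSE: {rmse:.3f}, learned fidelity offset: {offset_gap:.3f}")
```

Because the shared basis is fitted on all 220 points while only 20 carry high-fidelity labels, the model recovers high-fidelity predictions far better than a fit to the 20 points alone would; this is the same data-efficiency argument the abstract makes for the multi-fidelity M3GNet model.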