Data-efficient multi-fidelity training for high-fidelity machine learning interatomic potentials
Jaesun Kim, Jisu Kim, Jaehoon Kim, Jiho Lee, Yutack Park, Youngho Kang, Seungwu Han
arXiv - PHYS - Materials Science, published 2024-09-12 (arXiv:2409.07947)
Abstract
Machine learning interatomic potentials (MLIPs) are used to estimate
potential energy surfaces (PES) from ab initio calculations, providing near
quantum-level accuracy with reduced computational costs. However, the high cost
of assembling high-fidelity databases hampers the application of MLIPs to
systems that require high chemical accuracy. Utilizing an equivariant graph
neural network, we present an MLIP framework that trains on multi-fidelity
databases simultaneously. This approach enables the accurate learning of
high-fidelity PES with minimal high-fidelity data. We test this framework on
the Li$_6$PS$_5$Cl and In$_x$Ga$_{1-x}$N systems. The computational results
indicate that geometric and compositional spaces not covered by the
high-fidelity meta-generalized gradient approximation (meta-GGA) database can
be effectively inferred from low-fidelity GGA data, thus enhancing accuracy and
molecular dynamics stability. We also develop a general-purpose MLIP that
utilizes both GGA and meta-GGA data from the Materials Project, significantly
enhancing MLIP performance for high-accuracy tasks such as predicting energies
above hull for crystals in general. Furthermore, we demonstrate that the
present multi-fidelity learning is more effective than transfer learning or
$\Delta$-learning, and that it can also be applied to learn higher fidelities up
to the coupled-cluster level. We believe this methodology holds promise for
creating highly accurate bespoke or universal MLIPs by effectively expanding
the high-fidelity dataset.
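The core idea of simultaneous multi-fidelity training can be illustrated with a toy sketch: a shared model is fit jointly to an abundant low-fidelity dataset and a scarce high-fidelity one, with a fidelity-dependent output term absorbing the systematic offset between the two reference levels. This is a minimal hypothetical illustration, not the paper's actual equivariant-GNN implementation; the datasets, model form, and learning rate below are all assumptions for demonstration only.

```python
# Hypothetical sketch of joint multi-fidelity fitting (NOT the paper's code).
# A shared slope w is learned from both datasets, while each fidelity level
# gets its own bias b_f -- loosely analogous to training one MLIP on GGA and
# meta-GGA labels simultaneously with fidelity-dependent output channels.
import numpy as np

rng = np.random.default_rng(0)

# Abundant "low-fidelity" data (GGA-like): y = 2x + 0.5 plus noise.
x_low = rng.uniform(-1, 1, size=200)
y_low = 2.0 * x_low + 0.5 + 0.01 * rng.normal(size=200)

# Scarce "high-fidelity" data (meta-GGA-like): same slope, shifted offset.
x_high = rng.uniform(-1, 1, size=10)
y_high = 2.0 * x_high - 0.3 + 0.01 * rng.normal(size=10)

w, b_low, b_high = 0.0, 0.0, 0.0
lr = 0.1
for _ in range(2000):
    # Residuals of the combined squared-error loss over both fidelities.
    r_low = w * x_low + b_low - y_low
    r_high = w * x_high + b_high - y_high
    # The shared parameter w receives gradients from BOTH datasets,
    # so the sparse high-fidelity set benefits from low-fidelity coverage.
    w -= lr * (r_low @ x_low / len(x_low) + r_high @ x_high / len(x_high))
    b_low -= lr * r_low.mean()
    b_high -= lr * r_high.mean()
```

After training, `w` should recover the shared slope (about 2.0) even though only ten high-fidelity points were seen, while `b_low` and `b_high` capture the per-fidelity offsets; the high-fidelity prediction `w * x + b_high` thus generalizes across the region covered only by low-fidelity data, which is the intuition the abstract describes.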