Interpretable machine learning methods to predict the mechanical properties of ABX3 perovskites

Results in Physics (IF 4.4; Zone 2, Physics and Astrophysics; JCR Q2, Materials Science, Multidisciplinary). Pub Date: 2024-10-01. DOI: 10.1016/j.rinp.2024.107978

Abstract

This paper demonstrates the utility of interpretable ensemble learning models for predicting the mechanical properties (bulk, shear, and Young's moduli) of ABX3 perovskite compounds, where A, B, and X denote the three elements that form the cubic three-dimensional framework of the perovskite. Three ensemble learning techniques are considered: CatBoost, Random Forest, and XGBoost. To expand the feature space, first-principles density functional theory calculations were used to generate some of the input features, namely elastic constants, density, volume per atom, and ground-state energy per atom. To determine the ranking of the input features that influence the machine learning (ML) model decisions, we performed correlation analysis on the multi-dimensional input feature space, suppressed features with high collinearity, and selected features with limited correlation. We trained the three ensemble learning techniques on the resulting feature vectors to predict the mechanical properties. Furthermore, we employed the Shapley Additive Explanations (SHAP) algorithm to analyse the rationale behind the decisions of the ensemble learning models. We measured performance using error metrics and the coefficient of determination, R². The results show that XGBoost outperforms the other approaches when predicting the shear modulus or Young's modulus of the perovskite compounds, yielding the lowest error metrics and the highest R² value (0.97) in the testing phase. However, both CatBoost and Random Forest outperformed XGBoost when predicting the bulk modulus in the testing phase. The deficiency of XGBoost in predicting the bulk modulus can be ascribed to overfitting, which occurs when an ML model gives accurate predictions for training data but not for test data. Furthermore, the SHAP algorithm provides insight into the order of feature importance (from highest to lowest). Additionally, we conducted a post-analysis using a holistic ranking to compare the relative importance of the features, as measured by SHAP, across the examined ensemble learning techniques. Our findings indicate that the elastic constants are the most important input features influencing the predictions of the ensemble learning models.
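
As a minimal sketch of two steps mentioned in the abstract, the Python below illustrates (i) suppressing highly collinear features using pairwise Pearson correlation and (ii) computing the coefficient of determination R² on training and test predictions, where a large gap between the two is the usual symptom of overfitting. The function names, the greedy keep-first strategy, the 0.9 threshold, and all toy values are illustrative assumptions; the abstract does not state the authors' actual procedure, threshold, or data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # Constant columns (zero variance) are not handled in this sketch.
    return cov / (sx * sy)


def drop_collinear(features, threshold=0.9):
    """Greedily keep features whose |r| with every kept feature is <= threshold.

    `features` maps a feature name to a list of values; insertion order
    decides which member of a collinear pair survives.
    The 0.9 default is an assumption, not a value from the paper.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy feature table (not data from the paper).
toy = {
    "c11": [100.0, 150.0, 200.0, 250.0],
    "c11_scaled": [1.0, 1.5, 2.0, 2.5],  # perfectly collinear with c11
    "density": [5.2, 4.1, 6.3, 4.8],
}
selected = drop_collinear(toy)

# Hypothetical predictions: a perfect fit on training data and a poor fit
# on test data give a large R² gap, the pattern described as overfitting.
train_r2 = r2_score([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
test_r2 = r2_score([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0])
r2_gap = train_r2 - test_r2
```

In this toy example `c11_scaled` is dropped because it duplicates the information in `c11`, while `density` is retained. In practice the same filter would be applied to the full feature table before fitting CatBoost, Random Forest, or XGBoost, and the train/test R² gap would be compared across the three models.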
