Interpretable machine learning methods to predict the mechanical properties of ABX₃ perovskites

IF 4.4 | Zone 2, Physics and Astronomy | Q2 MATERIALS SCIENCE, MULTIDISCIPLINARY | Results in Physics | Pub Date: 2024-10-01 | DOI: 10.1016/j.rinp.2024.107978

S.B. Akinpelu, S.A. Abolade, E. Okafor, D.O. Obada, A.M. Ukpong, S. Kumar R., J. Healy, A. Akande

Results in Physics, Volume 65, Article 107978

Citations: 0

Abstract

This paper proposes interpretable ensemble learning models for predicting the mechanical properties (bulk, shear, and Young's moduli) of ABX₃ perovskite compounds, where A, B, and X denote the three elements that form the cubic three-dimensional framework of the perovskite structure. The models are built on three ensemble learning techniques: CatBoost, Random Forest, and XGBoost. To expand the feature space, first-principles density functional theory (DFT) calculations were used to generate some of the input features, namely the elastic constants, density, volume per atom, and ground-state energy per atom. We then determined the ranking of the input features that influence the machine learning (ML) model decisions. To this end, we performed a correlation analysis on the multi-dimensional input feature space, suppressed highly collinear features, and selected features with limited correlation. We trained the three ensemble learning techniques on the resulting vector representation of the input features to predict the mechanical properties. We further employed the Shapley Additive Explanations (SHAP) algorithm to analyse the intrinsic decision-making rationale of the ensemble learning models. Performance was measured with error metrics and the coefficient of determination, R². The results show that XGBoost outperforms the other approaches in predicting the shear modulus and Young's modulus of the perovskite compounds, yielding the lowest error metrics and the highest R² value (0.97) in the testing phase. However, both CatBoost and Random Forest outperformed XGBoost in predicting the bulk modulus in the testing phase. XGBoost's weaker performance on the bulk modulus can be ascribed to overfitting, which occurs when an ML model gives accurate predictions for the training data but not for the test data. The SHAP analysis also yields the order of feature importance (from highest to lowest). Additionally, we conducted a post-analysis using a holistic ranking to compare the relative importance of the features across the SHAP analyses of the examined ensemble learning techniques. Our findings indicate that the elastic constants are the most important input features influencing the predictions of the ensemble learning models.
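The collinearity screening described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a greedy filter on the absolute Pearson correlation with a hypothetical threshold of 0.9 (the abstract states neither the criterion nor the threshold), and the feature names and values in the usage example are made up.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0  # a constant column carries no linear correlation information
    return cov / (sx * sy)


def drop_collinear(features, threshold=0.9):
    """Keep a feature only if |r| <= threshold against every feature kept so far.

    `features` maps feature names to value columns; dictionary order decides
    which member of a highly correlated pair survives (the earlier one).
    The 0.9 threshold is an illustrative assumption.
    """
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Hypothetical toy data: "C11_x2" is an exact multiple of "C11" (r = 1).
    toy = {
        "C11": [300.0, 250.0, 180.0, 220.0],
        "C11_x2": [600.0, 500.0, 360.0, 440.0],
        "density": [5.1, 6.3, 4.2, 7.0],
    }
    print(drop_collinear(toy))  # prints ['C11', 'density']
```

A greedy pass is used because it is order-dependent but transparent; a real pipeline might instead decide which member of a correlated pair to keep based on domain knowledge.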
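The evaluation metrics and the train/test comparison behind the overfitting diagnosis can be written out explicitly. The abstract does not name the error metrics used, so MAE and RMSE below are assumptions; R² follows its standard definition, R² = 1 - SS_res / SS_tot. The moduli in the usage example are hypothetical, not data from the paper.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """MAE, RMSE and coefficient of determination R^2 for paired predictions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "MAE": sum(abs(r) for r in residuals) / n,
        "RMSE": sqrt(ss_res / n),
        "R2": 1.0 - ss_res / ss_tot,  # R^2 = 1 - SS_res / SS_tot
    }


def r2_gap(train_metrics, test_metrics):
    """Train-minus-test R^2; a large positive gap is a symptom of overfitting."""
    return train_metrics["R2"] - test_metrics["R2"]


if __name__ == "__main__":
    # Hypothetical moduli in GPa.
    y_test = [100.0, 150.0, 200.0, 250.0]
    pred_test = [110.0, 140.0, 210.0, 240.0]
    test_m = regression_metrics(y_test, pred_test)
    train_m = regression_metrics(y_test, y_test)  # stands in for a perfect fit on training data
    print(test_m)                   # MAE 10.0, RMSE 10.0, R2 about 0.968
    print(r2_gap(train_m, test_m))  # about 0.032
```

What counts as a "large" gap is a judgment call that the abstract does not quantify; comparing the same metrics on training and test splits is simply the usual way the symptom is detected.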
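SHAP attributes each prediction to the input features through Shapley values. The sketch below computes them by brute-force enumeration of feature coalitions, replacing "absent" features with a single baseline input, then ranks features by mean absolute Shapley value and combines several per-model rankings by average position. These are simplifying assumptions: the shap library's tree explainers compute the values far more efficiently and use background data, and the abstract does not specify how the holistic ranking was built, so the mean-rank rule here is only one plausible choice. The toy model and feature names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value (a simplifying
    assumption). Costs O(n * 2^n) model calls, so only suitable for a few features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def rank_features(names, shap_rows):
    """Order features by mean |Shapley value| over samples, highest first."""
    mean_abs = [sum(abs(row[j]) for row in shap_rows) / len(shap_rows) for j in range(len(names))]
    order = sorted(range(len(names)), key=lambda j: mean_abs[j], reverse=True)
    return [names[j] for j in order]


def holistic_ranking(rankings):
    """Combine per-model rankings by average position (hypothetical aggregation rule)."""
    avg = {name: sum(r.index(name) for r in rankings) / len(rankings) for name in rankings[0]}
    return sorted(rankings[0], key=lambda name: avg[name])


def toy_model(z):
    """Toy stand-in for a trained ensemble: ignores the third feature."""
    return 3 * z[0] + z[1]


if __name__ == "__main__":
    names = ["C11", "C44", "density"]  # hypothetical feature names
    rows = [shapley_values(toy_model, x, [0, 0, 0]) for x in ([2, 1, 5], [1, 4, 0])]
    print(rows)                        # approximately [[6, 1, 0], [3, 4, 0]]
    print(rank_features(names, rows))  # ['C11', 'C44', 'density']
    print(holistic_ranking([
        ["C11", "C44", "density"],
        ["C44", "C11", "density"],
        ["C11", "density", "C44"],
    ]))                                # ['C11', 'C44', 'density']
```

A useful sanity check is the efficiency property: the Shapley values of one sample always sum to model(x) - model(baseline).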
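As standard elasticity background rather than a result reported in the paper: for a cubic crystal, the Voigt-Reuss-Hill averages give the bulk, shear and Young's moduli in closed form from the three independent elastic constants C11, C12 and C44. This offers one plausible reading of why the elastic constants dominate the feature rankings; whether the paper's target moduli were obtained this way is not stated in the abstract. The function name and example constants are illustrative.

```python
def cubic_vrh_moduli(c11, c12, c44):
    """Voigt-Reuss-Hill bulk (B), shear (G) and Young's (E) moduli of a cubic crystal.

    Elastic constants and results share units (e.g. GPa). Assumes a mechanically
    stable crystal: C11 - C12 > 0, C11 + 2*C12 > 0 and C44 > 0.
    """
    bulk = (c11 + 2.0 * c12) / 3.0                     # B_V = B_R for cubic symmetry
    g_voigt = (c11 - c12 + 3.0 * c44) / 5.0            # upper bound on G
    g_reuss = 5.0 * (c11 - c12) * c44 / (4.0 * c44 + 3.0 * (c11 - c12))  # lower bound on G
    shear = 0.5 * (g_voigt + g_reuss)                  # Hill average
    young = 9.0 * bulk * shear / (3.0 * bulk + shear)  # isotropic relation E = 9BG / (3B + G)
    return bulk, shear, young


if __name__ == "__main__":
    # Hypothetical constants (GPa) with C44 = (C11 - C12) / 2, i.e. elastically
    # isotropic: the Voigt and Reuss shear bounds coincide and B, G, E are about
    # 166.7, 100 and 250.
    print(cubic_vrh_moduli(300.0, 100.0, 100.0))
```

For anisotropic crystals the Voigt and Reuss shear moduli differ, and the Hill value is simply their arithmetic mean.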

Source journal

Results in Physics | MATERIALS SCIENCE, MULTIDISCIPLINARY; PHYSICS, MULTIDISCIPLINARY

CiteScore: 8.70
Self-citation rate: 9.40%
Articles published: 754
Review time: 50 days

Journal description: Results in Physics is an open access journal offering authors the opportunity to publish in all fundamental and interdisciplinary areas of physics, materials science, and applied physics. Papers of a theoretical, computational, and experimental nature are all welcome. Results in Physics accepts papers that are scientifically sound, technically correct, and provide valuable new knowledge to the physics community. Topics such as three-dimensional flow and magnetohydrodynamics are not within the scope of Results in Physics. Results in Physics welcomes three types of papers:

1. Full research papers
2. Microarticles: very short papers, no longer than two pages. They may consist of a single but well-described piece of information, such as:
   - Data and/or a plot plus a description
   - Description of a new method or instrumentation
   - Negative results
   - Concept or design study
3. Letters to the Editor: Letters discussing a recent article published in Results in Physics are welcome. These are objective, constructive, or educational critiques of papers published in Results in Physics. Accepted letters will be sent to the author of the original paper for a response. Each letter and response is published together. Letters should be received within 8 weeks of the article's publication. They should not exceed 750 words of text and 10 references.