Interpretable Machine Learning Model on Thermal Conductivity Using Publicly Available Datasets and Our Internal Lab Dataset

Chemistry of Materials | IF 7.2 | Zone 2 (Materials Science) | JCR Q2 (Chemistry, Physical) | Pub Date: 2024-07-03 | DOI: 10.1021/acs.chemmater.4c01696
Nikhil K. Barua, Evan Hall, Yifei Cheng, Anton O. Oliynyk, Holger Kleinke

Abstract

Machine learning (ML), a subdiscipline of artificial intelligence, has gained importance in predicting or suggesting efficient thermoelectric materials. Previous ML studies have used different literature sources or density functional theory calculations as input. In this work, we develop an ML pipeline trained with multivariable inputs on a massive public dataset of ∼200,000 data points, utilizing a high-performance computing cluster, to predict the thermal conductivity (κ). The model is evaluated on four test sets: three publicly available datasets and a dataset built from previously published data from our own group. By taking advantage of this massive dataset, our model presents an opportunity to further expand the understanding of feature selection across various thermoelectric materials. Among the several supervised ML models implemented, the eXtreme Gradient Boosting algorithm (XGBoost) performed best in 5-fold cross-validation, with averaged evaluation coefficients of R² = 0.96, root mean squared error (RMSE) = 0.38 W m⁻¹ K⁻¹, and mean absolute error (MAE) = 0.23 W m⁻¹ K⁻¹. Additionally, with the aid of feature selection and importance analysis, useful chemical features were chosen that ultimately led to reasonably good accuracy across the test sets, with R², RMSE, and MAE ranging from 0.72 to 0.89, 0.52 to 1.08 W m⁻¹ K⁻¹, and 0.40 to 0.66 W m⁻¹ K⁻¹, respectively. Checking the worst outliers led to the discovery of some errors in the literature. After model prediction, the SHapley Additive exPlanations (SHAP) algorithm was applied to the XGBoost model to analyze the features that were the key drivers of the model's decisions. Overall, the developed interpretable methodology predicts κ for a large variety of materials through the influence of chemical and physical property features. The conclusions drawn apply to the research and applications of thermoelectric and heat insulation materials.
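
The abstract reports R², RMSE, and MAE averaged over 5-fold cross-validation. The sketch below illustrates how such averaged scores can be computed. It is not the authors' pipeline: all function names are hypothetical, the data are synthetic, and a one-feature least-squares line stands in for XGBoost so that the example needs only the Python standard library.

```python
import math
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (here W m^-1 K^-1)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Hypothetical stand-in model (NOT XGBoost): least squares on one feature."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def cross_validate(xs, ys, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, and average the scores."""
    scores = {"R2": [], "RMSE": [], "MAE": []}
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        y_true = [ys[i] for i in held_out]
        y_pred = [model(xs[i]) for i in held_out]
        scores["R2"].append(r2_score(y_true, y_pred))
        scores["RMSE"].append(rmse(y_true, y_pred))
        scores["MAE"].append(mae(y_true, y_pred))
    return {name: sum(vals) / len(vals) for name, vals in scores.items()}


if __name__ == "__main__":
    # Synthetic data: one made-up feature and a noisy linear "kappa" target.
    rng = random.Random(42)
    feature = [0.1 * i for i in range(100)]
    kappa = [2.0 * x + 1.0 + rng.gauss(0.0, 0.3) for x in feature]
    print(cross_validate(feature, kappa, k=5))
```

In practice the stand-in `fit_line` would be replaced by the regressor under study, and the fold loop by the cross-validation utilities of whichever ML library is in use.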
