Interpretable Machine Learning Model on Thermal Conductivity Using Publicly Available Datasets and Our Internal Lab Dataset
Nikhil K. Barua, Evan Hall, Yifei Cheng, Anton O. Oliynyk, Holger Kleinke
Chemistry of Materials (impact factor 7.2), published 2024-07-03
DOI: 10.1021/acs.chemmater.4c01696 (https://doi.org/10.1021/acs.chemmater.4c01696)
Abstract
Machine learning (ML), a subdiscipline of artificial intelligence, has gained importance in predicting or suggesting efficient thermoelectric materials. Previous ML studies have used various literature sources or density functional theory calculations as input. In this work, we develop an ML pipeline trained with multivariable inputs on a massive public dataset of ∼200,000 data points, utilizing a high-performance computing cluster, to predict the thermal conductivity (κ). The pipeline is evaluated on four test sets: three publicly available datasets and a dataset built from previously published data from our own group. By taking advantage of this massive dataset, our model presents an opportunity to further expand the understanding of feature selection across various thermoelectric materials. Among the several supervised ML models implemented, the eXtreme Gradient Boosting algorithm (XGBoost) performed best in 5-fold cross-validation, with averaged evaluation metrics of R² = 0.96, root mean squared error (RMSE) = 0.38 W m⁻¹ K⁻¹, and mean absolute error (MAE) = 0.23 W m⁻¹ K⁻¹. Additionally, with the aid of feature selection and importance analysis, useful chemical features were chosen that ultimately led to reasonably good accuracy across the test sets, with R² ranging from 0.72 to 0.89, RMSE from 0.52 to 1.08 W m⁻¹ K⁻¹, and MAE from 0.40 to 0.66 W m⁻¹ K⁻¹. Checking the worst outliers led to the discovery of some errors in the literature. After model prediction, the SHapley Additive exPlanations (SHAP) algorithm was applied to the XGBoost model to analyze the features that were the key drivers of the model's decisions. Overall, the developed interpretable methodology predicts κ for a large variety of materials based on chemical and physical property features. The conclusions drawn apply to the research and applications of thermoelectric and heat insulation materials.
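The abstract reports R², RMSE, and MAE averaged over 5-fold cross-validation. The listing gives no code, so the following is only a minimal sketch of how such averaged metrics can be computed, using the standard definitions of the three metrics. All function names here (`r2_score`, `rmse`, `mae`, `k_fold_indices`, `cross_validate`, `fit_line`) are hypothetical, the folds are contiguous and unshuffled for simplicity, and a one-feature least-squares line stands in for the XGBoost model used in the paper.

```python
import math
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target."""
    return math.sqrt(mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    return mean([abs(t - p) for t, p in zip(y_true, y_pred)])


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if not start <= i < stop]
        yield train_idx, test_idx
        start = stop


def cross_validate(fit, X, y, k=5):
    """Average R2, RMSE, and MAE over k folds.

    `fit(X_train, y_train)` must return a function mapping one sample to a
    prediction. It is a placeholder for any regressor.
    """
    scores = {"R2": [], "RMSE": [], "MAE": []}
    for train_idx, test_idx in k_fold_indices(len(y), k):
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_true = [y[i] for i in test_idx]
        y_pred = [predict(X[i]) for i in test_idx]
        scores["R2"].append(r2_score(y_true, y_pred))
        scores["RMSE"].append(rmse(y_true, y_pred))
        scores["MAE"].append(mae(y_true, y_pred))
    return {name: mean(values) for name, values in scores.items()}


def fit_line(X_train, y_train):
    """Stand-in model (NOT the paper's XGBoost): one-feature least squares."""
    x_bar, y_bar = mean(X_train), mean(y_train)
    sxy = sum((x - x_bar) * (t - y_bar) for x, t in zip(X_train, y_train))
    sxx = sum((x - x_bar) ** 2 for x in X_train)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


if __name__ == "__main__":
    # Toy data only; these are not values from the paper.
    X = [float(i) for i in range(10)]
    y = [2.0 * x + 1.0 for x in X]
    print(cross_validate(fit_line, X, y, k=5))
```

In practice the folds would normally be shuffled before splitting, and `fit` would wrap the actual regressor. The averaging step is the same either way: each metric is computed on the held-out fold and then averaged across the five folds.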
About the journal:
The journal Chemistry of Materials focuses on publishing original research at the intersection of materials science and chemistry. The studies published in the journal involve chemistry as a prominent component and explore topics such as the design, synthesis, characterization, processing, understanding, and application of functional or potentially functional materials. The journal covers various areas of interest, including inorganic and organic solid-state chemistry, nanomaterials, biomaterials, thin films and polymers, and composite/hybrid materials. The journal particularly seeks papers that highlight the creation or development of innovative materials with novel optical, electrical, magnetic, catalytic, or mechanical properties. It is essential that manuscripts on these topics have a primary focus on the chemistry of materials and represent a significant advancement compared to prior research. Before external reviews are sought, submitted manuscripts undergo a review process by a minimum of two editors to ensure their appropriateness for the journal and the presence of sufficient evidence of a significant advance that will be of broad interest to the materials chemistry community.