Developing new high-entropy alloys with enhanced hardness using a hybrid machine learning approach: integrating interpretability and NSGA-II optimization
Authors: Debsundar Dey, Anik Pal, Pranjal Biyani, Pritam Mandal, Snehanshu Pal, Suchandan Das, Santanu Dey, Manojit Ghosh
Journal: Journal of Materials Science, volume 60, issue 10, pages 4820-4845
Published: 2025-03-05
DOI: 10.1007/s10853-025-10729-5
URL: https://link.springer.com/article/10.1007/s10853-025-10729-5
Impact factor: 3.9 (JCR Q2, Materials Science, Multidisciplinary)
Citations: 0
Abstract
This study uses machine learning (ML) to simplify the complex and time-consuming process of predicting the hardness of high-entropy alloys (HEAs). A stacking regression model combined with a Transformed Target Regressor (TTR) is proposed, built on three top-performing base models: support vector regression (SVR), LightGBM (LGBM), and random forest (RF). To improve generalization and capture the effects of different input types on hardness, the model uses 20 key thermodynamic, mismatch, and combination parameters (physical features) together with the contents of 18 different elements. Feature selection was done in two stages, using the Pearson correlation coefficient (P_c) and conditional mutual information-based feature selection (CMIFS). The impact of alloy composition and physical features on hardness was analyzed with SHapley Additive exPlanations (SHAP) values and partial dependence plots (PDPs), which help explain the model’s predictions. The stacked model outperformed the individual models, achieving overall R² scores of 0.88 and 0.99 on the composition-based and physical-feature-based data, respectively. Additionally, the non-dominated sorting genetic algorithm II (NSGA-II) was used to optimize the hardness of the HEAs, resulting in an increase in hardness of more than 24% compared to the initial data. The optimized composition Al17.24Fe24.79Cr1.95Mo6.84Ti13.03Nb7.89Hf8.26 was identified as having the highest hardness. This ML workflow serves as a general framework for optimizing alloy chemical spaces and input features to achieve desired properties. Overall, the model provides interpretability and generalization through ensemble learning, offering insights for designing high-hardness HEAs.
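The first feature-selection stage mentioned in the abstract relies on the Pearson correlation coefficient. The abstract does not say how the coefficient was applied or what cutoff was used, so the following is only a minimal sketch of one common approach: rank features by their absolute correlation with the target, then drop any feature that is almost perfectly correlated with one already kept. The function names, the 0.95 threshold, the feature names (`delta`, `delta_x2`, `vec`), and all numbers are illustrative assumptions, not values from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no linear relationship to report
    return cov / math.sqrt(var_x * var_y)


def drop_redundant(features, target, threshold=0.95):
    """Greedy correlation filter (hypothetical helper, not the paper's code).

    Features are ranked by |r| with the target; a feature is kept only if
    its |r| with every already-kept feature is below `threshold`.
    """
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Synthetic toy data: `delta_x2` is an exact multiple of `delta`,
# so one of the two is redundant.
features = {
    "delta": [4.0, 5.0, 3.0, 6.0, 4.5],
    "delta_x2": [8.0, 10.0, 6.0, 12.0, 9.0],
    "vec": [7.0, 7.5, 8.0, 6.5, 7.2],
}
hardness = [510.0, 590.0, 420.0, 700.0, 540.0]

selected = drop_redundant(features, hardness)
print(selected)
```

A filter like this only removes linearly redundant inputs; a second stage such as the CMIFS method named in the abstract would be needed to account for nonlinear dependence between features and hardness.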
About the journal:
The Journal of Materials Science publishes reviews, full-length papers, and short Communications recording original research results on, or techniques for studying, the relationship between structure, properties, and uses of materials. The subjects are seen from international and interdisciplinary perspectives covering areas including metals, ceramics, glasses, polymers, electrical materials, composite materials, fibers, nanostructured materials, nanocomposites, and biological and biomedical materials. The Journal of Materials Science is now firmly established as the leading source of primary communication for scientists investigating the structure and properties of all engineering materials.