Taehyun Park, JunHo Song, Jinyoung Jeong, Seungpyo Kang, Joonchul Kim, Joonghee Won, Jungim Han, Kyoungmin Min
{"title":"Interpretable machine learning boosting the discovery of targeted organometallic compounds with optimal bandgap","authors":"Taehyun Park, JunHo Song, Jinyoung Jeong, Seungpyo Kang, Joonchul Kim, Joonghee Won, Jungim Han, Kyoungmin Min","doi":"10.1016/j.mtadv.2024.100520","DOIUrl":null,"url":null,"abstract":"Organometallic compounds (OMCs) have attracted tremendous attention in various fields, such as photovoltaic cell and high-k dielectric application, due to their beneficial properties. Despite their potential, the progression of OMCs into industrial applications is hindered by the limited databases available for their properties and the absence of efficient surrogate models. To address this, in this study, optimally selected feature-based surrogate models for predicting the electronic properties of OMCs are constructed via various multiscale features and extensive database. To this end, high-throughput calculation was performed to obtain electronic properties of more than 18k materials generally known as organometallics, augmenting around 12k organic materials obtained from the public open data set, OMDB-GAP1. For generating features closely related to OMCs, descriptors encapsulating the information ranging local to global, also other widely-used composition-, structure-based features (more than 3.5k in total) were employed. Among these descriptors, we identified 48 critical features that elucidates the physicochemical underpinnings of OMCs’ properties, suggesting their impact on the properties of OMCs. The light gradient boosting machine model achieved high-accuracy predictions across the entire database with just 1 % of the total descriptors, sufficiently compared to the entire sets (decreased of around 0.01 by R score and 0.01 eV by MAE). Furthermore, the efficacy of active learning process was demonstrated to find OMCs with optimal properties rapidly. As a result, expected improvement outperforms other methods by identifying 69 % of the target materials only searching 46 % of the total search space. Our constructed platform with a high-throughput calculated database can pave the way for the rapid screening of OMCs for the targeted industrial application, and suggest a comprehensive grasp of the intrinsic properties of OMCs and related compounds.","PeriodicalId":48495,"journal":{"name":"Materials Today Advances","volume":"122 1","pages":""},"PeriodicalIF":8.1000,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Materials Today Advances","FirstCategoryId":"88","ListUrlMain":"https://doi.org/10.1016/j.mtadv.2024.100520","RegionNum":2,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATERIALS SCIENCE, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0
Abstract
Organometallic compounds (OMCs) have attracted tremendous attention in various fields, such as photovoltaic cell and high-k dielectric application, due to their beneficial properties. Despite their potential, the progression of OMCs into industrial applications is hindered by the limited databases available for their properties and the absence of efficient surrogate models. To address this, in this study, optimally selected feature-based surrogate models for predicting the electronic properties of OMCs are constructed via various multiscale features and extensive database. To this end, high-throughput calculation was performed to obtain electronic properties of more than 18k materials generally known as organometallics, augmenting around 12k organic materials obtained from the public open data set, OMDB-GAP1. For generating features closely related to OMCs, descriptors encapsulating the information ranging local to global, also other widely-used composition-, structure-based features (more than 3.5k in total) were employed. Among these descriptors, we identified 48 critical features that elucidates the physicochemical underpinnings of OMCs’ properties, suggesting their impact on the properties of OMCs. The light gradient boosting machine model achieved high-accuracy predictions across the entire database with just 1 % of the total descriptors, sufficiently compared to the entire sets (decreased of around 0.01 by R score and 0.01 eV by MAE). Furthermore, the efficacy of active learning process was demonstrated to find OMCs with optimal properties rapidly. As a result, expected improvement outperforms other methods by identifying 69 % of the target materials only searching 46 % of the total search space. Our constructed platform with a high-throughput calculated database can pave the way for the rapid screening of OMCs for the targeted industrial application, and suggest a comprehensive grasp of the intrinsic properties of OMCs and related compounds.
期刊介绍:
Materials Today Advances is a multi-disciplinary, open access journal that aims to connect different communities within materials science. It covers all aspects of materials science and related disciplines, including fundamental and applied research. The focus is on studies with broad impact that can cross traditional subject boundaries. The journal welcomes the submissions of articles at the forefront of materials science, advancing the field. It is part of the Materials Today family and offers authors rigorous peer review, rapid decisions, and high visibility.