A Convolutional Neural Network-based gradient boosting framework for prediction of the band gap of photo-active catalysts
Avan Kumar, Sreedevi Upadhyayula, Hariprasad Kodamana
Digital Chemical Engineering, volume 8, Article 100109. Published 2023-09-01. Impact factor 3.0; JCR Q2 (Engineering, Chemical).
DOI: 10.1016/j.dche.2023.100109
https://www.sciencedirect.com/science/article/pii/S2772508123000273
Citation count: 2
Abstract
Photo-catalysis is a recent trend in chemical synthesis that relies on photo-active catalysts, which are semiconductor materials. The band gap is a well-known electronic property of semiconductors, and the desired band gap for a photo-catalyst lies between 1.5 eV and 6.2 eV. The rational design and synthesis of photo-active catalysts therefore require knowledge of the band gap as an initial screening parameter. Herein, we propose an integrated deep learning-based framework to classify photo-active catalysts and predict their band gap using compositional features. To this end, we use a dataset extracted from the “catalyst hub” site by web scraping with a Python script. Extensive data cleaning and pre-processing make the input data amenable to model training. Additional features are generated using two methods: (a) one-hot encoding and (b) calculating the mean of the catalyst embeddings computed by Mat2Vec, a pre-trained transformer-based model. Using this feature set, we propose a two-stage deep-learning framework for the classification and regression tasks. In the first stage, a 2D-Convolutional Neural Network (CNN)-based classifier determines whether a catalyst belongs to the photo-active class. In the second stage, for catalysts that pass this screening, a 1D-VGG-based gradient boosting framework predicts the band gap using only compositional features as inputs. The 2D-CNN classifier achieves an accuracy of 0.903 and 0.886 on the train and test datasets, respectively. Further, the proposed integrated model, which uses the 1D-convolutional layers of VGG followed by the XGBoostRegressor, has a test R² of 0.750, much higher than baseline models reported in the literature.
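The two compositional feature types and the band gap screening window described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The element vocabulary, the 3-dimensional toy embeddings (standing in for Mat2Vec vectors), the per-element tokenization, and the idea of deriving the photo-active label from the 1.5 eV to 6.2 eV window are all assumptions made for illustration.

```python
import re
from collections import Counter

# Screening window quoted in the abstract (eV).
BAND_GAP_MIN_EV = 1.5
BAND_GAP_MAX_EV = 6.2

# Hypothetical toy vocabulary and embeddings; a real pre-trained model
# would supply a far larger vocabulary and vector size.
ELEMENTS = ["Ti", "O", "Zn", "S"]
TOY_EMBEDDINGS = {
    "Ti": [0.2, 0.1, 0.0],
    "O": [0.0, 0.3, 0.1],
    "Zn": [0.4, 0.0, 0.2],
    "S": [0.1, 0.1, 0.5],
}


def parse_formula(formula):
    """Count atoms in a simple formula such as 'TiO2' (no parentheses)."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return dict(counts)


def one_hot_elements(formula):
    """Binary vector marking which vocabulary elements are present."""
    present = parse_formula(formula)
    return [1 if element in present else 0 for element in ELEMENTS]


def mean_embedding(formula):
    """Unweighted mean of the toy embeddings of the elements present."""
    symbols = list(parse_formula(formula))
    dim = len(next(iter(TOY_EMBEDDINGS.values())))
    return [
        sum(TOY_EMBEDDINGS[s][i] for s in symbols) / len(symbols)
        for i in range(dim)
    ]


def is_photo_active(band_gap_ev):
    """Assumed labeling rule: band gap inside the quoted window."""
    return BAND_GAP_MIN_EV <= band_gap_ev <= BAND_GAP_MAX_EV
```

For a formula such as `"TiO2"`, `one_hot_elements` and `mean_embedding` give two fixed-length vectors that could be concatenated into a single input row. The classifier and regressor themselves are outside the scope of this sketch.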