Small-dataset-orientated data-driven screening for catalytic propane activation
Jiaqi Chen, Junqing Li, Ziyi Liu, Shitao Sun, Shijia Zhou, Dongqi Wang
Artificial Intelligence Chemistry, Volume 3, Issue 1, Article 100083. Published 2024-12-07.
DOI: 10.1016/j.aichem.2024.100083
URL: https://www.sciencedirect.com/science/article/pii/S2949747724000411
Citations: 0
Abstract
This work addresses the proper application of machine learning to screening for direct propane dehydrogenation (PDH) and oxidative dehydrogenation (ODH) of propane, the two main routes for converting propane to propylene, both of which are characterized by limited experimental data. Current studies mainly adopt a trial-and-error strategy, which is time-consuming and raises environmental and health concerns owing to the release of chemical waste. This motivates the introduction of a data-driven research paradigm to compensate for the shortcomings of trial and error; such a paradigm, however, relies on large quantities of high-quality data. In this work, a dataset covering both PDH and ODH was constructed, and the performance of machine learning algorithms in the study of light alkane activation was evaluated. On this basis, a strategy suited to small datasets is proposed: for small, unbalanced datasets, it is sensible to train the model on the dataset as a whole rather than to fuse multiple specialized models trained on smaller subsets of the data. The results show that models trained with ensemble algorithms predicted propylene selectivity best: CatBoost and random forest for PDH, and LightGBM for ODH. Based on the optimal model, the key influencing factors in PDH and ODH were identified. This study demonstrates the proper use of a data-driven strategy in catalytic science; the approach can be adopted for other scientific problems that suffer from limited high-quality data and can contribute new understanding, e.g. for the rational design and optimization of catalytic systems.
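The comparison between training on the whole dataset and fusing models trained on subsets can be sketched on toy data. The following is a minimal illustration and not the authors' pipeline. The data are synthetic, the feature names (`temperature`, `loading`, the reaction flag) are hypothetical, and a small bagged ensemble of randomized regression trees written with the standard library stands in for the CatBoost, random forest and LightGBM models named in the abstract. Whatever numbers this toy prints say nothing about the paper's results.

```python
import random
import statistics

def make_rows(n, reaction_flag, rng):
    """Synthetic (features, selectivity) rows; all values are invented."""
    rows = []
    for _ in range(n):
        temperature = rng.random()   # hypothetical scaled descriptor
        loading = rng.random()       # hypothetical scaled descriptor
        selectivity = (60 + 20 * loading - 15 * temperature
                       + 10 * reaction_flag + rng.gauss(0, 3))
        rows.append(([temperature, loading, float(reaction_flag)], selectivity))
    return rows

def sse(rows):
    """Sum of squared deviations of the targets from their mean."""
    ys = [y for _, y in rows]
    mean = statistics.fmean(ys)
    return sum((y - mean) ** 2 for y in ys)

def fit_tree(rows, depth, rng, min_leaf=3, n_thresholds=8):
    """Regression tree with randomly sampled candidate thresholds."""
    if depth == 0 or len(rows) < 2 * min_leaf:
        return statistics.fmean([y for _, y in rows])   # leaf value
    best = None
    for feature in range(len(rows[0][0])):
        for _ in range(n_thresholds):
            threshold = rng.choice(rows)[0][feature]
            left = [r for r in rows if r[0][feature] <= threshold]
            right = [r for r in rows if r[0][feature] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, feature, threshold, left, right)
    if best is None:                                    # no valid split found
        return statistics.fmean([y for _, y in rows])
    _, feature, threshold, left, right = best
    return (feature, threshold,
            fit_tree(left, depth - 1, rng, min_leaf, n_thresholds),
            fit_tree(right, depth - 1, rng, min_leaf, n_thresholds))

def predict_tree(node, x):
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if x[feature] <= threshold else right
    return node

def fit_forest(rows, rng, n_trees=25, depth=4):
    """Bagging: each tree is fit on a bootstrap resample of the rows."""
    return [fit_tree([rng.choice(rows) for _ in rows], depth, rng)
            for _ in range(n_trees)]

def predict_forest(forest, x):
    return statistics.fmean([predict_tree(tree, x) for tree in forest])

def mae(predict, rows):
    return statistics.fmean([abs(predict(x) - y) for x, y in rows])

def split_rows(rows):
    """Hold out every fifth row for testing."""
    test = rows[::5]
    train = [r for i, r in enumerate(rows) if i % 5 != 0]
    return train, test

rng = random.Random(0)
pdh_train, pdh_test = split_rows(make_rows(120, 0, rng))  # majority class
odh_train, odh_test = split_rows(make_rows(30, 1, rng))   # minority class
test_rows = pdh_test + odh_test

# Strategy A: one model trained on the whole dataset (flag kept as a feature).
pooled = fit_forest(pdh_train + odh_train, rng)
# Strategy B: one model per reaction type, fused by routing on the flag.
pdh_only = fit_forest(pdh_train, rng)
odh_only = fit_forest(odh_train, rng)

def predict_fused(x):
    forest = odh_only if x[2] == 1.0 else pdh_only
    return predict_forest(forest, x)

results = {
    "pooled": mae(lambda x: predict_forest(pooled, x), test_rows),
    "split": mae(predict_fused, test_rows),
}
print(results)
```

In this sketch the pooled model sees all 120 training rows, while the ODH-only model sees just 24, which is the situation the proposed strategy targets. On real data the comparison would need repeated cross-validation rather than a single hold-out split.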