A Multimodal Learning Model based on a QSPR approach for the estimation of RON, MON and CN, for any C, H, O hydrocarbons
Journal: Fuel (Impact Factor 6.7; JCR Q2, Energy & Fuels)
Publication date: 2024-10-30
DOI: 10.1016/j.fuel.2024.133438
URL: https://www.sciencedirect.com/science/article/pii/S0016236124025870
Citation count: 0
Abstract
With the increasing demand for alternative fuels and the use of biomass as feedstock for fuel production, a wider range of oxygenated hydrocarbons as fuel additives needs to be considered. Consequently, the development of robust methods for predicting criteria such as Research Octane Number (RON), Motor Octane Number (MON), and Cetane Number (CN) will play a crucial role in characterizing novel fuels.
In this paper, we propose a robust deep-learning model based on a Quantitative Structure-Property Relationship (QSPR) approach for estimating RON, MON, and CN of any C, H, and O molecules. We developed a multimodal learning model that combines two types of data, using an Artificial Neural Network (ANN) as the foundation. The Mordred algorithm was used to determine 457 descriptors to characterize hydrocarbons. These numerical values represent the first type of data considered in this study.
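The idea of combining two types of data in one ANN can be illustrated with a minimal two-branch forward pass. This is a sketch under stated assumptions, not the authors' architecture: the hidden size, the ReLU activations, the concatenation step, the joint three-value output (RON, MON, CN), and the length of the second input vector are all hypothetical. Only the count of 457 descriptors comes from the abstract.

```python
import random


def dense(inputs, weights, biases, relu=True):
    """Apply one fully connected layer to a list of floats."""
    if any(len(row) != len(inputs) for row in weights):
        raise ValueError("input length does not match the layer's weight rows")
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(max(0.0, z) if relu else z)
    return outputs


def init_layer(n_in, n_out, rng):
    """Random weights in a small range, zero biases (untrained, for illustration)."""
    scale = (1.0 / n_in) ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_model(n_desc, n_second, hidden=16, n_targets=3, seed=0):
    """Hypothetical layout: one branch per data type, then a shared output head."""
    rng = random.Random(seed)
    return {
        "desc": init_layer(n_desc, hidden, rng),      # numeric descriptor branch
        "second": init_layer(n_second, hidden, rng),  # branch for the second data type
        "head": init_layer(2 * hidden, n_targets, rng),
    }


def forward(model, descriptors, second_features):
    h_desc = dense(descriptors, *model["desc"])
    h_second = dense(second_features, *model["second"])
    fused = h_desc + h_second  # list concatenation fuses the two branches
    return dense(fused, *model["head"], relu=False)


# 457 descriptors as in the abstract; 32 is an arbitrary size for the second input.
model = build_model(n_desc=457, n_second=32)
prediction = forward(model, [0.5] * 457, [0.1] * 32)
print(len(prediction))  # one value per target property
```

Keeping the branches separate until the concatenation lets each data type get its own learned representation before the shared head maps the fused vector to the target properties.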
To account for the effects of mesomerism or chirality in molecules, the InChIKey notation was used. This 27-character notation represents the second type of data, treated as text data. To encode these textual variables into numeric data, we employed a Word Embedding method.
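Treating an InChIKey as text can be sketched as follows. The abstract does not say how the key is tokenized or how the embedding is trained, so the character-level vocabulary, the embedding dimension of 4, and the random (untrained) table below are assumptions made for illustration. The 14-10-1 block layout is that of a standard InChIKey, and the example key is ethanol's.

```python
import random
import string

# Assumption: character-level vocabulary (26 uppercase letters plus the hyphen).
ALPHABET = string.ascii_uppercase + "-"
CHAR_TO_ID = {ch: i for i, ch in enumerate(ALPHABET)}


def split_inchikey(key):
    """Check the 14-10-1 block layout of an InChIKey and return its blocks."""
    blocks = key.split("-")
    if len(key) != 27 or [len(b) for b in blocks] != [14, 10, 1]:
        raise ValueError(f"not a 27-character InChIKey: {key!r}")
    if any(ch not in string.ascii_uppercase for b in blocks for ch in b):
        raise ValueError(f"unexpected character in InChIKey: {key!r}")
    return blocks


def tokenize(key):
    """Map each of the 27 characters to an integer id."""
    split_inchikey(key)
    return [CHAR_TO_ID[ch] for ch in key]


def make_embedding_table(dim, seed=0):
    """One vector per vocabulary entry; random here, learned in a real model."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in ALPHABET]


def embed(key, table):
    """Look up the embedding vector for every character of the key."""
    return [table[token] for token in tokenize(key)]


ETHANOL = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"
table = make_embedding_table(dim=4)
vectors = embed(ETHANOL, table)
print(len(vectors), len(vectors[0]))  # 27 4
```

In a trained model the table entries would be updated along with the other network weights, so that the numeric representation of the key becomes useful for the prediction task.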
The predictions from the final model were successfully tested against a large set of experimental data and compared with those from five recent learning models—GNN, ANN, GPR, DLMO, and PIGNN—found in the literature. The GNN (Graph Neural Networks) model relies on the molecular architecture, the ANN (Artificial Neural Networks) model is based on a limited number of chemical groups, while the GPR (Gaussian Process Regression) model is primarily based on Joback groups. The two most recent methodologies, DLMO (Deep Learning Mixing Operator) and PIGNN (Physics-Informed Graph Neural Network), utilize more sophisticated algorithms.
Full comparisons and multiple tests demonstrate the robustness and predictive capability of the proposed multimodal learning model. The prediction tool is available via a web page at http://ehlcathol.eu/.
About the journal:
The exploration of energy sources remains a critical area of study. For the past nine decades, Fuel has been at the forefront of primary research in energy science. Its scope covers a wide range of subjects, with particular emphasis on emerging concerns such as environmental factors and pollution.