Ensemble machine learning to accelerate industrial decarbonization: Prediction of Hansen solubility parameters for streamlined chemical solvent selection
Eslam G. Al-Sakkari, Ahmed Ragab, Mostafa Amer, Olumoye Ajao, Marzouk Benali, Daria C. Boffito, Hanane Dagdougui, Mouloud Amazouz
Digital Chemical Engineering, Volume 14, Article 100207. Published 2024-12-13. DOI: 10.1016/j.dche.2024.100207. URL: https://www.sciencedirect.com/science/article/pii/S2772508124000693
Citations: 0
Abstract
Several processes and strategies have been developed to promote lignin utilization and to facilitate its market adoption across a broad range of applications within the expanding lignin bioeconomy. However, the inherent variability in lignin properties, which results from diverse feedstock sources and varied recovery and downstream processing methods, remains a significant challenge. This variability highlights the need to investigate lignin's miscibility and reactivity with polymers and solvents, as most lignin valorization pathways involve mixing, blending, or solubilization. Accurate estimation of Hansen solubility parameters (HSP) is crucial for solvent selection in fields such as polymer science, coatings, adhesives, lignin-based biorefineries, and solvent-based carbon capture. Traditional methods for determining HSP are time-consuming and involve complex experiments, especially in applications dealing with carbon dioxide and lignin solubility. This paper introduces a novel ensemble modeling methodology based on machine learning (ML) techniques for accurate HSP prediction using Simplified Molecular Input Line Entry System (SMILES) codes as inputs. The methodology integrates different ML approaches, including deep and shallow learning, to enhance prediction accuracy. Decision fusion of the individual ML models is achieved through a hybrid approach combining non-learnable and learnable methods, which reduces prediction errors. The results highlight the effectiveness of the ensemble-based methodology, which achieved 99% accuracy in predicting dispersion solubility parameters, outperforming the individual ML techniques. The proposed generic methodology, from data preprocessing to decision fusion through diverse ML algorithms, can be applied to various chemical analytics tasks beyond HSP prediction.
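To make the distinction between non-learnable and learnable decision fusion concrete, the following is a minimal sketch and not the authors' implementation. It assumes two hypothetical base models whose predictions of the dispersion parameter are already available, uses a simple average as the non-learnable rule, and uses a least-squares linear combiner (a basic form of stacking) as the learnable rule. All function names and all numeric values are illustrative assumptions; the abstract does not state which fusion rules were used or how the two kinds were combined.

```python
from statistics import fmean


def average_fusion(preds):
    """Non-learnable fusion: unweighted mean of the base-model outputs per sample."""
    return [fmean(row) for row in preds]


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: base predictions are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_linear_fusion(preds, targets):
    """Learnable fusion: least-squares intercept and weights over base-model outputs."""
    X = [[1.0, *row] for row in preds]  # prepend a constant column for the intercept
    k = len(X[0])
    # Normal equations: (X^T X) w = X^T y
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * y for r, y in zip(X, targets)) for i in range(k)]
    return _solve(A, b)


def apply_linear_fusion(coeffs, preds):
    """Combine base-model outputs using the fitted intercept and weights."""
    return [coeffs[0] + sum(w * p for w, p in zip(coeffs[1:], row)) for row in preds]


def sse(pred, true):
    """Sum of squared errors between predictions and reference values."""
    return sum((p - t) ** 2 for p, t in zip(pred, true))


# Made-up illustrative numbers (not data from the paper): reference dispersion
# parameters and the outputs of two hypothetical base models, one row per molecule.
targets = [15.5, 18.0, 15.8, 17.4, 16.0]
base_preds = [[15.9, 15.0], [18.6, 17.7], [16.1, 15.2], [17.9, 17.2], [16.5, 15.4]]

avg = average_fusion(base_preds)
coeffs = fit_linear_fusion(base_preds, targets)
learned = apply_linear_fusion(coeffs, base_preds)
print("SSE, simple average:", round(sse(avg, targets), 4))
print("SSE, learned fusion:", round(sse(learned, targets), 4))
```

On the data used for fitting, the least-squares combiner can never have a larger squared error than the simple average, because the average is one of the linear combinations it searches over. In practice the fusion weights would be fitted on held-out predictions so that the comparison reflects generalization and not training fit.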