Pub Date: 2025-12-08 | DOI: 10.1016/j.dche.2025.100283
Tossapon Katongtung, Nattawut Khuenkaeo, Yuttana Mona, Pana Suttakul, James C. Moran, Korrakot Y. Tippayawong, Nakorn Tippayawong
Dimensionality reduction plays a critical role in efficiently managing large and complex datasets in machine learning (ML) applications. This study presents an integration of principal component analysis (PCA) and extreme gradient boosting (XGB) to model the hydrothermal carbonization (HTC) process. PCA reduced the feature space from 18 variables to 9 principal components with minimal impact on model accuracy (R² decreased slightly from 0.8900 to 0.8480) while substantially reducing model complexity. To enhance interpretability, one- and two-dimensional partial dependence plots (PDPs) were employed, revealing the key features, and their interactions, that influence HTC outcomes. This combined approach not only improves predictive performance but also provides meaningful insight into the underlying process variables, addressing the common criticism that ML models are opaque. While the model demonstrates strong predictive capability, further experimental validation and extension to diverse biomass types are recommended to confirm practical applicability and enhance versatility. The proposed methodology offers a robust, interpretable, and computationally efficient framework for optimizing HTC and can guide future research involving high-dimensional datasets.
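As a rough illustration of the PCA step described in this abstract, the 18-to-9 reduction can be sketched with plain numpy on synthetic data (the study's actual HTC dataset and XGB model are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for an HTC dataset: 200 samples, 18 correlated features
# (the paper's real features are biomass and process properties).
X = rng.normal(size=(200, 18)) @ rng.normal(size=(18, 18))

# PCA via SVD on the centered data.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 9                      # retain 9 principal components, as in the study
scores = Xc @ Vt[:k].T     # reduced feature matrix, shape (200, 9)

# Fraction of total variance captured by the retained components.
explained = (s**2)[:k].sum() / (s**2).sum()
print(scores.shape, round(float(explained), 3))
```

The `scores` matrix would then be fed to the downstream regressor (XGB in the paper) in place of the original 18 features.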
Article: "Data driven prediction of hydrochar yields from biomass hydrothermal carbonization using extreme gradient boosting algorithm with principal component analysis". Digital Chemical Engineering, vol. 18, Article 100283.
Late-stage development of complex chemical processes presents significant challenges due to the high dimensionality and interactions of operating parameters. This complexity renders traditional factorial experimental designs impractical. Consequently, there is often a default reliance on suboptimal legacy technologies, which can lead to reduced overall performance and a larger environmental footprint. This work introduces a novel integrated methodology for combined process and product attribute screening specifically designed to overcome these limitations. The approach strategically integrates expert knowledge, high-fidelity first-principle modeling, and data mining techniques to accelerate the generation of critical process understanding. This supports the confident adoption of sustainable high-performance manufacturing routes. The sequential framework begins with expert knowledge to define promising technological pathways, which are then modeled using first-principle approaches, potentially enhanced by contemporary Artificial Intelligence (AI) techniques. Afterward, extensive parametric optimizations are performed, generating rich synthetic datasets. These data are then subjected to data mining algorithms for pattern recognition, identification of different clusters of the operational regime, and estimation of key product properties. The effectiveness of this methodology is demonstrated through a challenging case study that focuses on the crystallization of conglomerates, which combines deracemization and particle formation, steps traditionally performed sequentially with associated inefficiencies. Our analysis reveals that optimal operations form 12 distinct clusters within which the expected product properties can vary considerably. 
A key finding is that incorporating data from a strategically designed preliminary experiment enables the exclusion of difficult-to-measure material-specific parameters and enhances the cluster classification and product property estimation.
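The cluster-identification step described above is, at its core, unsupervised clustering of optimization outputs. A minimal numpy sketch on synthetic operating-point data (not the paper's simulation results, and with 3 clusters rather than 12 for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for optimal operating points produced by parametric
# optimization runs (the paper clusters real first-principle simulation data).
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(50, 2)) for c in centers])

# Plain k-means (Lloyd's algorithm), keeping a center in place if it empties.
k = 3
mu = X[rng.choice(len(X), k, replace=False)]
for _ in range(20):
    labels = np.argmin(((X[:, None, :] - mu[None]) ** 2).sum(-1), axis=1)
    mu = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else mu[j]
                   for j in range(k)])

print(sorted(np.bincount(labels).tolist()))   # cluster sizes
```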
Article: "Early-stage chemical process screening through hybrid modeling: Introduction and case study of a reaction–crystallization process" by Diana Wiederschitz, Edith-Alice Kovacs, Botond Szilagyi. Digital Chemical Engineering, vol. 18, Article 100280. Pub Date: 2025-12-06 | DOI: 10.1016/j.dche.2025.100280
Pub Date: 2025-12-03 | DOI: 10.1016/j.dche.2025.100279
Shilpa Narasimhan, Nael H. El-Farra, Matthew J. Ellis
Control-enabled cyberattack detection approaches are necessary for enhancing the cybersecurity of process control systems (PCSs), as evidenced by recent successful cyberattacks against these systems. One type of cyberattack is the false data injection attack (FDIA), which manipulates data over sensor–controller and/or controller–actuator communication links. This work presents an active detection strategy based on control mode switching, where the control parameters and/or the set-point are adjusted to induce perturbations that reveal stealthy FDIAs that would otherwise go undetected. To guarantee attack detection, the perturbations introduced by the detection method must be “attack-revealing”, a concept formally defined in this work using reachability analysis. Building on this foundation and considering a specific class of FDIAs, a screening algorithm is developed for selecting control modes that guarantee attack-revealing perturbations in the presence of an attack. A theoretical result is established identifying control modes incapable of guaranteeing attack detection for a subset of these attacks, specifically non-bias-adding attacks, which do not cause a steady-state offset. This result simplifies the screening process by reducing the candidate control mode set and ensuring that only effective control modes are considered. The applicability of the screening algorithm is demonstrated for several FDIAs, including: (1) multiplicative attacks, (2) non-bias-adding multiplicative attacks, and (3) replay attacks, in which historical process data are injected into communication channels. Simulation results on an illustrative process validate the effectiveness of the modified screening algorithm and the active detection method in detecting non-bias-adding multiplicative and replay attacks.
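A toy illustration of the core idea, on a hypothetical first-order process rather than the paper's reachability-based formulation: a replay attack is invisible while the set-point is fixed, but a control mode switch makes the stale measurements stand out:

```python
# Scalar process x' = -a*x + b*u under proportional control (all values
# illustrative; this is not the paper's screening algorithm).
a, b, dt = 0.5, 1.0, 0.1

def controller(y, sp):
    return 2.0 * (sp - y)   # proportional control; gain is illustrative

# Phase 1: normal operation at set-point 1.0; the attacker records the sensor.
x, record = 0.0, []
for _ in range(200):
    x += dt * (-a * x + b * controller(x, 1.0))
    record.append(x)

# Phase 2: the recorded data is replayed while the set-point switches to 2.0.
residuals = []
for t in range(200):
    y_reported = record[t]                        # stale, replayed measurement
    x += dt * (-a * x + b * controller(y_reported, 2.0))
    residuals.append(abs(2.0 - y_reported))       # set-point tracking residual

# The replayed output can never track the new set-point, exposing the attack.
print(min(residuals) > 0.5)   # prints True
```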
Article: "Control mode switching for guaranteed detection of false data injection attacks on process control systems". Digital Chemical Engineering, vol. 18, Article 100279.
Pub Date: 2025-12-01 | DOI: 10.1016/j.dche.2025.100277
Arthur Khodaverdian, Xiaodong Cui, Panagiotis D. Christofides
This work explores reinforcement learning (RL)-based approaches as replacements for model predictive control (MPC) in cases where practical implementation of MPC is infeasible due to excessive computation times. Specifically, with externally enforced stability guarantees, an RL-based controller trained to optimize the same cost function as a long-horizon MPC that achieves the desired closed-loop performance can be a more appealing real-time option than running that MPC with a shortened horizon. A benchmark nonlinear chemical process model is used to demonstrate the feasibility of this RL-based framework, which simultaneously guarantees stability and enables improvements in computational efficiency and potential control quality of the closed-loop system. To explore the influence of the RL training method, two RL algorithms are examined, with an imitation learning method used as a reference.
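One simple way to externally enforce stability, illustrated on a hypothetical scalar system (this is a generic Lyapunov-based safety filter, not the paper's specific mechanism): whatever action the RL policy proposes is projected onto the set of actions that contract the state each step.

```python
import numpy as np

rng = np.random.default_rng(2)

def safe_action(x, u_rl, decay=0.9):
    # For the scalar model x_next = x + u, require |x_next| <= decay * |x|,
    # i.e. a Lyapunov decrease for V(x) = x^2, no matter what u_rl is.
    lo, hi = -x - decay * abs(x), -x + decay * abs(x)
    return float(np.clip(u_rl, lo, hi))

x = 5.0
for _ in range(100):
    u_rl = rng.normal(scale=3.0)      # stand-in for a trained RL policy
    x = x + safe_action(x, u_rl)

print(abs(x) < 1e-2)   # prints True: the filter guarantees convergence
```

The guarantee holds by construction: |x| shrinks by at least the factor `decay` per step regardless of the policy's proposal.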
Article: "Utilizing reinforcement learning in feedback control of nonlinear processes with stability guarantees". Digital Chemical Engineering, vol. 17, Article 100277.
Pub Date: 2025-12-01 | DOI: 10.1016/j.dche.2025.100278
José Pedreira, José Pinto, Daniel Gonçalves, Pedro Barahona, Rui Oliveira, Rafael S. Costa
Hybrid modeling is gaining prominence in various industrial sectors because it offers a flexible balance between mechanistic and data-driven modeling. However, adoption of hybrid modeling techniques has been rather limited: worldwide, only a few expert researchers, typically relying on in-house tools, have the technical background and skills to develop such models. Additionally, freely available and user-friendly software tools for developing hybrid models of bioprocesses and biological systems are lacking.
To address these gaps, we developed HYBpy. HYBpy is a user-friendly web-based framework based on a generalized step-by-step pipeline for quick and easy generation/training of hybrid models compliant with current file formats. We demonstrated the HYBpy functionalities using two literature case studies in the biological engineering domain. HYBpy is expected to greatly facilitate the usage of hybrid modeling, making these approaches accessible for the nonexpert community.
Availability: HYBpy and two case examples can be accessed online at www.hybpy.com. Although HYBpy is offered as a web-based tool, it can also be installed locally as described in the GitHub repository instructions. The source code is hosted and publicly available on GitHub at https://github.com/joko1712/HYBpy under the GNU General Public License v3.0.
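For readers new to the concept, a hybrid model pairs a mechanistic balance with a data-driven term. A minimal sketch, independent of HYBpy's actual API (the "learned" rate function here is a hand-picked stand-in for a trained regressor):

```python
# Hybrid bioprocess sketch: mechanistic mass balances dX/dt = mu(S)*X and
# dS/dt = -Y*mu(S)*X, with the specific rate mu(S) supplied by a data-driven
# component. All numbers are illustrative.

def mu(S):
    # Data-driven kinetic term: stand-in for a trained ML model.
    return 0.02 + 0.15 * S / (1.0 + S)

X, S, dt = 0.1, 10.0, 0.1   # biomass, substrate, time step
for _ in range(100):        # explicit Euler over 10 time units
    r = mu(S) * X           # hybrid rate: learned mu(S) times biomass
    X += dt * r             # mechanistic biomass balance
    S -= dt * 2.0 * r       # mechanistic substrate balance, fixed yield 2.0

print(round(X, 3), round(S, 3))
```

In a real hybrid workflow the `mu(S)` function would be a neural network or other regressor trained on process data, while the balances stay mechanistic.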
Article: "HYBpy: A web-based framework for hybrid modeling of biological systems". Digital Chemical Engineering, vol. 17, Article 100278.
Pub Date: 2025-12-01 | DOI: 10.1016/j.dche.2025.100276
Z. Tabrizi, E. Barbera, W.R. Leal da Silva, F. Bezzo
Mathematical modelling plays a critical role in the design, optimisation, and control of dynamic systems in the process industry. While mechanistic models offer strong explanatory and predictive power, their effectiveness depends on informed model selection and precise parameter calibration. Model-based design of experiments (MBDoE) provides a framework for addressing these challenges by designing experiments that accelerate model discrimination and parameter precision tasks. However, its practical application is frequently constrained by fragmented digital tools that lack integration and make MBDoE implementation a task for expert users. To address this gap, and thus support the widespread use of MBDoE, MIDDoE, a modular and user-friendly Python-based framework centred on MBDoE, is introduced. MIDDoE supports both model discrimination and parameter precision design strategies, incorporating physical constraints and non-convex design spaces. To provide a comprehensive MBDoE digital tool, the framework integrates numerical techniques such as Global Sensitivity Analysis, Estimability Analysis, parameter estimation, uncertainty analysis, and model validation. Its architecture decouples simulation from analysis, enabling compatibility with both built-in and external simulators, which allows MIDDoE to be applied across different systems. MIDDoE's practical application is demonstrated through two case studies, in bioprocess and pharmaceutical systems, covering model discrimination and parameter precision tasks.
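The parameter precision side of MBDoE can be illustrated with a one-parameter toy model (this is a generic Fisher-information calculation, not MIDDoE's API): for y(t) = exp(-θ·t), the most informative sampling time maximizes the squared parameter sensitivity.

```python
import numpy as np

# Toy MBDoE-for-parameter-precision sketch: choose the sampling time that
# maximizes the (scalar) Fisher information for theta in y(t) = exp(-theta*t).
theta = 0.5
t_candidates = np.linspace(0.1, 10.0, 200)

sens = -t_candidates * np.exp(-theta * t_candidates)   # dy/dtheta
fim = sens ** 2                                        # Fisher information
t_best = t_candidates[np.argmax(fim)]

print(round(float(t_best), 2))   # the analytic optimum is 1/theta = 2.0
```

For multi-parameter models the same idea generalizes to maximizing a scalar criterion (e.g. the determinant, for D-optimality) of the Fisher information matrix over candidate designs.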
Article: "MIDDoE: An MBDoE Python package for model identification, discrimination, and calibration". Digital Chemical Engineering, vol. 17, Article 100276.
Pub Date: 2025-11-10 | DOI: 10.1016/j.dche.2025.100274
Nian Ran, Fayez M. Al-Alweet, Richard Allmendinger, Ahmad Almakhlafi
Accurate classification of flow patterns in multiphase systems is pivotal for optimizing fluid transport and enhancing overall system performance. Conventional methods—such as visual inspection, standard video analysis, and high-speed imaging—remain widely used in industrial and laboratory settings. However, these approaches are often constrained by subjective interpretation, limited applicability to non-transparent pipelines, and inconsistent performance under varying operating conditions. To overcome these limitations, this study introduces a novel framework that integrates capacitance sensing with Artificial Intelligence (AI)-driven classification. The proposed methodology employs a one-dimensional Squeeze-and-Excitation Network (1D SENet) to extract and interpret time-series features from raw capacitance signals. Experimental validation demonstrates robust classification accuracies, achieving over 85 % on in-distribution datasets and 71 % on out-of-distribution scenarios—substantially outperforming traditional techniques. These findings underscore the enhanced generalization and reliability of the proposed system. This work establishes a scalable foundation for real-time flow regime monitoring and predictive analytics, offering transformative potential for intelligent fluid management in complex industrial environments.
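The squeeze-and-excitation mechanism at the heart of a 1D SENet can be sketched in numpy (random stand-in weights; the paper's network is trained end-to-end on real capacitance signals):

```python
import numpy as np

rng = np.random.default_rng(3)

def se_block_1d(x, w1, w2):
    # x: (channels, time) feature map from a 1D conv layer.
    z = x.mean(axis=1)                         # squeeze: global average pool
    s = np.maximum(w1 @ z, 0.0)                # excitation: bottleneck FC + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))     # FC + sigmoid, per-channel gate
    return x * gate[:, None]                   # recalibrate channels

C, T, r = 8, 100, 2                            # channels, time steps, reduction
x = rng.normal(size=(C, T))
w1 = rng.normal(size=(C // r, C)) * 0.1
w2 = rng.normal(size=(C, C // r)) * 0.1

y = se_block_1d(x, w1, w2)
print(y.shape)   # (8, 100)
```

Because each gate lies in (0, 1), the block can only attenuate channels, letting the network emphasize the most informative capacitance-signal features.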
Article: "Automated flow pattern classification in multiphase systems using artificial intelligence and capacitance sensing techniques". Digital Chemical Engineering, vol. 17, Article 100274.
The integration of hydrogen into underground storage systems is pivotal for large-scale energy management, often involving blends with methane to leverage existing infrastructure. Accurate viscosity prediction of hydrogen–methane blends under subsurface conditions is essential for optimizing flow assurance and operational safety. Accordingly, this study employs three data-driven models, namely Genetic Expression Programming (GEP), Group Method of Data Handling (GMDH), and Multi-Gene Genetic Programming (MGGP), to predict the viscosity of hydrogen–methane mixtures for transportation and underground storage applications. A comprehensive dataset of 313 experimentally measured values from the literature was used to develop and validate the established correlations. The MGGP paradigm emerged as the top performer, achieving a root mean square error (RMSE) of 0.4054 and an R² value of 0.9940, outperforming both GEP and GMDH as well as prior predictive models. The consistency of the dataset was confirmed using the Leverage approach, ensuring robust predictions. In addition, the Shapley Additive Explanations technique revealed the key factors influencing the viscosity predictions, enhancing the interpretability of the best-performing correlation. Furthermore, comparative trend analysis demonstrated the MGGP correlation's superior accuracy and robustness across varying blend compositions and operational conditions. These findings offer a reliable and simple-to-use predictive correlation for engineers and researchers designing hydrogen transport and storage systems, supporting efficient energy storage and the transition to a low-carbon economy.
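The Leverage approach mentioned in the abstract screens a dataset via the diagonal of the hat matrix; a numpy sketch on synthetic inputs (not the study's 313-point dataset, and with an illustrative 3p/n cutoff):

```python
import numpy as np

rng = np.random.default_rng(4)

# Leverage / hat-matrix check: H = X (X^T X)^{-1} X^T, leverage_i = H_ii.
# Points with leverage above 3p/n lie outside the model's applicability domain.
n, p = 50, 3                        # samples, regressors (e.g. P, T, H2 fraction)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
X[0, 1:] = 8.0                      # one deliberately extreme point

H = X @ np.linalg.solve(X.T @ X, X.T)
leverage = np.diag(H)
cutoff = 3 * p / n

print(int(np.sum(leverage > cutoff)), round(float(leverage.sum()), 1))
```

A useful sanity check is that the leverages always sum to p (the trace of the hat matrix), and the planted extreme point is flagged by the cutoff.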
Article: "Predicting the viscosity of hydrogen–methane blends at high pressure for hydrogen transportation and geo-storage: Integration of robust white-box machine learning frameworks" by Saad Alatefi, Mohamed Riad Youcefi, Menad Nait Amar, Hakim Djema. Digital Chemical Engineering, vol. 17, Article 100273. Pub Date: 2025-10-30 | DOI: 10.1016/j.dche.2025.100273
Pub Date : 2025-10-27DOI: 10.1016/j.dche.2025.100272
Haoran Ji, Lena Fuhrmann, Juan Fernando Meza Gonzalez, Frank Rhein
This study presents a robust, parallelized optimization framework for kernel parameter identification that is adaptable to any population balance equation (PBE) formulation and process type. The framework addresses the challenge of incomplete 2D particle size distribution (PSD) measurements in multi-material systems by combining a reduced 2D PSD with complementary 1D datasets. The framework was validated using noisy synthetic PSD data, evaluating errors in both the PSD and the kernel values across eight kernel parameters. Hyperparameter and sensitivity analyses provided configuration recommendations and insights into the influence of individual parameters, thus guiding kernel model selection. Incorporating prior knowledge of one kernel parameter (e.g., through multi-scale simulations) mitigated non-unique solutions and enhanced noise tolerance, ultimately improving the framework’s robustness and reliability. A case study based on experimental data from a dispersion process demonstrated the framework’s flexibility and practical relevance.
{"title":"Optimization-based framework for kernel parameter identification in multi-material population balance models","authors":"Haoran Ji, Lena Fuhrmann, Juan Fernando Meza Gonzalez, Frank Rhein","doi":"10.1016/j.dche.2025.100272","DOIUrl":"10.1016/j.dche.2025.100272","url":null,"abstract":"<div><div>This study presents a robust, parallelized optimization framework for kernel parameter identification that is adaptable to any population balance equation (PBE) formulation and process type. The framework addresses the challenge of incomplete 2D particle size distribution (PSD) measurements in multi-material systems by combining a reduced 2D PSD with complementary 1D datasets. The framework was validated by using noisy synthetic PSD data and evaluating both the error in PSD and kernel values across eight kernel parameters. Hyperparameter and sensitivity analyses provided configuration recommendations and insights into the influence of individual parameters, thus guiding kernel model selection. Incorporating prior knowledge of one kernel parameter (e.g., through multi-scale simulations) mitigated non-unique solutions and enhanced noise tolerance, ultimately improving the framework’s robustness and reliability. A case study based on experimental data from a dispersion process demonstrated the framework’s flexibility and practical relevance.</div></div>","PeriodicalId":72815,"journal":{"name":"Digital Chemical Engineering","volume":"17 ","pages":"Article 100272"},"PeriodicalIF":4.1,"publicationDate":"2025-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145416788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
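The core idea — fit kernel parameters by minimizing the mismatch between simulated and (noisy, synthetic) measured data — can be shown on a toy problem. This is not the paper's framework, which handles full 2D PSDs and eight parameters; here a single size-independent aggregation kernel parameter β is identified from the analytic number-concentration decay dN/dt = −½βN², and all names are illustrative.

```python
import numpy as np

def n_total(t, beta, n0=1.0):
    # Analytic total-number decay for a size-independent aggregation
    # kernel: dN/dt = -0.5 * beta * N**2  =>  N(t) = N0 / (1 + 0.5*beta*N0*t)
    return n0 / (1.0 + 0.5 * beta * n0 * t)

def identify_beta(t, n_meas, candidates):
    # Brute-force least-squares identification of the kernel parameter
    errs = [np.sum((n_total(t, b) - n_meas) ** 2) for b in candidates]
    return candidates[int(np.argmin(errs))]

rng = np.random.default_rng(42)
t = np.linspace(0.0, 10.0, 50)
beta_true = 0.8
# Noisy synthetic "measurement", mirroring the paper's validation strategy
noisy = n_total(t, beta_true) * (1.0 + 0.01 * rng.standard_normal(t.size))
candidates = np.linspace(0.1, 2.0, 191)        # grid with 0.01 spacing
beta_hat = identify_beta(t, noisy, candidates)
```

A real framework would replace the grid search with a parallelized optimizer over all kernel parameters and compare discretized PSDs rather than a scalar moment, but the objective structure is the same.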
Pub Date : 2025-10-24DOI: 10.1016/j.dche.2025.100271
Pál Péter Hanzelik , Szilveszter Gergely , János Abonyi , Alex Kummer
Developing accurate prediction models for complex industrial and geological applications remains a significant challenge, particularly when relying on limited and disparate spectroscopic data. Traditional data fusion methods often fall short in effectively integrating complementary information across different spectral sources, limiting predictive performance. Complex-level ensemble fusion (CLF) is presented as a two-layer chemometric algorithm that jointly selects variables from concatenated mid-infrared (MIR) and Raman spectra with a genetic algorithm, projects them with partial least squares, and stacks the latent variables into an XGBoost regressor, thereby capturing feature- and model-level complementarities in a single workflow. When benchmarked against single-source models and classical low-, mid-, and high-level data-fusion schemes, the CLF technique consistently demonstrated significantly improved predictive accuracy. Evaluated on paired MIR and Raman datasets from industrial lubricant additives and RRUFF minerals, CLF robustly outperformed established methodologies by effectively leveraging complementary spectral information. Mid-level fusion yielded no improvement, underscoring the need for supervised integration.
{"title":"Data fusion of spectroscopic data for enhancing machine learning model performance","authors":"Pál Péter Hanzelik , Szilveszter Gergely , János Abonyi , Alex Kummer","doi":"10.1016/j.dche.2025.100271","DOIUrl":"10.1016/j.dche.2025.100271","url":null,"abstract":"<div><div>Developing accurate industrial prediction models for complex industrial and geological applications remains a significant challenge, particularly when relying on limited and disparate spectroscopic data. Traditional data fusion methods often fall short in effectively integrating complementary information across different spectral sources, limiting predictive performance. Complex-level ensemble fusion (CLF) is presented as a two-layer chemometric algorithm that jointly selects variables from concatenated mid-infrared (MIR) and Raman spectra with a genetic algorithm, projects them with partial least squares and stacks the latent variables into an XGBoost regressor, thereby capturing feature- and model-level complementarities in a single workflow. When benchmarked against single-source models and classical low-, mid-, and high-level data-fusion schemes, the CLF technique consistently demonstrated significantly improved predictive accuracy. Evaluated on paired Mid-Infrared (MIR) and Raman datasets from industrial lubricant additives and RRUFF minerals, CLF robustly outperformed established methodologies by effectively leveraging complementary spectral information. Mid-level fusion yielded no improvement, underscoring the need for supervised integration. 
These results constitute the first evidence that a stacked, complex-level scheme can surpass all established fusion levels on real-world spectroscopic regressions comprising fewer than one hundred samples and provide a transferable recipe for building more accurate and resilient soft sensors in quality-control and geochemical applications.</div></div>","PeriodicalId":72815,"journal":{"name":"Digital Chemical Engineering","volume":"17 ","pages":"Article 100271"},"PeriodicalIF":4.1,"publicationDate":"2025-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145416790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
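The fusion pipeline's middle steps — concatenating the two spectral blocks and projecting them onto PLS latent variables before a second-stage regressor — can be sketched with a minimal NIPALS-style PLS1 on synthetic data. This is illustrative only: the GA variable selection is omitted and the XGBoost stacking layer is replaced by a plain least-squares fit on the latent scores, so it is a stand-in for CLF, not the authors' implementation.

```python
import numpy as np

def pls1_scores(X, y, n_comp=2):
    """Minimal PLS1 (NIPALS with deflation); returns the latent-score matrix T."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    scores = []
    for _ in range(n_comp):
        w = X.T @ y                    # weight vector maximizing covariance
        w /= np.linalg.norm(w)
        t = X @ w                      # latent scores for this component
        scores.append(t)
        p = X.T @ t / (t @ t)
        X = X - np.outer(t, p)         # deflate X
        y = y - t * (t @ y) / (t @ t)  # deflate y
    return np.column_stack(scores)

rng = np.random.default_rng(0)
mir = rng.normal(size=(40, 60))        # synthetic MIR block (40 samples)
raman = rng.normal(size=(40, 80))      # synthetic Raman block
X_fused = np.concatenate([mir, raman], axis=1)   # low-level fusion
coef = rng.normal(size=X_fused.shape[1])
y = X_fused @ coef                     # synthetic property to predict

T = pls1_scores(X_fused, y, n_comp=3)
# Least-squares on the latent scores stands in for the XGBoost stacking layer
b, *_ = np.linalg.lstsq(T, y - y.mean(), rcond=None)
y_hat = T @ b + y.mean()
```

Swapping the final least-squares step for a gradient-boosted regressor on `T` (plus GA-selected wavenumbers upstream) recovers the shape of the two-layer CLF workflow described above.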