Pub Date: 2025-12-01; DOI: 10.1016/j.dche.2025.100277
Arthur Khodaverdian, Xiaodong Cui, Panagiotis D. Christofides
This work explores the use of reinforcement learning (RL)-based approaches to replace model predictive control (MPC) in cases where practical implementations of MPC are infeasible due to excessive computation times. Specifically, when stability guarantees are enforced externally, an RL-based controller trained to optimize the same cost function as a long-horizon MPC that achieves the desired closed-loop performance can be a more appealing real-time option than running the same MPC with a shorter horizon. A benchmark nonlinear chemical process model is used to demonstrate the feasibility of this RL-based framework, which guarantees stability while enabling improvements in computational efficiency and, potentially, in the control quality of the closed-loop system. To explore the influence of the RL training method, two RL algorithms are compared, with one imitation learning method used as a reference.
Title: Utilizing reinforcement learning in feedback control of nonlinear processes with stability guarantees. Journal: Digital Chemical Engineering, vol. 17, Article 100277.
Pub Date: 2025-12-01; DOI: 10.1016/j.dche.2025.100278
José Pedreira, José Pinto, Daniel Gonçalves, Pedro Barahona, Rui Oliveira, Rafael S. Costa
Hybrid modeling is gaining prominence in various industrial sectors because it offers a flexible balance between mechanistic and data-driven modeling. However, the adoption of hybrid modeling techniques has been rather limited. Only a few expert researchers worldwide, typically using in-house tools, have the technical background and skills to develop such hybrid models. Additionally, freely available and user-friendly software tools for developing hybrid models of bioprocesses and biological systems are lacking.
To address these gaps, we developed HYBpy. HYBpy is a user-friendly web-based framework based on a generalized step-by-step pipeline for quick and easy generation/training of hybrid models compliant with current file formats. We demonstrated the HYBpy functionalities using two literature case studies in the biological engineering domain. HYBpy is expected to greatly facilitate the usage of hybrid modeling, making these approaches accessible for the nonexpert community.
Availability: HYBpy and two case examples can be accessed online at www.hybpy.com. Although HYBpy is offered as a web-based tool, it can also be installed locally as described in the GitHub repository instructions. The source code is hosted and publicly available on GitHub at https://github.com/joko1712/HYBpy under the GNU General Public License v3.0.
Title: HYBpy: A web-based framework for hybrid modeling of biological systems. Journal: Digital Chemical Engineering, vol. 17, Article 100278.
Pub Date: 2025-12-01; DOI: 10.1016/j.dche.2025.100276
Z. Tabrizi, E. Barbera, W.R. Leal da Silva, F. Bezzo
Mathematical modelling plays a critical role in the design, optimisation, and control of dynamic systems in the process industry. While mechanistic models offer strong explanatory and predictive power, their effectiveness depends on informed model selection and precise parameter calibration. Model-based design of experiments (MBDoE) provides a framework for addressing these challenges by designing experiments that accelerate model discrimination and parameter precision tasks. However, its practical application is frequently constrained by fragmented digital tools that lack integration and make MBDoE implementation a task for expert users. To address this and support wider use of MBDoE, MIDDoE, a modular and user-friendly Python-based framework centred on MBDoE, is introduced. MIDDoE supports both model discrimination and parameter precision design strategies, incorporating physical constraints and non-convex design spaces. To provide a comprehensive MBDoE digital tool, the framework integrates numerical techniques such as Global Sensitivity Analysis, Estimability Analysis, parameter estimation, uncertainty analysis, and model validation. Its architecture decouples simulation from analysis, enabling compatibility with both built-in and external simulators, which allows MIDDoE to be applied across different systems. The practical application of MIDDoE is demonstrated through two case studies in bioprocess and pharmaceutical systems for model discrimination and parameter precision tasks.
Title: MIDDoE: An MBDoE Python package for model identification, discrimination, and calibration. Journal: Digital Chemical Engineering, vol. 17, Article 100276.
Pub Date: 2025-11-13; DOI: 10.1016/j.dche.2025.100275
Mohd Fauzi Zanil, Zainal Ahmad, Syamsul Rizal Abd Shukor, Mohmmad Jakir Hossain Khan, Mohd Hardyianto Vai Bahrun
This study proposes the optimal development of rule bases using Type-2 fuzzy logic specifically designed for stochastic chemical control systems. The research addresses the complexities and uncertainties inherent in stochastic pH neutralisation processes by using Type-2 fuzzy logic as an inverse hybrid model that is able to provide good control actions through the fuzzy rules. Comprehensive simulation-based and experimental performance evaluations, including setpoint tracking accuracy and disturbance rejection capabilities, were conducted to rigorously compare the proposed Type-2 fuzzy logic controller with traditional PID and conventional fuzzy logic controllers. Results demonstrate that the optimized Type-2 fuzzy logic controller significantly outperforms existing methods, offering faster system responses, minimized overshoot, and improved system stability. Further, robustness tests involving stochastic perturbations, such as variable flow rates of NaOH and HCl solutions and random acid injections during operation, confirm the controller’s enhanced adaptability and effectiveness. The study concludes that the developed Type-2 fuzzy logic controller, constructed through simulation and validated using real experimental data, provides a robust, efficient, and reliable control solution suitable for real-time management of complex stochastic chemical systems.
Title: Optimal rules base development in Type-2 fuzzy logic for stochastic chemical control system. Journal: Digital Chemical Engineering, vol. 18, Article 100275.
Pub Date: 2025-11-10; DOI: 10.1016/j.dche.2025.100274
Nian Ran, Fayez M. Al-Alweet, Richard Allmendinger, Ahmad Almakhlafi
Accurate classification of flow patterns in multiphase systems is pivotal for optimizing fluid transport and enhancing overall system performance. Conventional methods—such as visual inspection, standard video analysis, and high-speed imaging—remain widely used in industrial and laboratory settings. However, these approaches are often constrained by subjective interpretation, limited applicability to non-transparent pipelines, and inconsistent performance under varying operating conditions. To overcome these limitations, this study introduces a novel framework that integrates capacitance sensing with Artificial Intelligence (AI)-driven classification. The proposed methodology employs a one-dimensional Squeeze-and-Excitation Network (1D SENet) to extract and interpret time-series features from raw capacitance signals. Experimental validation demonstrates robust classification accuracies, achieving over 85 % on in-distribution datasets and 71 % on out-of-distribution scenarios—substantially outperforming traditional techniques. These findings underscore the enhanced generalization and reliability of the proposed system. This work establishes a scalable foundation for real-time flow regime monitoring and predictive analytics, offering transformative potential for intelligent fluid management in complex industrial environments.
Title: Automated flow pattern classification in multiphase systems using artificial intelligence and capacitance sensing techniques. Journal: Digital Chemical Engineering, vol. 17, Article 100274.
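The abstract above names a one-dimensional Squeeze-and-Excitation network but gives no architectural details. As a rough illustration of the generic squeeze-and-excitation step only, the following minimal sketch applies one SE block to a multi-channel time series. The function name, the bias-free weight matrices `w1` and `w2`, and the toy input are all assumptions made for illustration, not the authors' model.

```python
import math


def squeeze_excite_1d(x, w1, w2):
    """Apply one squeeze-and-excitation step to a multi-channel 1D signal.

    x  : list of channels, each a list of time samples
    w1 : bottleneck weights, shape (reduced, channels)   -- hypothetical
    w2 : expansion weights,  shape (channels, reduced)   -- hypothetical
    Returns the rescaled signal and the per-channel gates.
    """
    # Squeeze: global average pooling over time, one scalar per channel.
    z = [sum(ch) / len(ch) for ch in x]
    # Excitation, part 1: fully connected bottleneck followed by ReLU.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    # Excitation, part 2: fully connected expansion followed by a sigmoid.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every sample of a channel by that channel's gate.
    scaled = [[g * v for v in ch] for g, ch in zip(gates, x)]
    return scaled, gates


# Toy two-channel signal with three time samples per channel (made up).
signal = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
w1 = [[1.0, 1.0]]      # 2 channels -> 1 hidden unit
w2 = [[1.0], [-1.0]]   # 1 hidden unit -> 2 channel gates
scaled, gates = squeeze_excite_1d(signal, w1, w2)
```

In a trained network the weights would be learned and the block would sit after a convolutional layer; here it only shows how channel-wise gates in the range (0, 1) reweight the feature channels.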
The integration of hydrogen into underground storage systems is pivotal for large-scale energy management, often involving blends with methane to leverage existing infrastructure. Accurate viscosity prediction of hydrogen – methane blends under subsurface conditions is essential for optimizing flow assurance and operational safety. Accordingly, this study employs three data-driven models, namely Genetic Expression Programming (GEP), Group Method of Data Handling (GMDH), and Multi-Gene Genetic Programming (MGGP), to predict the viscosity of hydrogen – methane mixtures for transportation and underground storage applications. A comprehensive dataset of 313 experimentally measured values from the literature was used to develop and validate the established correlations. The MGGP paradigm emerged as the top performer, achieving a root mean square error (RMSE) of 0.4054 and an R² value of 0.9940, outperforming both GEP and GMDH, as well as prior predictive models. The consistency of the dataset was confirmed using the Leverage approach, ensuring robust predictions. In addition, the Shapley Additive Explanations technique revealed key factors influencing the viscosity predictions, enhancing the interpretability of the best-performing correlation. Furthermore, comparative trend analysis demonstrated the MGGP correlation's superior accuracy and robustness across varying blend compositions and operational conditions. These findings offer a reliable and simple-to-use predictive correlation for engineers and researchers designing hydrogen transport and storage systems, supporting efficient energy storage and the transition to a low-carbon economy.
Title: Predicting the viscosity of hydrogen – methane blends at high pressure for hydrogen transportation and geo-storage: Integration of robust white-box machine learning frameworks. Authors: Saad Alatefi, Mohamed Riad Youcefi, Menad Nait Amar, Hakim Djema. Pub Date: 2025-10-30; DOI: 10.1016/j.dche.2025.100273. Journal: Digital Chemical Engineering, vol. 17, Article 100273.
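The viscosity study above reports model quality as RMSE and R². For readers who want to reproduce such metrics on their own predictions, here is a minimal sketch of the two standard definitions. The sample values are invented for illustration and are unrelated to the paper's 313-point dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. predicted values (not data from the paper).
measured = [10.0, 12.0, 14.0, 16.0]
predicted = [10.5, 11.5, 14.5, 15.5]

error = rmse(measured, predicted)
fit = r_squared(measured, predicted)
```

Note that RMSE carries the units of the predicted quantity, so the reported value of 0.4054 is only meaningful alongside the viscosity units used in the paper.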
Pub Date: 2025-10-27; DOI: 10.1016/j.dche.2025.100272
Haoran Ji, Lena Fuhrmann, Juan Fernando Meza Gonzalez, Frank Rhein
This study presents a robust, parallelized optimization framework for kernel parameter identification that is adaptable to any population balance equation (PBE) formulation and process type. The framework addresses the challenge of incomplete 2D particle size distribution (PSD) measurements in multi-material systems by combining a reduced 2D PSD with complementary 1D datasets. The framework was validated by using noisy synthetic PSD data and evaluating both the error in PSD and kernel values across eight kernel parameters. Hyperparameter and sensitivity analyses provided configuration recommendations and insights into the influence of individual parameters, thus guiding kernel model selection. Incorporating prior knowledge of one kernel parameter (e.g., through multi-scale simulations) mitigated non-unique solutions and enhanced noise tolerance, ultimately improving the framework’s robustness and reliability. A case study based on experimental data from a dispersion process demonstrated the framework’s flexibility and practical relevance.
Title: Optimization-based framework for kernel parameter identification in multi-material population balance models. Journal: Digital Chemical Engineering, vol. 17, Article 100272.
Pub Date: 2025-10-24; DOI: 10.1016/j.dche.2025.100271
Pál Péter Hanzelik, Szilveszter Gergely, János Abonyi, Alex Kummer
Developing accurate prediction models for complex industrial and geological applications remains a significant challenge, particularly when relying on limited and disparate spectroscopic data. Traditional data fusion methods often fall short in effectively integrating complementary information across different spectral sources, limiting predictive performance. Complex-level ensemble fusion (CLF) is presented as a two-layer chemometric algorithm that jointly selects variables from concatenated mid-infrared (MIR) and Raman spectra with a genetic algorithm, projects them with partial least squares, and stacks the latent variables into an XGBoost regressor, thereby capturing feature- and model-level complementarities in a single workflow. When benchmarked against single-source models and classical low-, mid-, and high-level data-fusion schemes, the CLF technique consistently demonstrated significantly improved predictive accuracy. Evaluated on paired MIR and Raman datasets from industrial lubricant additives and RRUFF minerals, CLF robustly outperformed established methodologies by effectively leveraging complementary spectral information. Mid-level fusion yielded no improvement, underscoring the need for supervised integration. These results constitute the first evidence that a stacked, complex-level scheme can surpass all established fusion levels on real-world spectroscopic regressions comprising fewer than one hundred samples, and they provide a transferable recipe for building more accurate and resilient soft sensors in quality-control and geochemical applications.
Title: Data fusion of spectroscopic data for enhancing machine learning model performance. Journal: Digital Chemical Engineering, vol. 17, Article 100271.
Pub Date: 2025-10-16; DOI: 10.1016/j.dche.2025.100270
Abid Aman, Yiqi Liu, Yan Chen
Early fault identification and evaluation are crucial to ensure the efficiency, safety, and reliability of the industrial process. With the rapid growth of process data in modern industries, machine learning and data-driven methods have become indispensable for effective process monitoring and fault diagnosis. This study proposes a fault detection framework that effectively leverages feature fusion and ensemble learning to boost monitoring performance under intricate industrial conditions. The proposed method combines Slow Feature Analysis (SFA), Kernel SFA (KSFA), and Dynamic SFA (DSFA) to extract distinctive features that accurately reflect linear, nonlinear, and dynamic changes in process data. Furthermore, independent applications of ensemble learning techniques, such as majority and weighted voting, can further increase the reliability of identifying faults with the help of statistical monitoring metrics. The effectiveness of this approach is confirmed using the Tennessee Eastman (TE) benchmark dataset alongside real-world data from a wastewater treatment facility in Beijing. The study spans simulated and real industrial settings to develop a robust framework for fault detection in dynamic and nonlinear processes. The results show that feature fusion and ensemble learning outperform single-model approaches, offering higher sensitivity and reliability. The framework demonstrates strong potential to reduce false alarms, improve anomaly detection, and enhance both efficiency and safety in industrial operations.
Towards robust fault detection for industrial processes with a hybrid feature fusion and ensemble learning framework. Abid Aman, Yiqi Liu, Yan Chen. Digital Chemical Engineering, vol. 17, Article 100270 (2025-10-16). DOI: 10.1016/j.dche.2025.100270
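The fault detection abstract above mentions majority and weighted voting over statistical monitoring metrics. The following is a minimal sketch of that idea, not the authors' implementation: it assumes each of the three models (SFA, KSFA, DSFA) yields one monitoring statistic per sample and has its own control limit. The function names, the numeric statistics, the limits, and the weights are all hypothetical.

```python
from typing import List, Sequence


def detector_flags(stats: Sequence[float], limits: Sequence[float]) -> List[bool]:
    """Flag a fault for each detector whose statistic exceeds its control limit."""
    return [s > lim for s, lim in zip(stats, limits)]


def majority_vote(flags: Sequence[bool]) -> bool:
    """Raise an alarm when more than half of the detectors flag a fault."""
    return sum(flags) * 2 > len(flags)


def weighted_vote(flags: Sequence[bool], weights: Sequence[float],
                  threshold: float = 0.5) -> bool:
    """Raise an alarm when the weighted share of flagging detectors exceeds threshold."""
    if len(flags) != len(weights):
        raise ValueError("flags and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(w for f, w in zip(flags, weights) if f) / total
    return score > threshold


# Hypothetical statistics for one sample from the SFA, KSFA and DSFA models.
stats = [12.4, 7.9, 15.1]
limits = [10.0, 9.0, 11.0]
flags = detector_flags(stats, limits)      # SFA and DSFA exceed their limits

print(majority_vote(flags))                # two of three detectors agree
print(weighted_vote(flags, [1, 3, 1]))     # KSFA carries most of the weight
```

In this example the two schemes disagree: the majority vote raises an alarm, while the weighted vote does not because the heavily weighted detector stays below its limit. How the weights would be chosen (for instance from validation performance) is not stated in the abstract and is left open here.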
Pub Date : 2025-10-11 DOI: 10.1016/j.dche.2025.100269
Jakob Kjøbsted Huusom , Mark N. Jones , Julian Kager , Kim Dam-Johansen , Jochen A.H. Dreyer
The digitalization of pilot-scale chemical engineering facilities offers significant potential for enabling, e.g., data-driven research, process modeling, closed-loop process control, and digital twin development, but the implementation of robust and maintainable infrastructure remains a practical challenge. This case study presents the digitalization of the Pilot Plant at DTU Chemical Engineering, with a focus on building a scalable and reproducible architecture for real-time data access, structured data storage, and unified system control.
A key feature of the infrastructure is the use of standardized OPC UA gateways to establish encrypted connections to a diverse set of legacy and modern unit operations. While the supervisory control and data acquisition (SCADA) system communicates directly with the OPC UA gateways, the data streams are also structured using an intermediate data broker. Here, each tag is organized by unit operation and type (e.g., sensors, controls, configurations), aligned with the underlying database schema. The broker then publishes all real-time data via MQTT. Containerized Python applications deployed on a dedicated server subscribe to the MQTT data streams and, whenever experiments are active, write the real-time data to an SQL database. The system is fully extensible: new units or sensors can be added without modifying the database schema or Python code.
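The subscribe-and-store step described above can be illustrated with a small sketch. It is not the DTU code. The topic layout `pilotplant/<unit>/<type>/<tag>`, the JSON payload fields `ts` and `value`, the `measurements` table, and the `ExperimentLogger` class are all assumptions made for illustration, and SQLite stands in for the unspecified SQL database. The MQTT client itself is left out: `on_message` is what a client library's message callback would call.

```python
import json
import sqlite3
from typing import Optional, Tuple


def parse_topic(topic: str) -> Optional[Tuple[str, str, str]]:
    """Split the assumed layout 'pilotplant/<unit>/<type>/<tag>' into its parts."""
    parts = topic.split("/")
    if len(parts) != 4 or parts[0] != "pilotplant":
        return None
    return parts[1], parts[2], parts[3]


class ExperimentLogger:
    """Writes incoming messages to SQL only while an experiment is active."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.active = False
        # Long-format table: unit and tag are row values, not columns,
        # so a new unit or sensor needs no schema change.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS measurements ("
            "unit TEXT NOT NULL, tag_type TEXT NOT NULL, tag TEXT NOT NULL, "
            "ts TEXT NOT NULL, value REAL)"
        )

    def on_message(self, topic: str, payload: bytes) -> bool:
        """Store one message; return True if a row was written."""
        if not self.active:
            return False
        key = parse_topic(topic)
        if key is None:
            return False
        data = json.loads(payload)
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO measurements VALUES (?, ?, ?, ?, ?)",
                (*key, data["ts"], float(data["value"])),
            )
        return True


conn = sqlite3.connect(":memory:")
logger = ExperimentLogger(conn)
msg = json.dumps({"ts": "2025-01-01T12:00:00Z", "value": 351.2}).encode()

logger.on_message("pilotplant/distillation/sensors/TT101", msg)  # ignored: no experiment
logger.active = True
logger.on_message("pilotplant/distillation/sensors/TT101", msg)
logger.on_message("pilotplant/bubble_column/sensors/PT7", msg)   # new unit, same schema

rows = conn.execute(
    "SELECT unit, tag, value FROM measurements ORDER BY unit"
).fetchall()
print(rows)
```

Storing unit, type, and tag as row values is one way to get the extensibility the abstract describes. Whether the actual schema follows this long format is not stated in the source.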
Unified operation and metadata collection are enabled through a web-based SCADA system, while version-controlled CI/CD pipelines ensure reproducible deployment of all services on the server. This workflow avoids manual modifications to the server and simplifies long-term maintenance. The use of open communication protocols minimizes dependency on proprietary services and ensures that individual components can be replaced or extended without vendor lock-in.
The resulting infrastructure provides both real-time and historical access to high-quality experimental data, supporting applications ranging from digital twin development and process optimization to machine learning. It serves as an educational resource used annually by approximately 150–200 students across five courses, in addition to student and Ph.D. projects. The SCADA system is routinely applied during pilot-scale unit operation exercises, while advanced courses make use of live data access and interaction with the SQL database. Beyond education, the infrastructure has been adopted across multiple research centers: for example, it underpins recent work on hybrid modeling and digital twins for pilot-scale bubble column and distillation units, and its modular components (CI/CD pipelines, database, MQTT broker, data broker) are being reused in other digitalization initiatives. These developments highlight both the scalability of the approach and its value as a transferable reference for academic and industrial pilot plants.
Building a scalable digital infrastructure for a (bio)chemical engineering pilot plant: A case study from DTU. Jakob Kjøbsted Huusom, Mark N. Jones, Julian Kager, Kim Dam-Johansen, Jochen A.H. Dreyer. Digital Chemical Engineering, vol. 17, Article 100269 (2025-10-11). DOI: 10.1016/j.dche.2025.100269