A framework for enhancing industrial soft sensor learning models
João Guilherme Mattos, Patrick Nigri Happ, William Fernandes, Helio Côrtes Vieira Lopes, Simone D J Barbosa, Marcos Kalinowski, Luisa Silveira Rosa, Cassia Novello, Leonardo Dorigo Ribeiro, Patricia Rodrigues Ventura, Marcelo Cardoso Marques, Renato Neves Pitta, Valmir Jose Camolesi, Livia Pereira Lemos Costa, Bruno Itagyba Paravidino, Cristiane Salgado Pereira
Digital Chemical Engineering, volume 8, Article 100112. Published 2023-09-01. DOI: 10.1016/j.dche.2023.100112
Journal impact factor: 3.0 (JCR Q2, Engineering, Chemical)
https://www.sciencedirect.com/science/article/pii/S2772508123000303
Citations: 1
Abstract
Industrial refinery processes are highly complex, with nonlinear dynamics resulting from varying feedstock characteristics and from changes in product prioritization. Throughout these processes, key properties of intermediate compounds must be monitored and controlled, since they directly affect the quality of the end products these manufacturers sell. However, most of these properties can only be measured through time-consuming and expensive laboratory analysis, which cannot be performed at the high frequency required to monitor them properly. Developing soft sensors is therefore the most common way to obtain high-frequency estimates of these measurements, helping advanced control systems establish the correct setpoints for temperatures, pressures, and other sensor variables along the refining process and thereby control the quality of end products. Because labeled data are scarce, most academic research has focused on semi-supervised learning strategies for developing machine learning (ML) models as soft sensors. Our research takes a different direction. We aim to develop a framework that leverages the knowledge of domain experts and employs data augmentation techniques to build an enhanced, fully labeled dataset that can be fed to any supervised ML algorithm to generate a quality soft sensor. We applied our framework together with Automated ML to train a model that predicts a specific key property associated with the production of naphtha compounds in a refinery: the ASTM 95% distillation temperature of the Heavy Naphtha. Although our framework is model agnostic, we opted to use Automated ML for the optimization strategy, since it applies a diverse set of models to the dataset, reducing the bias of relying on a single optimization algorithm.
We evaluated the proposed framework in a case study carried out at an industrial refinery in Brazil, where the previous production model for estimating the ASTM 95% distillation temperature of the Heavy Naphtha was based entirely on physicochemical knowledge of the process. By adopting our framework with Automated ML, we improved the R² score by 120%. The resulting ML model is currently operating in real time inside the refinery, leading to significant economic gains.
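
To make the R² comparison concrete, the following is a minimal Python sketch of how a relative improvement in R² between a baseline model and a new model could be computed. It uses the standard definition R² = 1 - SS_res / SS_tot. The abstract does not state how the 120% figure was calculated, so the relative-change formula here is an assumption, and all data values, variable names, and function names are hypothetical and not taken from the paper.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_improvement_pct(r2_old, r2_new):
    """Relative change of r2_new over r2_old, in percent.

    Assumed definition; the paper may compute its figure differently.
    """
    return 100.0 * (r2_new - r2_old) / abs(r2_old)


# Hypothetical laboratory measurements and model estimates (degrees Celsius).
# These numbers are illustrative only and do NOT come from the paper.
lab_values = [150.0, 152.0, 155.0, 158.0, 160.0]
baseline_estimates = [153.0, 151.0, 158.0, 155.0, 162.0]
new_estimates = [150.5, 152.5, 154.0, 158.5, 159.5]

r2_baseline = r2_score(lab_values, baseline_estimates)
r2_new = r2_score(lab_values, new_estimates)
gain = relative_improvement_pct(r2_baseline, r2_new)

print(f"baseline R2 = {r2_baseline:.3f}")
print(f"new R2      = {r2_new:.3f}")
print(f"relative improvement = {gain:.1f}%")
```

Under this assumed definition, a hypothetical move from R² = 0.40 to R² = 0.88 would correspond to a 120% relative improvement; the actual baseline and final scores are not given in the abstract.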