A semi-supervised learning algorithm for high and low-frequency variable imbalances in industrial data
Jiannan Zhu, Chen Fan, Minglei Yang, Feng Qian, Vladimir Mahalec
Computers & Chemical Engineering, Volume 193, Article 108933. Published 2024-11-17.
DOI: 10.1016/j.compchemeng.2024.108933
URL: https://www.sciencedirect.com/science/article/pii/S009813542400351X
Abstract
This work introduces a semi-supervised learning algorithm for estimating missing data in processes where some variables are measured at high frequency and others at low frequency. The proposed algorithm, the “Weight-Adjusted Consistency Regularization Algorithm for Semi-Supervised Learning” (WACR-SSL), is based on consistency regularization. It splits the irregular, unbalanced data set into three parts and processes them separately. To address the loss balancing problem, five loss balancing methods were tested: Uncertainty Weights (UW), Random Loss Weighting (RLW), Dynamic Weight Average (DWA), Geometric Loss Strategy (GLS), and the logarithmic transformation (LogT). When applied to data from a hydrocracking process, the algorithm effectively leverages partially labeled data. When the noise scales and the coefficient for the unsupervised loss are carefully chosen, the UW variant performs best among the loss balancing methods tested.
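As a rough illustration of the two ideas named in the abstract, the sketch below combines a supervised loss on labeled samples with a consistency loss on unlabeled samples, then balances them with uncertainty weights. It is not the authors' WACR-SSL implementation. The abstract does not give the loss formulas, so the Gaussian input noise, the mean-squared-error losses, and the weighting form 0.5 * exp(-s_i) * L_i + 0.5 * s_i (a common way of writing uncertainty weighting) are assumptions. `toy_model`, the data, and all function names are hypothetical.

```python
import math
import random


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def add_noise(x, scale, rng):
    """Return a copy of x perturbed with zero-mean Gaussian noise."""
    return [v + rng.gauss(0.0, scale) for v in x]


def consistency_loss(model, x_unlabeled, noise_scale, rng):
    """Penalize disagreement between predictions on two noisy views."""
    view_a = [model(add_noise(x, noise_scale, rng)) for x in x_unlabeled]
    view_b = [model(add_noise(x, noise_scale, rng)) for x in x_unlabeled]
    return mse(view_a, view_b)


def uncertainty_weighted_total(losses, log_vars):
    """Combine losses as sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i.

    In training, each s_i (a log-variance) would be a learnable
    parameter; here the values are fixed for illustration only.
    """
    return sum(0.5 * math.exp(-s) * loss + 0.5 * s
               for loss, s in zip(losses, log_vars))


def toy_model(x):
    # Hypothetical stand-in for a trained predictor: a fixed linear map.
    return 0.5 * x[0] - 0.2 * x[1]


rng = random.Random(0)
x_labeled = [[1.0, 2.0], [0.5, -1.0]]   # samples with a low-frequency label
y_labeled = [0.1, 0.45]
x_unlabeled = [[0.2, 0.3], [1.5, -0.7], [-0.4, 0.9]]  # high-frequency only

sup = mse([toy_model(x) for x in x_labeled], y_labeled)
unsup = consistency_loss(toy_model, x_unlabeled, noise_scale=0.05, rng=rng)
total = uncertainty_weighted_total([sup, unsup], log_vars=[0.0, 0.0])
```

With both log-variances fixed at zero, the total reduces to half the sum of the two losses. Raising one log-variance shrinks the weight on that loss and adds a penalty, which is how this form of weighting keeps either term from dominating.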
About the journal:
Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.