Learning-to-learn efficiently with self-learning

Shruti Kunde, Sharod Roy Choudhury, Amey Pandit, Rekha Singhal
{"title":"Learning-to-learn efficiently with self-learning","authors":"Shruti Kunde, Sharod Roy Choudhury, Amey Pandit, Rekha Singhal","doi":"10.1145/3533028.3533307","DOIUrl":null,"url":null,"abstract":"Digital Twins of industrial process plants enable various what-if and if-what scenarios of the plants' functioning for fault diagnosis and general monitoring in the real-world. They do so through machine learning (ML) models built using data from sensors fitted in the plant. Over time, environmental factors cause variations in sensor readings, adversely affecting quality of the models' predictions. This triggers the self-learning loop, leading to the re-tuning/re-training of models. Reducing the time spent in self-learning of the models is a challenging task since there exist multiple models that need to be trained repeatedly using multiple algorithms which translates into large training time. We propose a metalearner which recommends the optimal regression algorithm for a model, thereby eliminating the need for training the model on multiple algorithms for every self-learning instance. The metalearner is trained on metafeatures extracted from the data which makes it application agnostic. We introduce domain metafeatures, which enhance metalearner prediction accuracy and propose machine learning and deep learning based approaches for selecting optimal metafeatures. To ensure relevance of selected metafeatures, we introduce novel static and dynamic reward functions for dynamic metafeature selection using a Q-Learning based approach. Our metalearning approach accelerates the time for determining the optimal regressor among 5 potential regressors from 5X to 27X over the traditional self-learning approaches. The incremental pre-processing approach achieves a speed-up of 25X over the traditional approach. The proposed metalearner achieves an AUC of 0.989, 0.954 and 0.998 for ML, DL and RL based approaches for metafeature selection respectively. We illustrate our findings on 3 datasets from the industrial process domain.","PeriodicalId":345888,"journal":{"name":"Proceedings of the Sixth Workshop on Data Management for End-To-End Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Sixth Workshop on Data Management for End-To-End Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3533028.3533307","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Digital Twins of industrial process plants enable various what-if and if-what scenarios of the plants' functioning for fault diagnosis and general monitoring in the real world. They do so through machine learning (ML) models built using data from sensors fitted in the plant. Over time, environmental factors cause variations in sensor readings, adversely affecting the quality of the models' predictions. This triggers the self-learning loop, leading to the re-tuning/re-training of models. Reducing the time spent in self-learning is challenging because multiple models must be trained repeatedly with multiple algorithms, which translates into large training times. We propose a metalearner which recommends the optimal regression algorithm for a model, thereby eliminating the need to train the model on multiple algorithms at every self-learning instance. The metalearner is trained on metafeatures extracted from the data, which makes it application agnostic. We introduce domain metafeatures, which enhance metalearner prediction accuracy, and propose machine learning and deep learning based approaches for selecting optimal metafeatures. To ensure the relevance of the selected metafeatures, we introduce novel static and dynamic reward functions for dynamic metafeature selection using a Q-Learning based approach. Our metalearning approach reduces the time for determining the optimal regressor among 5 candidate regressors by 5X to 27X over traditional self-learning approaches. The incremental pre-processing approach achieves a speed-up of 25X over the traditional approach. The proposed metalearner achieves an AUC of 0.989, 0.954 and 0.998 for the ML, DL and RL based metafeature selection approaches, respectively. We illustrate our findings on 3 datasets from the industrial process domain.
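The core idea in the abstract is that, instead of retraining every candidate regressor at each self-learning instance, a metalearner trained on metafeatures of past tasks recommends a single regressor to train. Below is a minimal sketch of that idea in Python with scikit-learn; the metafeature set, the candidate pool of five regressors, and helper names such as extract_metafeatures and train_metalearner are illustrative assumptions, not the authors' implementation (which additionally uses domain metafeatures and ML/DL/Q-Learning based metafeature selection).

```python
# Minimal sketch of the metalearning idea, assuming a scikit-learn style workflow.
# Helper names and the candidate regressor pool are illustrative, not the paper's code.
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

# Hypothetical pool of 5 candidate regressors (the paper compares 5).
CANDIDATES = {
    "linear": LinearRegression(),
    "ridge": Ridge(),
    "lasso": Lasso(),
    "svr": SVR(),
    "rf": RandomForestRegressor(n_estimators=50),
}

def extract_metafeatures(X, y):
    """Simple statistical metafeatures of a sensor dataset (illustrative only)."""
    return np.array([
        X.shape[0],                      # number of samples
        X.shape[1],                      # number of features
        np.mean(np.std(X, axis=0)),      # average feature spread
        np.mean(stats.skew(X, axis=0)),  # average feature skewness
        np.std(y),                       # target variability
    ])

def best_regressor(X, y):
    """Ground-truth label: regressor with the best cross-validated R^2 score."""
    scores = {name: cross_val_score(model, X, y, cv=3).mean()
              for name, model in CANDIDATES.items()}
    return max(scores, key=scores.get)

def train_metalearner(tasks):
    """Offline phase: tasks is a list of (X, y) pairs from past self-learning runs."""
    meta_X = np.vstack([extract_metafeatures(X, y) for X, y in tasks])
    meta_y = [best_regressor(X, y) for X, y in tasks]
    return RandomForestClassifier(n_estimators=100).fit(meta_X, meta_y)

def recommend(metalearner, X, y):
    """Online phase: when drift triggers self-learning, recommend one regressor
    instead of retraining every candidate."""
    return metalearner.predict(extract_metafeatures(X, y).reshape(1, -1))[0]
```

In the paper's setting, the offline labels would come from the plant's historical self-learning runs, and the online recommendation would replace the exhaustive training of all five regressors at each self-learning instance, which is where the reported 5X to 27X reduction comes from.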