Semi-supervised concept drift detection and adaptation based on conformal martingale framework
Authors: Yu Zhang, Ping Zhou, Ruiyao Zhang, Shaowen Lu, Tianyou Chai
Journal: Journal of Process Control, Volume 147, Article 103374
Published: 2025-01-22
DOI: 10.1016/j.jprocont.2025.103374
URL: https://www.sciencedirect.com/science/article/pii/S0959152425000022
Impact factor: 3.3
JCR: Q2 (AUTOMATION & CONTROL SYSTEMS)
Category: Computer Science (region 2)
Open access: No
Citations: 0
Abstract
In the realm of industrial applications for machine learning, multiple challenges are frequently encountered, such as concept drift (CD) and the prohibitive costs associated with data labeling. CD refers to the scenario where the underlying data distribution of the model shifts over time, potentially deteriorating model performance. Addressing these challenges, this paper proposes an innovative semi-supervised CD detection method, specifically designed to tackle both CD and the high costs of data labeling in regression tasks. Initially, considering the high expense of acquiring labeled data in industrial application scenarios, a semi-supervised learning strategy based on self-training is utilized. In this strategy, prediction intervals generated by Conformal Prediction (CP) are used to select high-reliability pseudo-labels. Furthermore, to effectively address CD in real-world industrial settings, the Conformal Martingale (CM) is employed for real-time detection. This framework detects changes by identifying increases in martingale values when CD occurs. Upon detection, the model is promptly retrained using the most recent data following the drift. Finally, the proposed method is validated through experiments conducted on three datasets: the UCI dataset, the alumina evaporation process dataset, and the blast furnace ironmaking dataset. Experimental results demonstrate that the proposed semi-supervised method significantly enhances the performance of the original training model. The detection method accurately identifies CD and notably reduces test errors through model retraining, thereby improving the effectiveness of the model in real-world industrial applications.
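The detection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the nonconformity score is the absolute prediction residual, uses a plain power martingale with a fixed exponent `epsilon`, and raises an alarm when the martingale exceeds a fixed `threshold`. The function names, the default values, and the synthetic residual stream are all hypothetical and chosen only for illustration.

```python
import math
import random


def conformal_p_value(history, score, rng):
    """Smoothed conformal p-value of `score` relative to earlier scores."""
    greater = sum(1 for s in history if s > score)
    ties = sum(1 for s in history if s == score) + 1  # the new score ties with itself
    return (greater + rng.random() * ties) / (len(history) + 1)


def detect_drift(scores, epsilon=0.92, threshold=100.0, seed=0):
    """Return the first index where the power martingale exceeds `threshold`, else None."""
    rng = random.Random(seed)  # randomness is only used to break ties in p-values
    log_threshold = math.log(threshold)
    log_martingale = 0.0  # log M_0 = log 1
    history = []
    for t, score in enumerate(scores):
        p = max(conformal_p_value(history, score, rng), 1e-12)  # guard against log(0)
        # Power betting function epsilon * p**(epsilon - 1), accumulated in log space.
        log_martingale += math.log(epsilon) + (epsilon - 1.0) * math.log(p)
        history.append(score)
        if log_martingale > log_threshold:
            return t
    return None


def simulate_scores(n_stable, n_drift, shift, seed=1):
    """Synthetic absolute residuals (assumed data): N(0, 1) errors, then mean `shift`."""
    rng = random.Random(seed)
    stable = [abs(rng.gauss(0.0, 1.0)) for _ in range(n_stable)]
    drifted = [abs(rng.gauss(shift, 1.0)) for _ in range(n_drift)]
    return stable + drifted
```

Calling `detect_drift(simulate_scores(300, 200, 5.0))` returns the index of the first alarm, which in the setting of the abstract would trigger retraining on the data collected after the drift. While the scores remain exchangeable, the p-values are uniform and, by Ville's inequality, the probability that the martingale ever exceeds a threshold `c` is at most `1/c`; after a drift the residuals grow, the p-values become small, and the martingale rises.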
About the journal:
This international journal covers the application of control theory, operations research, computer science and engineering principles to the solution of process control problems. In addition to the traditional chemical processing and manufacturing applications, the scope of process control problems involves a wide range of applications that includes energy processes, nano-technology, systems biology, bio-medical engineering, pharmaceutical processing technology, energy storage and conversion, smart grid, and data analytics among others.
Papers on the theory in these areas will also be accepted provided the theoretical contribution is aimed at the application and the development of process control techniques.
Topics covered include:
• Control applications
• Process monitoring
• Plant-wide control
• Process control systems
• Control techniques and algorithms
• Process modelling and simulation
• Design methods
Advanced design methods exclude well established and widely studied traditional design techniques such as PID tuning and its many variants. Applications in fields such as control of automotive engines, machinery and robotics are not deemed suitable unless a clear motivation for the relevance to process control is provided.