{"title":"Measuring Drift Severity by Tree Structure Classifiers","authors":"Di Zhao, Yun Sing Koh, Philippe Fournier-Viger","doi":"10.1109/IJCNN55064.2022.9892439","DOIUrl":null,"url":null,"abstract":"Streaming data has become more common as our ability to collect data in real-time increases. A primary concern in dealing with data streams is concept drift, which describes changes in the underlying distribution of streaming data. Measuring drift severity is crucial for model adaptation. Drift severity can be a proxy in choosing concept drift adaptation strategies. Current methods measure drift severity by monitoring the changes in the learner performance or measuring the difference between data distributions. However, these methods cannot measure the drift severity if the ground truth labels are unavailable. Specifically, performance-based methods cannot measure marginal drift, and distribution-based methods cannot measure conditional drift. We propose a novel framework named Tree-based Drift Measurement (TDM) that measures both marginal and conditional drift without revisiting historical data. TDM measures the difference between tree classifiers by transforming them into sets of binary vectors. An experiment shows that TDM achieves similar performance to the state-of-the-art methods and provides the best trade-off between runtime and memory usage. A case study shows that the online learner performance can be improved by adapting different drift adaptation strategies based on the drift severity.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Joint Conference on Neural Networks (IJCNN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN55064.2022.9892439","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Streaming data has become more common as our ability to collect data in real time increases. A primary concern in dealing with data streams is concept drift, which describes changes in the underlying distribution of streaming data. Measuring drift severity is crucial for model adaptation, and it can serve as a proxy when choosing concept drift adaptation strategies. Current methods measure drift severity by monitoring changes in learner performance or by measuring the difference between data distributions. However, these methods cannot measure drift severity when ground truth labels are unavailable. Specifically, performance-based methods cannot measure marginal drift, and distribution-based methods cannot measure conditional drift. We propose a novel framework named Tree-based Drift Measurement (TDM) that measures both marginal and conditional drift without revisiting historical data. TDM measures the difference between tree classifiers by transforming them into sets of binary vectors. Experiments show that TDM achieves performance similar to state-of-the-art methods and provides the best trade-off between runtime and memory usage. A case study shows that online learner performance can be improved by applying different drift adaptation strategies according to the measured drift severity.
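To make the core idea concrete, below is a minimal, hypothetical sketch of comparing two tree classifiers via sets of binary vectors. It is not the authors' TDM implementation: the abstract does not specify the encoding, so this sketch assumes each leaf region of a fitted scikit-learn decision tree is binarised over a fixed per-feature discretisation of the input domain, and that the difference between the two resulting vector sets is a symmetric average nearest-neighbour Jaccard distance. All function names, the bin-based encoding, and the distance choice are illustrative assumptions.

```python
# Illustrative sketch only (assumed encoding and distance, not the paper's TDM).
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def leaf_regions(tree, lo, hi):
    """Return (lower_bounds, upper_bounds) per-feature boxes for every leaf."""
    t = tree.tree_
    regions = []

    def recurse(node, lower, upper):
        if t.children_left[node] == -1:                 # leaf node
            regions.append((lower.copy(), upper.copy()))
            return
        f, thr = t.feature[node], t.threshold[node]
        left_upper = upper.copy(); left_upper[f] = min(upper[f], thr)
        recurse(t.children_left[node], lower, left_upper)
        right_lower = lower.copy(); right_lower[f] = max(lower[f], thr)
        recurse(t.children_right[node], right_lower, upper)

    recurse(0, lo.astype(float).copy(), hi.astype(float).copy())
    return regions


def leaves_to_binary_vectors(tree, lo, hi, bins_per_feature=8):
    """Encode each leaf region as a binary vector of bin-overlap indicators."""
    n_features = len(lo)
    edges = [np.linspace(lo[f], hi[f], bins_per_feature + 1) for f in range(n_features)]
    vectors = []
    for lower, upper in leaf_regions(tree, lo, hi):
        bits = []
        for f in range(n_features):
            for b in range(bins_per_feature):
                # 1 if the leaf's interval on feature f overlaps bin b
                bits.append(int(lower[f] < edges[f][b + 1] and upper[f] > edges[f][b]))
        vectors.append(np.array(bits, dtype=bool))
    return vectors


def jaccard(u, v):
    union = np.logical_or(u, v).sum()
    if union == 0:
        return 0.0
    return 1.0 - np.logical_and(u, v).sum() / union


def tree_difference(vecs_a, vecs_b):
    """Symmetric average nearest-neighbour Jaccard distance between vector sets."""
    def one_way(src, dst):
        return np.mean([min(jaccard(u, v) for v in dst) for u in src])
    return 0.5 * (one_way(vecs_a, vecs_b) + one_way(vecs_b, vecs_a))


# Usage: compare a tree fitted before a (synthetic) drift with one fitted after it.
rng = np.random.default_rng(0)
X_old = rng.normal(0.0, 1.0, size=(500, 2)); y_old = (X_old[:, 0] > 0).astype(int)
X_new = rng.normal(1.5, 1.0, size=(500, 2)); y_new = (X_new[:, 1] > 1.5).astype(int)

lo, hi = np.array([-4.0, -4.0]), np.array([6.0, 6.0])   # fixed reference domain
old_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_old, y_old)
new_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_new, y_new)

severity = tree_difference(
    leaves_to_binary_vectors(old_tree, lo, hi),
    leaves_to_binary_vectors(new_tree, lo, hi),
)
print(f"estimated drift severity: {severity:.3f}")       # larger = more severe drift
```

Because the comparison uses only the fitted trees (via their leaf structure) and a fixed reference domain, no historical data need to be revisited, which mirrors the property claimed for TDM in the abstract; the actual binary encoding and distance used by TDM are defined in the paper itself.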