Scarcity of Labels in Non-Stationary Data Streams: A Survey
Conor Fahy, Shengxiang Yang, Mario Gongora
ACM Computing Surveys (CSUR), pp. 1-39, published 2022-01-21
DOI: 10.1145/3494832 (https://doi.org/10.1145/3494832)
Abstract
In a dynamic stream, the underlying process generating the stream is assumed to be non-stationary, and the concepts within the stream are expected to drift and change as the stream progresses. Concepts learned by a classification model are prone to change, and non-adaptive models are likely to deteriorate and become ineffective over time. The challenge of recognising and reacting to change in a stream is compounded by the scarcity-of-labels problem: the very realistic situation in which the true class label of an incoming point is not immediately available (or may never be available), or in which manually annotating data points is prohibitively expensive. In a high-velocity stream, it may be impossible to manually label every incoming point and pursue a fully supervised approach. In this article, we formally describe the types of change that can occur in a data stream and then catalogue the methods for dealing with change when access to labels is limited. We present an overview of the most influential ideas in the field, along with recent advancements, and we highlight trends, research gaps, and future research directions.
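The core difficulty described above, recognising change when true labels arrive late or never, can be made concrete with a generic label-free check. The following is a minimal sketch under stated assumptions, not a method taken from the survey. It monitors a single numeric input feature, treats the first window as a drift-free reference, and compares each later non-overlapping window against it with a two-sample Kolmogorov-Smirnov (KS) test at the 5% level. The function names (`ks_statistic`, `ks_critical_value`, `detect_drift`), the window size, and the synthetic mean shift are all hypothetical illustration choices.

```python
import random
from bisect import bisect_right


def ks_statistic(reference, recent):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    ref = sorted(reference)
    rec = sorted(recent)
    n, m = len(ref), len(rec)
    d = 0.0
    # Both empirical CDFs are right-continuous step functions, so the largest
    # gap is attained at one of the pooled observed values.
    for x in ref + rec:
        cdf_ref = bisect_right(ref, x) / n
        cdf_rec = bisect_right(rec, x) / m
        d = max(d, abs(cdf_ref - cdf_rec))
    return d


def ks_critical_value(n, m, c_alpha=1.358):
    """Asymptotic KS critical value; c_alpha = 1.358 corresponds to alpha = 0.05."""
    return c_alpha * ((n + m) / (n * m)) ** 0.5


def detect_drift(stream, window=200):
    """Hypothetical unsupervised monitor that uses feature values only, no labels.

    The first `window` points form the reference; every later non-overlapping
    window is tested against it, and the start index of each window whose KS
    statistic exceeds the critical value is reported as an alarm.
    """
    reference = stream[:window]
    threshold = ks_critical_value(window, window)
    alarms = []
    for start in range(window, len(stream) - window + 1, window):
        recent = stream[start:start + window]
        if ks_statistic(reference, recent) > threshold:
            alarms.append(start)
    return alarms


# Usage on a synthetic stream (assumed): the feature mean shifts from 0.0
# to 1.5 at index 1000, and no class labels are ever consulted.
random.seed(0)
stream = [random.gauss(0.0, 1.0) for _ in range(1000)] + [
    random.gauss(1.5, 1.0) for _ in range(1000)
]
alarms = detect_drift(stream)
print(alarms)
```

Because the monitor sees only the inputs, it can flag a change in the input distribution without any labels, but it cannot detect a change in the relationship between inputs and class labels when the input distribution itself stays the same. Testing many windows at a fixed 5% level also accumulates false alarms over a long stream, so a practical monitor would need some correction for repeated testing.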