结合实例选择和自训练，提高数据流量化

Journal of the Brazilian Computer Society Pub Date : 2018-10-12 DOI:10.1186/s13173-018-0076-0

André G. Maletzke, Denis M. dos Reis, Gustavo E. A. P. A. Batista

{"title":"结合实例选择和自训练，提高数据流量化","authors":"André G. Maletzke, Denis M. dos Reis, Gustavo E. A. P. A. Batista","doi":"10.1186/s13173-018-0076-0","DOIUrl":null,"url":null,"abstract":"In the last years, learning from data streams has attracted the attention of researchers and practitioners due to its large number of applications. These applications have motivated the research community to propose a significant amount of methods to solve problems in diverse tasks, more prominently in classification, clustering, and anomaly detection. However, a relevant task known as quantification has remained mostly unexplored. The quantification goal is to provide an estimate of the class prevalence in an unlabeled set. Recently, we proposed the SQSI algorithm to quantify data streams with concept drifts. SQSI uses a statistical test to identify concept drifts and retrain the classifiers. However, the retraining involves requiring the labels for all newly arrived instances. In this paper, we extend SQSI algorithm by exploring instance selection techniques allied to semi-supervised learning. The idea is to request the classes of a smaller subset of recent examples. Our experiments demonstrate that SQSI’s extension significantly reduces the dependency on actual labels while maintaining or improving the quantification accuracy.","PeriodicalId":39760,"journal":{"name":"Journal of the Brazilian Computer Society","volume":"33 1","pages":"1-17"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Combining instance selection and self-training to improve data stream quantification\",\"authors\":\"André G. Maletzke, Denis M. dos Reis, Gustavo E. A. P. A. Batista\",\"doi\":\"10.1186/s13173-018-0076-0\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the last years, learning from data streams has attracted the attention of researchers and practitioners due to its large number of applications. These applications have motivated the research community to propose a significant amount of methods to solve problems in diverse tasks, more prominently in classification, clustering, and anomaly detection. However, a relevant task known as quantification has remained mostly unexplored. The quantification goal is to provide an estimate of the class prevalence in an unlabeled set. Recently, we proposed the SQSI algorithm to quantify data streams with concept drifts. SQSI uses a statistical test to identify concept drifts and retrain the classifiers. However, the retraining involves requiring the labels for all newly arrived instances. In this paper, we extend SQSI algorithm by exploring instance selection techniques allied to semi-supervised learning. The idea is to request the classes of a smaller subset of recent examples. Our experiments demonstrate that SQSI’s extension significantly reduces the dependency on actual labels while maintaining or improving the quantification accuracy.\",\"PeriodicalId\":39760,\"journal\":{\"name\":\"Journal of the Brazilian Computer Society\",\"volume\":\"33 1\",\"pages\":\"1-17\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-10-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of the Brazilian Computer Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1186/s13173-018-0076-0\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the Brazilian Computer Society","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1186/s13173-018-0076-0","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 11

摘要

近年来，数据流学习因其大量的应用而引起了研究人员和实践者的关注。这些应用促使研究界提出了大量的方法来解决不同任务中的问题，尤其是在分类、聚类和异常检测方面。然而，一项被称为量化的相关任务大部分仍未被探索。量化目标是在未标记的集合中提供类流行率的估计。最近，我们提出了SQSI算法来量化带有概念漂移的数据流。SQSI使用统计测试来识别概念漂移并重新训练分类器。然而，再培训涉及到要求所有新到达实例的标签。在本文中，我们通过探索与半监督学习相关的实例选择技术来扩展SQSI算法。其思想是请求最近示例的一个较小子集的类。我们的实验表明，SQSI的扩展显著降低了对实际标签的依赖，同时保持或提高了量化准确性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Combining instance selection and self-training to improve data stream quantification

In the last years, learning from data streams has attracted the attention of researchers and practitioners due to its large number of applications. These applications have motivated the research community to propose a significant amount of methods to solve problems in diverse tasks, more prominently in classification, clustering, and anomaly detection. However, a relevant task known as quantification has remained mostly unexplored. The quantification goal is to provide an estimate of the class prevalence in an unlabeled set. Recently, we proposed the SQSI algorithm to quantify data streams with concept drifts. SQSI uses a statistical test to identify concept drifts and retrain the classifiers. However, the retraining involves requiring the labels for all newly arrived instances. In this paper, we extend SQSI algorithm by exploring instance selection techniques allied to semi-supervised learning. The idea is to request the classes of a smaller subset of recent examples. Our experiments demonstrate that SQSI’s extension significantly reduces the dependency on actual labels while maintaining or improving the quantification accuracy.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of the Brazilian Computer Society Computer Science-Computer Science (all)

CiteScore

2.40

自引率

0.00%

发文量

期刊介绍： JBCS is a formal quarterly publication of the Brazilian Computer Society. It is a peer-reviewed international journal which aims to serve as a forum to disseminate innovative research in all fields of computer science and related subjects. Theoretical, practical and experimental papers reporting original research contributions are welcome, as well as high quality survey papers. The journal is open to contributions in all computer science topics, computer systems development or in formal and theoretical aspects of computing, as the list of topics below is not exhaustive. Contributions will be considered for publication in JBCS if they have not been published previously and are not under consideration for publication elsewhere.