Multi-class imbalanced semi-supervised learning from streams through online ensembles

P. Vafaie, H. Viktor, W. Michalowski
Published in: 2020 International Conference on Data Mining Workshops (ICDMW)
Publication date: 2020-11-01
DOI: 10.1109/ICDMW51313.2020.00124
URL: https://doi.org/10.1109/ICDMW51313.2020.00124
Citations: 5

Abstract

Multi-class imbalance, in which the rates of instances in the various classes differ substantially, poses a major challenge when learning from evolving streams. In this setting, minority class instances may arrive infrequently and in bursts, making accurate model construction problematic. Further, skewed streams are not only susceptible to concept drifts, but class labels may also be absent, expensive to obtain, or only arrive after some delay. The combined effects of multi-class skew, concept drift and semi-supervised learning have received limited attention in the online learning community. In this paper, we introduce a multi-class online ensemble algorithm that is suitable for learning in such settings. Specifically, our algorithm uses sampling with replacement while dynamically increasing the weights of underrepresented classes based on recall in order to produce models that benefit all classes. Our approach addresses the potential lack of labels by incorporating a self-training semi-supervised learning method for labeling instances. Our experimental results show that our online ensemble performs well against multi-class imbalanced data containing concept drifts. In addition, our algorithm produces accurate predictions, even in the presence of unlabeled data.
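
The abstract names two mechanisms, recall-driven resampling with replacement and self-training on unlabeled instances, but gives no pseudocode. The following is only a minimal sketch of how the two might fit together, not the authors' algorithm. The Poisson-based online bagging, the rate rule `lam = 1 + (1 - recall)`, the vote-agreement threshold, the toy centroid base learner, and every class and method name are assumptions introduced for illustration.

```python
import math
import random
from collections import defaultdict


class CentroidLearner:
    """Toy incremental base learner: one running centroid per class."""

    def __init__(self):
        self.sums = {}                      # class -> per-feature sums
        self.counts = defaultdict(int)      # class -> number of updates

    def learn(self, x, y):
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
        self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
        self.counts[y] += 1

    def predict(self, x):
        """Return the class of the nearest centroid, or None if untrained."""
        best, best_dist = None, math.inf
        for y, sums in self.sums.items():
            n = self.counts[y]
            dist = sum((v - s / n) ** 2 for v, s in zip(x, sums))
            if dist < best_dist:
                best, best_dist = y, dist
        return best


class RecallWeightedEnsemble:
    """Illustrative sketch only; all names and rules here are hypothetical."""

    def __init__(self, n_learners=10, threshold=0.8, seed=0):
        self.learners = [CentroidLearner() for _ in range(n_learners)]
        self.threshold = threshold          # assumed self-training cutoff
        self.rng = random.Random(seed)
        self.hits = defaultdict(int)        # class -> correct predictions
        self.seen = defaultdict(int)        # class -> labeled instances seen

    def recall(self, y):
        # Classes with no labeled instances yet get recall 0 (largest boost).
        return self.hits[y] / self.seen[y] if self.seen[y] else 0.0

    def _poisson(self, lam):
        # Knuth's method; adequate for the small rates used here.
        limit, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= self.rng.random()
            if p <= limit:
                return k
            k += 1

    def vote(self, x):
        """Return (majority label, fraction of all learners agreeing)."""
        votes = defaultdict(int)
        for learner in self.learners:
            label = learner.predict(x)
            if label is not None:
                votes[label] += 1
        if not votes:
            return None, 0.0
        label = max(votes, key=votes.get)
        return label, votes[label] / len(self.learners)

    def learn(self, x, y=None):
        predicted, confidence = self.vote(x)
        if y is None:
            # Self-training: adopt the ensemble's label only when confident.
            if predicted is None or confidence < self.threshold:
                return
            y = predicted
        else:
            # Test-then-train: recall is updated from true labels only.
            self.seen[y] += 1
            self.hits[y] += int(predicted == y)
        # Assumed rule: low-recall classes are replicated more often.
        lam = 1.0 + (1.0 - self.recall(y))  # rate lies in [1, 2]
        for learner in self.learners:
            for _ in range(self._poisson(lam)):
                learner.learn(x, y)
```

Calling `learn(x, y)` handles a labeled instance and `learn(x)` an unlabeled one. In this sketch, pseudo-labeled instances never update the recall statistics, so a wrong self-assigned label cannot inflate a class's measured recall; that separation is a design choice of the sketch, not something the abstract specifies.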