The MOA Data Stream Mining Tool: A Mid-Term Report

MLSDA '13, Pub Date: 2013-12-02, DOI: 10.1145/2542652.2542660
B. Pfahringer

Abstract

Stream mining research has seen an impressive increase in the number of publications over the last few years. It borrows heavily from more established research fields in Machine Learning, especially online learning and time series analysis, fusing ideas and methods from both fields and extending them in new ways. Stream mining needs to process potentially infinite streams of data whose generating source may change over time; in other words, the source is nonstationary, whereas most standard learning approaches assume a stationary data source. Data may also include categorical features, which time series analysis does not cope with well. In addition to being adapted continuously, models must be able to predict at any time, and usually cannot afford to spend much time or memory on any single example. Polynomial behaviour is therefore not good enough: logarithmic complexity per example is usually a strict upper limit on computational resources. Work on the MOA (Massive Online Analysis) stream mining software suite started in 2005, and the first open source release took place in 2007. In this talk I will first very briefly present MOA's history, then explain and discuss the challenges stream mining faces and how MOA tries to address them. Finally, I will focus on current shortcomings and suggest ways of addressing them. As this last part is the most useful one in terms of further research, I will briefly outline these points here.
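The requirements listed in the abstract (continuous adaptation, prediction at any time, and a small bounded cost per example) can be illustrated with a toy test-then-train loop. The sketch below is not MOA code and does not use MOA's API; the names `FadingMajorityClass`, `prequential_accuracy`, and the `decay` parameter are hypothetical and chosen only for illustration. The learner keeps one decayed count per class, so memory and per-example time depend only on the number of classes, not on the length of the stream.

```python
from collections import defaultdict


class FadingMajorityClass:
    """Toy online learner (hypothetical, not part of MOA).

    Predicts the class with the largest exponentially decayed count.
    """

    def __init__(self, decay=0.99):
        self.decay = decay
        self.counts = defaultdict(float)

    def predict(self):
        # Anytime prediction: callable even before any example has been seen.
        if not self.counts:
            return None
        return max(self.counts, key=self.counts.get)

    def learn(self, label):
        # Fade old evidence so the model can follow a nonstationary source.
        for key in self.counts:
            self.counts[key] *= self.decay
        self.counts[label] += 1.0


def prequential_accuracy(stream, learner):
    """Test-then-train: predict each label first, then learn from it."""
    correct = 0
    total = 0
    for label in stream:
        if learner.predict() == label:
            correct += 1
        learner.learn(label)
        total += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    # A stream whose source changes abruptly halfway through.
    stream = ["a"] * 50 + ["b"] * 50
    print(prequential_accuracy(stream, FadingMajorityClass(decay=0.9)))
```

With `decay=1.0` the learner never forgets and behaves like a stationary majority-class model, which is slow to react when the source changes; a smaller `decay` trades stability for faster adaptation.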