The MOA Data Stream Mining Tool: A Mid-Term Report

MLSDA '13, published 2013-12-02, DOI: 10.1145/2542652.2542660

B. Pfahringer

Abstract

Stream mining research has seen an impressive increase in the number of publications over the last few years. It borrows heavily from more established research fields in Machine Learning, especially from so-called online learning and from time series analysis. It fuses ideas and methods from both of these fields and extends them in novel ways. Stream mining needs to process potentially infinite streams of data, where the source that generates the data may change over time; in other words, the source is nonstationary. Most standard learning approaches, by contrast, assume a stationary data source. Data may also include categorical features, which time series analysis does not handle well. In addition to being adapted continuously, models must be able to predict at any time, and they usually cannot afford to spend much time or memory on any single example. Polynomial per-example cost is therefore not good enough: logarithmic complexity per example is usually a strict upper limit on computational resources. The MOA (Massive Online Analysis) stream mining software suite was started as early as 2005, and its first open source release took place in 2007. In this talk I will first very briefly present MOA's history, then explain and discuss the challenges stream mining faces and how MOA tries to address them. Finally, I will focus on current shortcomings and suggest ways of addressing them. As this last part is the most useful one in terms of further research, I will briefly outline these points here.
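
The abstract lists the constraints a stream learner works under but does not say how MOA meets them. As a minimal, hypothetical sketch (plain Python, not MOA's own Java API), the code below shows one way to respect them: an incremental naive Bayes classifier over categorical features whose cost per example does not grow with the number of examples seen, whose counts fade exponentially so it can follow a nonstationary source, and which can predict at any point. It is scored with test-then-train (prequential) evaluation, where every example is predicted before it is learned. The class names, the `fading` parameter, and the toy drifting stream are illustrative assumptions; exponential fading is only one of several ways to handle change.

```python
import math
from collections import defaultdict


class DecayingCounter:
    """Counts that fade by `fading` per time step, decayed lazily per key.

    Each key stores (count, time of last update); the decay owed since then is
    applied only when that key is read or written, so an access costs O(1)
    instead of touching every stored count on every example.
    """

    def __init__(self, fading: float):
        self.fading = fading
        self._store = {}  # key -> (count, last update time)

    def get(self, key, now: int) -> float:
        count, last = self._store.get(key, (0.0, now))
        return count * self.fading ** (now - last)

    def add(self, key, now: int, amount: float = 1.0) -> None:
        self._store[key] = (self.get(key, now) + amount, now)


class FadingNaiveBayes:
    """Hypothetical incremental naive Bayes for categorical features.

    Work per example depends on the number of features and classes, not on
    stream length; old evidence fades so the model can track a changing source.
    Labels must be mutually comparable (they are sorted for tie-breaking).
    """

    def __init__(self, fading: float = 0.99):
        self.t = 0  # number of examples learned so far
        self.class_counts = DecayingCounter(fading)
        self.cond_counts = DecayingCounter(fading)  # key: (feature, value, label)
        self.labels = set()
        self.values = defaultdict(set)  # feature index -> values seen so far

    def predict(self, x):
        if not self.labels:
            return None  # nothing learned yet
        best_label, best_score = None, -math.inf
        for label in sorted(self.labels):  # sorted: deterministic tie-breaking
            n_c = self.class_counts.get(label, self.t)
            score = math.log(n_c + 1.0)  # Laplace-smoothed class prior
            for i, v in enumerate(x):
                n_cv = self.cond_counts.get((i, v, label), self.t)
                k = len(self.values[i]) + 1  # +1 reserves mass for unseen values
                score += math.log((n_cv + 1.0) / (n_c + k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def learn(self, x, y) -> None:
        self.t += 1
        self.labels.add(y)
        self.class_counts.add(y, self.t)
        for i, v in enumerate(x):
            self.values[i].add(v)
            self.cond_counts.add((i, v, y), self.t)


def prequential_accuracy(model, stream) -> float:
    """Test-then-train: score each example before the model learns from it."""
    correct = total = 0
    for x, y in stream:
        if model.predict(x) == y:
            correct += 1
        total += 1
        model.learn(x, y)
    return correct / total if total else 0.0


def drifting_stream(n: int, drift_at: int):
    """Toy stream: the colour -> label mapping is reversed at index `drift_at`."""
    for t in range(n):
        colour = "red" if t % 2 == 0 else "blue"
        flipped = t >= drift_at
        y = "pos" if (colour == "red") != flipped else "neg"
        yield (colour,), y


if __name__ == "__main__":
    adaptive = FadingNaiveBayes(fading=0.9)
    static = FadingNaiveBayes(fading=1.0)  # fading=1.0: nothing is forgotten
    print("adaptive:", prequential_accuracy(adaptive, drifting_stream(1000, 500)))
    print("static:  ", prequential_accuracy(static, drifting_stream(1000, 500)))
```

Setting `fading=1.0` switches forgetting off, which makes it easy to compare an adaptive and a static model on the same drifting stream: the static model keeps predicting the old mapping long after the change. A smaller `fading` reacts faster to change but bases each prediction on less evidence.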

Latest papers from MLSDA '13

Light-weight Online Predictive Data Aggregation for Wireless Sensor Networks
Predicting Petroleum Reservoir Properties from Downhole Sensor Data using an Ensemble Model of Neural Networks
Ensemble Feature Ranking for Shellfish Farm Closure Cause Identification
The MOA Data Stream Mining Tool: A Mid-Term Report
From Association Analysis to Causal Discovery