The MOA Data Stream Mining Tool: A Mid-Term Report
B. Pfahringer
MLSDA '13, published 2013-12-02
DOI: 10.1145/2542652.2542660 (https://doi.org/10.1145/2542652.2542660)
Abstract
Stream mining research has seen an impressive increase in the number of publications over the last few years. It borrows heavily from more established research fields in Machine Learning, especially from so-called online learning and from time series analysis. It fuses ideas and methods from both fields and extends them in unique new ways. Stream mining needs to process potentially infinite streams of data, where the source that generates the data may change over time; in other words, the source is nonstationary. Most standard learning approaches, by contrast, assume a stationary data source. Data may also include categorical features, which time series analysis does not handle well. In addition to being adapted continuously, models must be able to predict at any time, and usually cannot afford to spend much time or memory on any single example. Polynomial behaviour is therefore not good enough; usually, logarithmic complexity per example is a strict upper limit on computational resources. The MOA (Massive Online Analysis) stream mining software suite was started in 2005, and the first open source release took place in 2007. In this talk I will first very briefly present MOA’s history, then explain and discuss the challenges stream mining faces and how MOA tries to address them. Finally, I will focus on current shortcomings and suggest ways of addressing them. As this last part is the most useful one in terms of further research, I will briefly outline these points here.
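The constraints named in the abstract (a nonstationary source, prediction at any time, and a small bounded cost per example) can be illustrated with a minimal Python sketch. It is not MOA code and does not use MOA's API: the class `FadingMajorityClass`, the function `run_stream`, and the `decay` parameter are hypothetical names chosen for illustration. The learner keeps one decayed counter per class, so its memory does not grow with the length of the stream, and the predict-then-learn loop is just one simple way to touch each example exactly once.

```python
from collections import defaultdict


class FadingMajorityClass:
    """Hypothetical toy learner: predicts the class with the largest decayed count."""

    def __init__(self, decay=0.99):
        self.decay = decay  # assumed forgetting factor, not taken from the source
        self.counts = defaultdict(float)

    def predict(self):
        # Usable at any time, even before any example has been seen.
        if not self.counts:
            return None
        return max(self.counts, key=self.counts.get)

    def learn(self, label):
        # Decay old evidence so the model can follow a changing source.
        # Cost per example depends on the number of classes, not on stream length.
        for key in self.counts:
            self.counts[key] *= self.decay
        self.counts[label] += 1.0


def run_stream(stream, learner):
    """Predict first, then learn, processing each example exactly once."""
    correct = 0
    total = 0
    for label in stream:
        if learner.predict() == label:
            correct += 1
        learner.learn(label)
        total += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    # A source that changes from always "a" to always "b" halfway through.
    learner = FadingMajorityClass(decay=0.9)
    accuracy = run_stream(["a"] * 50 + ["b"] * 50, learner)
    print(learner.predict(), accuracy)
```

With a decay below 1, the learner's prediction follows the change in the source after a short lag; with `decay=1.0` it degenerates to plain counting, which is the stationary assumption the abstract contrasts stream mining with.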