Direct out-of-memory distributed parallel frequent pattern mining

BigMine '13. Published: 2013-08-11. DOI: 10.1145/2501221.2501229

Z. Rong, J. D. Knijf


Citations: 3

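Abstract

Frequent itemset mining is a well-studied and important problem in the data mining community. Many mining algorithms exist, each with its own flavor and characteristics, but almost all share two major shortcomings. First, frequent itemset mining algorithms generally perform an exhaustive search over a huge pattern space. Second, most algorithms assume that the input data fits into main memory. The first problem was recently tackled in [2] by directly sampling the required number of patterns from the pattern space. This paper extends the direct sampling approach by casting the algorithm into the MapReduce framework, which removes the requirement that the data fit into main memory. The results show that the algorithm scales well to large data sets, while its memory requirements depend solely on the number of patterns required in the output.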
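The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the direct sampling step is the two-step, frequency-proportional scheme (draw a transaction with probability proportional to 2^|t|, then draw a uniform random subset of it), and it simulates the map and reduce phases as plain Python functions. All names (`map_shard`, `reduce_shards`, `sample_subset`, `sample_patterns`) are hypothetical.

```python
import random
from typing import FrozenSet, Iterable, List, Optional, Tuple

Transaction = FrozenSet[str]
# (total weight of the shard, k independently sampled transactions)
ShardSummary = Tuple[int, List[Optional[Transaction]]]


def map_shard(transactions: Iterable[Transaction], k: int,
              rng: random.Random) -> ShardSummary:
    """Stream over one shard, keeping only k transactions in memory.

    Each slot is an independent weighted reservoir of size 1: a new
    transaction with weight w replaces the slot with probability
    w / (total weight seen so far).
    """
    total = 0
    slots: List[Optional[Transaction]] = [None] * k
    for t in transactions:
        w = 2 ** len(t)  # assumed weight: number of subsets of t
        total += w
        p = w / total
        for j in range(k):
            if rng.random() < p:
                slots[j] = t
    return total, slots


def reduce_shards(summaries: List[ShardSummary], k: int,
                  rng: random.Random) -> List[Transaction]:
    """Merge shard summaries: slot j takes its value from one shard,
    chosen with probability proportional to that shard's total weight."""
    summaries = [s for s in summaries if s[0] > 0]  # skip empty shards
    if not summaries:
        return []
    grand_total = sum(total for total, _ in summaries)
    picked: List[Transaction] = []
    for j in range(k):
        r = rng.randrange(grand_total)  # exact integer arithmetic
        for total, slots in summaries:
            if r < total:
                picked.append(slots[j])
                break
            r -= total
    return picked


def sample_subset(t: Transaction, rng: random.Random) -> FrozenSet[str]:
    """Uniform random subset of t (the empty pattern is possible)."""
    return frozenset(item for item in sorted(t) if rng.random() < 0.5)


def sample_patterns(shards: List[List[Transaction]], k: int,
                    seed: int = 0) -> List[FrozenSet[str]]:
    rng = random.Random(seed)
    summaries = [map_shard(shard, k, rng) for shard in shards]
    return [sample_subset(t, rng) for t in reduce_shards(summaries, k, rng)]


if __name__ == "__main__":
    shards = [[frozenset("abc"), frozenset("ab")], [frozenset("bcd")]]
    print(sample_patterns(shards, k=4))
```

Under these assumptions each mapper holds only k transactions plus one running total, so memory grows with the number of requested patterns rather than with the data size, which matches the property the abstract claims.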




Recent papers from BigMine '13

Forecasting building occupancy using sensor network data
Maintaining connected components for infinite graph streams
Soft-CsGDT: soft cost-sensitive Gaussian decision tree for cost-sensitive classification of data streams
Data-driven study of urban infrastructure to enable city-wide ubiquitous computing
Big & personal: data and models behind Netflix recommendations