Interesting association rule mining with consistent and inconsistent rule detection from big sales data in distributed environment

Future Computing and Informatics Journal, Volume 2, Issue 1, Pages 19-30. Pub Date: 2017-06-01. DOI: 10.1016/j.fcij.2017.04.003

Dinesh J. Prajapati, Sanjay Garg, N.C. Chauhan


Citations: 56

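Abstract

Demand for mining interesting patterns from big data is growing, yet analyzing such large volumes of data with traditional methods is computationally expensive. This paper has two aims. First, it presents a novel approach for identifying consistent and inconsistent association rules in sales data located in a distributed environment. Second, it overcomes the main-memory bottleneck and computing-time overhead of a single machine by running the computations on a multi-node cluster. The proposed method first extracts frequent itemsets for each zone using existing distributed frequent pattern mining algorithms. The paper also compares the time efficiency of a MapReduce-based frequent pattern mining algorithm with the Count Distribution Algorithm (CDA) and Fast Distributed Mining (FDM) algorithms. The set of association rules generated from the frequent itemsets is too large to analyze easily. A MapReduce-based consistent and inconsistent rule detection (MR-CIRD) algorithm is therefore proposed to detect consistent and inconsistent rules in big data and to provide useful, actionable knowledge to domain experts. The pruned interesting rules also inform better marketing strategy. The extracted consistent and inconsistent rules are evaluated and compared using different interestingness measures, and the experimental results lead to the final conclusions.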
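The abstract does not define what makes a rule consistent or inconsistent, so the following is only a minimal sketch under stated assumptions, not the MR-CIRD algorithm itself. It counts itemsets per zone in a map (emit) and reduce (sum) style, computes the confidence of a rule in each zone, and labels the rule "consistent" when the confidence varies by no more than a tolerance across zones. The criterion, the `tolerance` parameter, the function names, and the sample baskets are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, max_size=2):
    """Count itemsets up to max_size: emit each itemset per basket, then sum."""
    counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return counts


def confidence(transactions, antecedent, consequent):
    """confidence(A -> B) = count(A and B) / count(A), for single items A and B."""
    counts = count_itemsets(transactions)
    pair = tuple(sorted((antecedent, consequent)))
    base = counts[(antecedent,)]
    return counts[pair] / base if base else 0.0


def classify_rule(zones, antecedent, consequent, tolerance=0.2):
    """Assumed criterion (not from the paper): a rule is 'consistent' when its
    confidence differs by at most `tolerance` between any two zones."""
    per_zone = {
        name: confidence(baskets, antecedent, consequent)
        for name, baskets in zones.items()
    }
    spread = max(per_zone.values()) - min(per_zone.values())
    label = "consistent" if spread <= tolerance else "inconsistent"
    return label, per_zone


# Hypothetical sales baskets for two zones.
zones = {
    "north": [["bread", "milk", "butter"], ["bread", "milk"],
              ["bread", "eggs", "butter"], ["milk"]],
    "south": [["bread", "milk", "butter"], ["bread", "eggs"],
              ["bread", "eggs"], ["eggs"]],
}

for rule in [("bread", "milk"), ("butter", "bread")]:
    label, per_zone = classify_rule(zones, *rule)
    print(rule, label, per_zone)
```

In this toy data the rule bread -> milk holds with confidence 2/3 in one zone and 1/3 in the other, so it is flagged as inconsistent under the assumed tolerance, while butter -> bread has the same confidence in both zones. A real deployment would distribute the counting step across cluster nodes rather than run it in a single process.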




Source journal

Future Computing and Informatics Journal

Self-citation rate

0.00%

Number of articles published