Discovering Threshold-based Frequent Closed Itemsets over Probabilistic Data

Yongxin Tong, Lei Chen, Bolin Ding
{"title":"Discovering Threshold-based Frequent Closed Itemsets over Probabilistic Data","authors":"Yongxin Tong, Lei Chen, Bolin Ding","doi":"10.1109/ICDE.2012.51","DOIUrl":null,"url":null,"abstract":"In recent years, many new applications, such as sensor network monitoring and moving object search, show a growing amount of importance of uncertain data management and mining. In this paper, we study the problem of discovering threshold-based frequent closed item sets over probabilistic data. Frequent item set mining over probabilistic database has attracted much attention recently. However, existing solutions may lead an exponential number of results due to the downward closure property over probabilistic data. Moreover, it is hard to directly extend the successful experiences from mining exact data to a probabilistic environment due to the inherent uncertainty of data. Thus, in order to obtain a reasonable result set with small size, we study discovering frequent closed item sets over probabilistic data. We prove that even a sub-problem of this problem, computing the frequent closed probability of an item set, is #P-Hard. Therefore, we develop an efficient mining algorithm based on depth-first search strategy to obtain all probabilistic frequent closed item sets. To reduce the search space and avoid redundant computation, we further design several probabilistic pruning and bounding techniques. Finally, we verify the effectiveness and efficiency of the proposed methods through extensive experiments.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"72","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE 28th International Conference on Data Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDE.2012.51","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 72

Abstract

In recent years, many new applications, such as sensor network monitoring and moving object search, show a growing amount of importance of uncertain data management and mining. In this paper, we study the problem of discovering threshold-based frequent closed item sets over probabilistic data. Frequent item set mining over probabilistic database has attracted much attention recently. However, existing solutions may lead an exponential number of results due to the downward closure property over probabilistic data. Moreover, it is hard to directly extend the successful experiences from mining exact data to a probabilistic environment due to the inherent uncertainty of data. Thus, in order to obtain a reasonable result set with small size, we study discovering frequent closed item sets over probabilistic data. We prove that even a sub-problem of this problem, computing the frequent closed probability of an item set, is #P-Hard. Therefore, we develop an efficient mining algorithm based on depth-first search strategy to obtain all probabilistic frequent closed item sets. To reduce the search space and avoid redundant computation, we further design several probabilistic pruning and bounding techniques. Finally, we verify the effectiveness and efficiency of the proposed methods through extensive experiments.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
在概率数据上发现基于阈值的频繁封闭项目集
近年来,许多新的应用,如传感器网络监测和运动目标搜索,显示不确定数据的管理和挖掘越来越重要。本文研究了概率数据上基于阈值的频繁闭项集的发现问题。基于概率数据库的频繁项集挖掘是近年来备受关注的问题。然而,由于对概率数据的向下闭合性,现有的解可能导致指数数量的结果。此外,由于数据固有的不确定性,很难将挖掘精确数据的成功经验直接推广到概率环境中。因此,为了得到一个合理的小尺寸结果集,我们研究了在概率数据上发现频繁闭项集的问题。我们证明了这个问题的一个子问题,即计算项目集的频繁闭概率,是#P-Hard。因此,我们开发了一种基于深度优先搜索策略的高效挖掘算法,以获得所有概率频繁闭项集。为了减少搜索空间和避免冗余计算,我们进一步设计了几种概率剪枝和边界技术。最后,我们通过大量的实验验证了所提出方法的有效性和效率。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Keyword Query Reformulation on Structured Data Accuracy-Aware Uncertain Stream Databases Extracting Analyzing and Visualizing Triangle K-Core Motifs within Networks Project Daytona: Data Analytics as a Cloud Service Automatic Extraction of Structured Web Data with Domain Knowledge
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1