Efficient Episode Mining with Minimal and Non-overlapping Occurrences
Huisheng Zhu, Peng Wang, Xianmang He, Yujiao Li, Wei Wang, Baile Shi
2010 IEEE International Conference on Data Mining (ICDM), published 2010-12-13
DOI: 10.1109/ICDM.2010.25 (https://doi.org/10.1109/ICDM.2010.25)
Citations: 25
Abstract
Frequent serial episodes within an event sequence describe the behavior of the users or systems of an application. Existing mining algorithms compute the frequency of an episode from overlapping or non-minimal occurrences. This tends to over-count the support of long episodes or to characterize the followed-by-closely relationship between event types poorly. In addition, because they use an Apriori-style level-wise approach, these algorithms are computationally expensive. In this paper, we propose MANEPI (Minimal And Non-overlapping EPIsode), an efficient algorithm for mining more interesting frequent episodes within a given event sequence. The proposed frequency measure counts only occurrences of an episode that are both minimal and non-overlapping, which leads to better mining quality. The depth-first search strategy, which uses the Apriori property to perform episode growth, greatly improves the efficiency of mining long episodes because it scans the given sequence only once and generates no candidate episodes. Moreover, we present an optimization technique that narrows down the search space and speeds up the mining process. Experimental evaluation on both synthetic and real-world datasets demonstrates that our algorithms are more efficient and effective.
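To make the frequency measure and the depth-first growth more concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' MANEPI implementation. It assumes that positions in the sequence act as timestamps (one event per position). It uses Mannila-style minimal occurrences: a window that contains the episode while no proper sub-window does. It also reads "non-overlapping" as "time windows are pairwise disjoint", which may differ from the paper's exact definition. The helper names `build_index`, `extend`, `support` and `mine` are hypothetical. The sequence is scanned once to build a position index, and episodes are then grown depth-first by appending one event type at a time. Under this measure, support never increases when an event is appended, so an infrequent episode is never extended further (the Apriori property). The paper's additional search-space optimization is not reproduced here.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical helpers illustrating the idea; not the authors' MANEPI code.
# Assumption: positions in the sequence act as timestamps (one event each).


def build_index(sequence):
    """Single scan: map each event type to the sorted list of its positions."""
    index = defaultdict(list)
    for pos, event in enumerate(sequence):
        index[event].append(pos)
    return index


def extend(occurrences, positions):
    """Minimal occurrences of episode+[e] from the minimal occurrences of episode.

    occurrences: minimal windows (start, end) of the episode, sorted by start
                 (their ends are then strictly increasing as well)
    positions:   sorted positions of the appended event type e
    """
    candidates = []
    for start, end in occurrences:
        k = bisect_right(positions, end)  # index of the first e strictly after `end`
        if k == len(positions):
            break  # later windows end even later, so no e can follow them either
        candidates.append((start, positions[k]))
    # A candidate is not minimal iff the next one (later start) has the same end,
    # i.e. it strictly contains another occurrence of the extended episode.
    return [
        cand for i, cand in enumerate(candidates)
        if i + 1 == len(candidates) or candidates[i + 1][1] != cand[1]
    ]


def support(occurrences):
    """Assumed measure: maximum number of pairwise disjoint minimal windows.

    Greedy selection by earliest end is optimal for interval scheduling, and
    windows sorted by start are also sorted by end here.
    """
    count, last_end = 0, -1
    for start, end in occurrences:
        if start > last_end:
            count += 1
            last_end = end
    return count


def mine(sequence, min_support):
    """Depth-first episode growth with Apriori pruning (no candidate generation)."""
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    index = build_index(sequence)
    event_types = sorted(index)
    results = {}

    def grow(episode, occurrences):
        results[episode] = support(occurrences)
        for e in event_types:
            child = extend(occurrences, index[e])
            # Support never increases when an event is appended, so an
            # infrequent child cannot have frequent extensions: prune it.
            if support(child) >= min_support:
                grow(episode + (e,), child)

    for e in event_types:
        singles = [(p, p) for p in index[e]]
        if support(singles) >= min_support:
            grow((e,), singles)
    return results


if __name__ == "__main__":
    print(mine("ABCABC", min_support=2))
    # {('A',): 2, ('A', 'B'): 2, ('A', 'B', 'C'): 2, ('A', 'C'): 2,
    #  ('B',): 2, ('B', 'C'): 2, ('C',): 2}
```

Note that `('C', 'A')` is absent from the output. In `ABCABC`, `C` is followed by `A` in only one minimal window, so its support of 1 falls below the threshold. The minimality filter in `extend` matters too. In `AAB`, the episode `A -> B` has the single minimal occurrence `(1, 2)`, not `(0, 2)`.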