基于子序列约束的频繁序列挖掘的统一框架

ACM Transactions on Database Systems (TODS) Pub Date : 2019-06-05 DOI:10.1145/3321486

Kaustubh Beedkar, Rainer Gemulla, W. Martens

{"title":"基于子序列约束的频繁序列挖掘的统一框架","authors":"Kaustubh Beedkar, Rainer Gemulla, W. Martens","doi":"10.1145/3321486","DOIUrl":null,"url":null,"abstract":"Frequent sequence mining methods often make use of constraints to control which subsequences should be mined. A variety of such subsequence constraints has been studied in the literature, including length, gap, span, regular-expression, and hierarchy constraints. In this article, we show that many subsequence constraints—including and beyond those considered in the literature—can be unified in a single framework. A unified treatment allows researchers to study jointly many types of subsequence constraints (instead of each one individually) and helps to improve usability of pattern mining systems for practitioners. In more detail, we propose a set of simple and intuitive “pattern expressions” to describe subsequence constraints and explore algorithms for efficiently mining frequent subsequences under such general constraints. Our algorithms translate pattern expressions to succinct finite-state transducers, which we use as computational model, and simulate these transducers in a way suitable for frequent sequence mining. Our experimental study on real-world datasets indicates that our algorithms—although more general—are efficient and, when used for sequence mining with prior constraints studied in literature, competitive to (and in some cases superior to) state-of-the-art specialized methods.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"11 Suppl 3 1","pages":"1 - 42"},"PeriodicalIF":0.0000,"publicationDate":"2019-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"A Unified Framework for Frequent Sequence Mining with Subsequence Constraints\",\"authors\":\"Kaustubh Beedkar, Rainer Gemulla, W. Martens\",\"doi\":\"10.1145/3321486\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Frequent sequence mining methods often make use of constraints to control which subsequences should be mined. A variety of such subsequence constraints has been studied in the literature, including length, gap, span, regular-expression, and hierarchy constraints. In this article, we show that many subsequence constraints—including and beyond those considered in the literature—can be unified in a single framework. A unified treatment allows researchers to study jointly many types of subsequence constraints (instead of each one individually) and helps to improve usability of pattern mining systems for practitioners. In more detail, we propose a set of simple and intuitive “pattern expressions” to describe subsequence constraints and explore algorithms for efficiently mining frequent subsequences under such general constraints. Our algorithms translate pattern expressions to succinct finite-state transducers, which we use as computational model, and simulate these transducers in a way suitable for frequent sequence mining. Our experimental study on real-world datasets indicates that our algorithms—although more general—are efficient and, when used for sequence mining with prior constraints studied in literature, competitive to (and in some cases superior to) state-of-the-art specialized methods.\",\"PeriodicalId\":6983,\"journal\":{\"name\":\"ACM Transactions on Database Systems (TODS)\",\"volume\":\"11 Suppl 3 1\",\"pages\":\"1 - 42\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-06-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Database Systems (TODS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3321486\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Database Systems (TODS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3321486","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 6

摘要

频繁的序列挖掘方法通常使用约束来控制应该挖掘哪些子序列。各种这样的子序列约束已经在文献中进行了研究，包括长度、间隙、跨度、正则表达式和层次约束。在本文中，我们展示了许多子序列约束(包括文献中考虑的约束和超出这些约束的约束)可以统一到一个框架中。统一处理允许研究人员联合研究多种类型的子序列约束(而不是单独研究每种约束)，有助于提高模式挖掘系统对从业者的可用性。更详细地说，我们提出了一组简单直观的“模式表达式”来描述子序列约束，并探索了在这种一般约束下有效挖掘频繁子序列的算法。我们的算法将模式表达式转换为简洁的有限状态传感器，我们将其用作计算模型，并以适合频繁序列挖掘的方式模拟这些传感器。我们对真实世界数据集的实验研究表明，我们的算法(虽然更通用)是高效的，并且当用于具有文献研究的先验约束的序列挖掘时，与最先进的专业方法相竞争(在某些情况下优于)。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

A Unified Framework for Frequent Sequence Mining with Subsequence Constraints

Frequent sequence mining methods often make use of constraints to control which subsequences should be mined. A variety of such subsequence constraints has been studied in the literature, including length, gap, span, regular-expression, and hierarchy constraints. In this article, we show that many subsequence constraints—including and beyond those considered in the literature—can be unified in a single framework. A unified treatment allows researchers to study jointly many types of subsequence constraints (instead of each one individually) and helps to improve usability of pattern mining systems for practitioners. In more detail, we propose a set of simple and intuitive “pattern expressions” to describe subsequence constraints and explore algorithms for efficiently mining frequent subsequences under such general constraints. Our algorithms translate pattern expressions to succinct finite-state transducers, which we use as computational model, and simulate these transducers in a way suitable for frequent sequence mining. Our experimental study on real-world datasets indicates that our algorithms—although more general—are efficient and, when used for sequence mining with prior constraints studied in literature, competitive to (and in some cases superior to) state-of-the-art specialized methods.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Database Systems (TODS)

自引率

0.00%

发文量

期刊最新文献

On Finding Rank Regret Representatives Answering (Unions of) Conjunctive Queries using Random Access and Random-Order Enumeration Persistent Summaries Influence Maximization Revisited: Efficient Sampling with Bound Tightened The Space-Efficient Core of Vadalog