Discovering Event Queries from Traces: Laying Foundations for Subsequence-Queries with Wildcards and Gap-Size Constraints

Sarah Kleest-Meißner, Rebecca Sattler, Markus L. Schmid, Nicole Schweikardt, M. Weidlich
Published in: International Conference on Database Theory (ICDT), 2022
DOI: 10.4230/LIPIcs.ICDT.2022.18 (https://doi.org/10.4230/LIPIcs.ICDT.2022.18)
Citations: 10

Abstract

We introduce subsequence-queries with wildcards and gap-size constraints (swg-queries, for short) as a tool for querying event traces. An swg-query $q$ is given by a string $s$ over an alphabet of variables and types, a global window size $w$, and a tuple $c = ((c^-_1, c^+_1), (c^-_2, c^+_2), \ldots, (c^-_{|s|-1}, c^+_{|s|-1}))$ of local gap-size constraints over $\mathbb{N} \times (\mathbb{N} \cup \{\infty\})$. The query $q$ matches in a trace $t$ (i.e., a sequence of types) if the variables can uniformly be substituted by types such that the resulting string occurs in $t$ as a subsequence that spans an area of length at most $w$, and the $i$th gap of the subsequence (i.e., the distance between the $i$th and $(i+1)$th position of the subsequence) has length at least $c^-_i$ and at most $c^+_i$. We formalise and investigate the task of discovering an swg-query that best describes the traces from a given sample $S$ of traces, and we present an algorithm solving this task. As a central component, our algorithm repeatedly solves the matching problem (i.e., deciding whether a given query $q$ matches in a given trace $t$), which is an NP-complete problem (in combined complexity). Hence, the matching problem is of special interest in the context of query discovery, and we therefore subject it to a detailed (parameterised) complexity analysis to identify tractable subclasses, which lead to tractable subclasses of the discovery problem as well. We complement this by a reduction proving [...]

Proof sketch. A natural brute-force approach is as follows: Upon input of an swg-query $q = (s, w, c)$ and a trace $t$, we enumerate all mappings $\pi : \mathrm{repvars}(q) \to \mathrm{types}(t)$, and for each such mapping, we construct a regular expression $R_\pi$ that describes all traces $t'$ for which there exists a substitution $\mu : \mathrm{vars}(q) \cup \Gamma \to \Gamma$ such that $\mu$ is an extension of $\pi$ and $\mu(s) \preccurlyeq_e t'$ for some embedding $e$ that satisfies $w$ and $c$. Then, we only have to check, for each of these mappings $\pi$, whether the regular expression $R_\pi$ matches in $t$.

Another approach is to enumerate all embeddings $e : [|s|] \to [|t|]$ that satisfy $w$ and $c$ and to check for each such embedding $e$ whether $\mu(s) \preccurlyeq_e t$ for some substitution $\mu$ (which can be done in time $O(|s|)$, since $\mu$ must satisfy $\mu(s) = t[e(1)]\, t[e(2)] \cdots t[e(|s|)]$). From these two algorithms and the obvious dependencies between the parameters, we can directly conclude the statements of the theorem.
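
The second brute-force approach (enumerating embeddings) can be illustrated with a small Python sketch. This is not the authors' implementation. It rests on several assumptions that the text above does not fix: variables are written with a leading `?`, positions are 0-based, the $i$th gap is taken to be the number of trace positions strictly between two consecutive matched positions, and the window is the number of trace positions from the first to the last matched position. The function names `is_var`, `satisfies`, `substitution_for`, and `matches` are hypothetical.

```python
from itertools import combinations
from math import inf


def is_var(sym):
    # Assumption: variables carry a leading "?"; every other symbol is a type.
    return sym.startswith("?")


def satisfies(e, w, c):
    """Check the window size w and the gap constraints c for positions e."""
    if e[-1] - e[0] + 1 > w:  # assumed reading of "spans an area of length at most w"
        return False
    # Assumed reading of "gap": positions strictly between e[i] and e[i + 1].
    return all(lo <= e[i + 1] - e[i] - 1 <= hi for i, (lo, hi) in enumerate(c))


def substitution_for(s, t, e):
    """Return the substitution forced by embedding e, or None if none exists.

    One pass over s, i.e. O(|s|) dictionary operations.
    """
    mu = {}
    for sym, pos in zip(s, e):
        if is_var(sym):
            # A variable must be replaced by the same type at every occurrence.
            if mu.setdefault(sym, t[pos]) != t[pos]:
                return None
        elif sym != t[pos]:
            return None
    return mu


def matches(s, w, c, t):
    """Return (embedding, substitution) for the first match found, else None."""
    if not s or len(c) != len(s) - 1:
        raise ValueError("need a non-empty s and exactly |s| - 1 gap constraints")
    # combinations() yields exactly the strictly increasing position tuples.
    for e in combinations(range(len(t)), len(s)):
        if satisfies(e, w, c):
            mu = substitution_for(s, t, e)
            if mu is not None:
                return e, mu
    return None


if __name__ == "__main__":
    query = ["?x", "a", "?x"]
    gaps = [(0, 1), (1, inf)]
    trace = ["b", "c", "a", "d", "c", "a"]
    print(matches(query, 5, gaps, trace))
    print(matches(query, 3, gaps, trace))
```

The sketch enumerates all strictly increasing position tuples and filters them afterwards, so its running time grows with the number of size-$|s|$ subsets of trace positions. It is meant only to make the per-embedding consistency check concrete, not to be efficient.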