
Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory: latest articles

A Formal Framework for Complex Event Processing
Alejandro Grez, Cristian Riveros, M. Ugarte
Complex Event Processing (CEP) has emerged as the unifying field for technologies that require processing and correlating distributed data sources in real-time. CEP finds applications in diverse domains, which has resulted in a large number of proposals for expressing and processing complex events. However, existing CEP languages lack a clear semantics, making them hard to understand and generalize. Moreover, there are no general techniques for evaluating CEP query languages with clear performance guarantees. In this paper we embark on the task of giving a rigorous and efficient framework to CEP. We propose a formal language for specifying complex events, called CEL, that contains the main features used in the literature and has a denotational and compositional semantics. We also formalize the so-called selection strategies, which had only been presented as by-design extensions to existing frameworks. With a well-defined semantics at hand, we discuss how to efficiently process complex events by evaluating CEL formulas with unary filters. We start by studying the syntactical properties of CEL and propose rewriting optimization techniques for simplifying the evaluation of formulas. Then, we introduce a formal computational model for CEP, called complex event automata (CEA), and study how to compile CEL formulas with unary filters into CEA. Furthermore, we provide efficient algorithms for evaluating CEA over event streams using constant time per event followed by constant-delay enumeration of the results. Finally, we gather the main results of this work to present an efficient and declarative framework for CEP.
DOI: 10.4230/LIPICS.ICDT.2019.5. Publication date: 2019-01-01.
Citations: 39
Parallel-Correctness and Parallel-Boundedness for Datalog Programs
F. Neven, T. Schwentick, Christopher Spinrath, Brecht Vandevoort
Recently, Ketsman et al. started the investigation of the parallel evaluation of recursive queries in the Massively Parallel Communication (MPC) model. Among other things, it was shown that parallel-correctness and parallel-boundedness for general Datalog programs are undecidable, by a reduction from the undecidable containment problem for Datalog. Furthermore, economic policies were introduced as a means to specify data distribution in a recursive setting. In this paper, we extend the latter framework to account for more general distributed evaluation strategies in terms of communication policies. We then show that the undecidability of parallel-correctness runs deeper: it already holds for fragments of Datalog, e.g., monadic and frontier-guarded Datalog, with a decidable containment problem, under relatively simple evaluation strategies. These simple evaluation strategies are defined w.r.t. data-moving distribution constraints. We then investigate restrictions of economic policies that yield decidability. In particular, we show that parallel-correctness is 2EXPTIME-complete for monadic and frontier-guarded Datalog under hash-based economic policies. Next, we consider restrictions of data-moving constraints and show that parallel-correctness and parallel-boundedness are 2EXPTIME-complete for frontier-guarded Datalog. Interestingly, distributed evaluation no longer preserves the usual containment relationships between fragments of Datalog. Indeed, not every monadic Datalog program is equivalent to a frontier-guarded one in the distributed setting. We illustrate the latter by considering two alternative settings where in one of these parallel-correctness is decidable for frontier-guarded Datalog but undecidable for monadic Datalog.
2012 ACM Subject Classification: Theory of computation → Database theory
DOI: 10.4230/LIPIcs.ICDT.2019.14. Publication date: 2019-01-01.
Citations: 5
The Power of the Terminating Chase (Invited Talk)
M. Krötzsch, Maximilian Marx, S. Rudolph
DOI: 10.4230/LIPIcs.ICDT.2019.3. Publication date: 2019-01-01.
Citations: 12
Categorical Range Reporting with Frequencies
A. Ganguly, J. Munro, Yakov Nekrich, R. Shah, Sharma V. Thankachan
In this paper, we consider a variant of the color range reporting problem called color reporting with frequencies. Our goal is to pre-process a set of colored points into a data structure, so that given a query range $Q$, we can report all colors that appear in $Q$, along with their respective frequencies. In other words, for each reported color, we also output the number of times it occurs in $Q$. We describe an external-memory data structure that uses $O(N(1 + \log^2 D / \log N))$ words and answers one-dimensional queries in $O(1 + K/B)$ I/Os, where $N$ is the total number of points in the data structure, $D$ is the total number of colors in the data structure, $K$ is the number of reported colors, and $B$ is the block size. Next we turn to an approximate version of this problem: report all colors $\sigma$ that appear in the query range; for every reported color, we provide a constant-factor approximation on its frequency. We consider color reporting with approximate frequencies in two dimensions. Our data structure uses $O(N)$ space and answers two-dimensional queries in $O(\log_B N + \log^* B + K/B)$ I/Os in the special case when the query range is bounded on two sides. As a corollary, we can also answer one-dimensional approximate queries within the same time and space bounds.
2012 ACM Subject Classification: Theory of computation → Data structures design and analysis
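To make the query semantics concrete, the following is a minimal in-memory sketch of one-dimensional color reporting with exact frequencies. It is only a baseline that scans the points inside the range, not the external-memory structure described in the abstract; the function name `color_frequencies` and the `(coordinate, color)` input layout are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def color_frequencies(points, lo, hi):
    """Report each color in the closed range [lo, hi] with its frequency.

    `points` is a list of (coordinate, color) pairs sorted by coordinate.
    This baseline spends time proportional to the number of points in
    the range, unlike the output-sensitive bounds stated in the abstract.
    """
    coords = [x for x, _ in points]
    start = bisect_left(coords, lo)    # first index with coordinate >= lo
    end = bisect_right(coords, hi)     # first index with coordinate > hi
    return Counter(color for _, color in points[start:end])


points = [(1, "red"), (2, "green"), (4, "red"), (7, "blue"), (9, "red")]
print(color_frequencies(points, 2, 8))
```

The returned `Counter` maps every color occurring in the query range to its number of occurrences, which is exactly the output the problem statement asks for.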
DOI: 10.4230/LIPIcs.ICDT.2019.9. Publication date: 2019-01-01.
Citations: 5
Approximating Distance Measures for the Skyline
Nirman Kumar, Benjamin Raichel, Stavros Sintos, G. V. Buskirk
In multi-parameter decision making, data is usually modeled as a set of points whose dimension is the number of parameters, and the skyline or Pareto points represent the possible optimal solutions for various optimization problems. The structure and computation of such points have been well studied, particularly in the database community. As the skyline can be quite large in high dimensions, one often seeks a compact summary. In particular, for a given integer parameter k, a subset of k points is desired which best approximates the skyline under some measure. Various measures have been proposed, but they mostly treat the skyline as a discrete object. By viewing the skyline as a continuous geometric hull, we propose a new measure that evaluates the quality of a subset by the Hausdorff distance of its hull to the full hull. We argue that in many ways our measure more naturally captures what it means to approximate the skyline. For our new geometric skyline approximation measure, we provide a plethora of results. Specifically, we provide (1) a near-linear time exact algorithm in two dimensions, (2) APX-hardness results for dimensions three and higher, (3) approximation algorithms for related variants of our problem, and (4) a practical and efficient heuristic which uses our geometric insights into the problem, as well as various experimental results to show the efficacy of our approach.
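As a reminder of what the skyline (Pareto points) is, here is a minimal sketch that computes it for two-dimensional points with a standard sort-and-sweep. It assumes larger values are better in both coordinates, and it does not implement the Hausdorff-based approximation measure or any algorithm from the paper; the name `skyline_2d` is hypothetical.

```python
def skyline_2d(points):
    """Return the Pareto-optimal points of a 2D point set.

    A point is kept when no other point is at least as large in both
    coordinates and strictly larger in one (assumption: larger is better).
    """
    best_y = float("-inf")
    result = []
    # Sweep from the largest x downwards; ties on x are visited with the
    # largest y first, so dominated ties are skipped by the check below.
    for x, y in sorted(set(points), key=lambda p: (-p[0], -p[1])):
        if y > best_y:
            result.append((x, y))
            best_y = y
    return result


print(skyline_2d([(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (3, 1)]))
```

Sorting dominates the cost, so this runs in O(n log n) time for n points.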
DOI: 10.4230/LIPICS.ICDT.2019.10. Publication date: 2019-01-01.
Citations: 1
Learning Models over Relational Databases (Invited Talk)
Dan Olteanu
DOI: 10.4230/LIPIcs.ICDT.2019.1. Publication date: 2019-01-01.
Citations: 0
The Power of Relational Learning (Invited Talk)
L. Getoor
DOI: 10.4230/LIPIcs.ICDT.2019.2. Publication date: 2019-01-01.
Citations: 2
Integrity Constraints Revisited: From Exact to Approximate Implication
Batya Kenig, Dan Suciu
Integrity constraints such as functional dependencies (FD) and multi-valued dependencies (MVD) are fundamental in database schema design. Likewise, probabilistic conditional independences (CI) are crucial for reasoning about multivariate probability distributions. The implication problem studies whether a set of constraints (antecedents) implies another constraint (consequent), and has been investigated in both the database and the AI literature, under the assumption that all constraints hold exactly. However, many applications today consider constraints that hold only approximately. In this paper we define an approximate implication as a linear inequality between the degree of satisfaction of the antecedents and consequent, and we study the relaxation problem: when does an exact implication relax to an approximate implication? We use information theory to define the degree of satisfaction, and prove several results. First, we show that any implication from a set of data dependencies (MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most quadratic in the number of variables; when the consequent is an FD, the factor can be reduced to 1. Second, we prove that there exists an implication between CIs that does not admit any relaxation; however, we prove that every implication between CIs relaxes "in the limit". Then, we show that the implication problem for differential constraints in market basket analysis also admits a relaxation with a factor equal to 1. Finally, we show how some of the results in the paper can be derived using the I-measure theory, which relates between information theoretic measures and set theory. Our results recover, and sometimes extend, previously known results about the implication problem: the implication of MVDs and FDs can be checked by considering only 2-tuple relations.
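One common way to measure how well a functional dependency X → Y holds with information theory is the conditional entropy H(Y | X) under the uniform distribution over the tuples of a relation: it is zero exactly when the FD holds. The sketch below assumes that reading of "degree of satisfaction"; the abstract does not spell out its definition, and the helper names `entropy` and `fd_violation` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(rows, attrs):
    """Shannon entropy (in bits) of the projection of `rows` onto `attrs`,
    taking the uniform distribution over the rows (assumption)."""
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())


def fd_violation(rows, lhs, rhs):
    """Conditional entropy H(rhs | lhs) = H(lhs, rhs) - H(lhs).

    The value is 0 when the FD lhs -> rhs holds exactly, and grows as
    the dependency is violated more strongly.
    """
    return entropy(rows, lhs + rhs) - entropy(rows, lhs)


rows = [
    {"A": 1, "B": "x", "C": 0},
    {"A": 1, "B": "x", "C": 1},
    {"A": 2, "B": "y", "C": 0},
]
print(fd_violation(rows, ["A"], ["B"]))  # A -> B holds on these rows
print(fd_violation(rows, ["A"], ["C"]))  # A -> C is violated
```

An approximate implication in the sense of the abstract would then be a linear inequality bounding the consequent's value of such a measure by a constant factor times the sum of the antecedents' values.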
DOI: 10.46298/lmcs-18(1:5)2022. Publication date: 2018-12-01.
Citations: 13
The First Order Truth behind Undecidability of Regular Path Queries Determinacy
Grzegorz Gluch, J. Marcinkowski, Piotr Ostropolski-Nalewaja
In our paper [Gluch, Marcinkowski, Ostropolski-Nalewaja, LICS ACM, 2018] we have solved an old problem stated in [Calvanese, De Giacomo, Lenzerini, Vardi, SPDS ACM, 2000] showing that query determinacy is undecidable for Regular Path Queries. Here a strong generalisation of this result is shown, and -- we think -- a very unexpected one. We prove that no regularity is needed: determinacy remains undecidable even for finite unions of conjunctive path queries.
DOI: 10.4230/LIPIcs.ICDT.2019.15. Publication date: 2018-08-22.
Citations: 3
Reverse Prevention Sampling for Misinformation Mitigation in Social Networks
Michael Simpson, Venkatesh Srinivasan, Alex Thomo
In this work, we consider misinformation propagating through a social network and study the problem of its prevention. In this problem, a "bad" campaign starts propagating from a set of seed nodes in the network and we use the notion of a limiting (or "good") campaign to counteract the effect of misinformation. The goal is to identify a set of $k$ users that need to be convinced to adopt the limiting campaign so as to minimize the number of people that adopt the "bad" campaign at the end of both propagation processes. This work presents RPS (Reverse Prevention Sampling), an algorithm that provides a scalable solution to the misinformation mitigation problem. Our theoretical analysis shows that RPS runs in $O((k + l)(n + m)(\frac{1}{1 - \gamma}) \log n / \epsilon^2)$ expected time and returns a $(1 - 1/e - \epsilon)$-approximate solution with at least $1 - n^{-l}$ probability (where $\gamma$ is a typically small network parameter and $l$ is a confidence parameter). The time complexity of RPS substantially improves upon the previously best-known algorithms that run in time $\Omega(m n k \cdot \mathrm{POLY}(\epsilon^{-1}))$. We experimentally evaluate RPS on large datasets and show that it outperforms the state-of-the-art solution by several orders of magnitude in terms of running time. This demonstrates that misinformation mitigation can be made practical while still offering strong theoretical guarantees.
DOI: 10.4230/LIPIcs.ICDT.2020.24. Publication date: 2018-07-01.
Citations: 5