Integrity Constraints Revisited: From Exact to Approximate Implication

Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory Pub Date : 2018-12-01 DOI:10.46298/lmcs-18(1:5)2022

Batya Kenig, Dan Suciu

Volume: 26 1; Pages: 18:1-18:20

Citations: 13

Abstract



Integrity constraints such as functional dependencies (FD) and multi-valued dependencies (MVD) are fundamental in database schema design. Likewise, probabilistic conditional independences (CI) are crucial for reasoning about multivariate probability distributions. The implication problem studies whether a set of constraints (antecedents) implies another constraint (consequent), and has been investigated in both the database and the AI literature, under the assumption that all constraints hold exactly. However, many applications today consider constraints that hold only approximately. In this paper we define an approximate implication as a linear inequality between the degree of satisfaction of the antecedents and that of the consequent, and we study the relaxation problem: when does an exact implication relax to an approximate implication? We use information theory to define the degree of satisfaction, and prove several results. First, we show that any implication from a set of data dependencies (MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most quadratic in the number of variables; when the consequent is an FD, the factor can be reduced to 1. Second, we prove that there exists an implication between CIs that does not admit any relaxation; however, we prove that every implication between CIs relaxes "in the limit". Then, we show that the implication problem for differential constraints in market basket analysis also admits a relaxation with a factor equal to 1. Finally, we show how some of the results in the paper can be derived using the I-measure theory, which relates information-theoretic measures to set theory. Our results recover, and sometimes extend, previously known results about the implication problem: the implication of MVDs and FDs can be checked by considering only 2-tuple relations.
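To make the idea of a relaxation with factor 1 concrete, here is a minimal Python sketch. The abstract does not spell out the measure, so the sketch assumes a common information-theoretic choice: the degree to which an FD X -> Y is violated is the conditional entropy H(Y|X) under the uniform distribution over the tuples of a relation, which is zero exactly when the FD holds. The helper names `entropy` and `fd_violation` and the toy relation `rows` are hypothetical, introduced only for illustration. The example checks the relaxed form of the transitivity rule (X -> Y and Y -> Z imply X -> Z), namely H(Z|X) <= H(Y|X) + H(Z|Y), on a relation where none of the three FDs holds exactly.

```python
from collections import Counter
from math import log2


def entropy(rows, attrs):
    """Shannon entropy (in bits) of the projection of `rows` onto `attrs`,
    under the uniform distribution over the tuples in `rows`."""
    n = len(rows)
    counts = Counter(tuple(r[a] for a in attrs) for r in rows)
    return -sum(c / n * log2(c / n) for c in counts.values())


def fd_violation(rows, lhs, rhs):
    """Assumed degree of violation of the FD lhs -> rhs: H(rhs | lhs).
    It is zero exactly when lhs functionally determines rhs in `rows`."""
    return entropy(rows, lhs + rhs) - entropy(rows, lhs)


# Hypothetical toy relation with distinct tuples; no FD below holds exactly.
rows = [
    {"X": 0, "Y": 0, "Z": 0},
    {"X": 0, "Y": 1, "Z": 0},
    {"X": 1, "Y": 1, "Z": 1},
    {"X": 1, "Y": 1, "Z": 0},
]

h_xy = fd_violation(rows, ["X"], ["Y"])  # antecedent X -> Y
h_yz = fd_violation(rows, ["Y"], ["Z"])  # antecedent Y -> Z
h_xz = fd_violation(rows, ["X"], ["Z"])  # consequent X -> Z

# Relaxed transitivity with factor 1: H(Z|X) <= H(Y|X) + H(Z|Y).
relaxation_holds = h_xz <= h_xy + h_yz + 1e-12
print(h_xy, h_yz, h_xz, relaxation_holds)
```

The inequality checked here is a standard Shannon inequality, so it holds for any relation, not just this one. Under the same assumption, the violation of an MVD would be measured by a conditional mutual information instead of a conditional entropy.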
