Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems: Latest Articles


Fair Near Neighbor Search: Independent Range Sampling in High Dimensions

Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems

Pub Date : 2019-06-05 DOI: 10.1145/3375395.3387648

Martin Aumüller, R. Pagh, Francesco Silvestri

Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. There are several variants of the similarity search problem, and one of the most relevant is the r-near neighbor (r-NN) problem: given a radius r>0 and a set of points S, construct a data structure that, for any given query point q, returns a point p within distance at most r from q. In this paper, we study the r-NN problem in the light of fairness. We consider fairness in the sense of equal opportunity: all points that are within distance r from the query should have the same probability to be returned. In the low-dimensional case, this problem was first studied by Hu, Qiao, and Tao (PODS 2014). Locality sensitive hashing (LSH), the theoretically strongest approach to similarity search in high dimensions, does not provide such a fairness guarantee. To address this, we propose efficient data structures for r-NN where all points in S that are near q have the same probability to be selected and returned by the query. Specifically, we first propose a black-box approach that, given any LSH scheme, constructs a data structure for uniformly sampling points in the neighborhood of a query. Then, we develop a data structure for fair similarity search under inner product that requires nearly-linear space and exploits locality sensitive filters. The paper concludes with an experimental evaluation that highlights (un)fairness in a recommendation setting on real-world datasets and discusses the inherent unfairness introduced by solving other variants of the problem.
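The fairness requirement in the abstract (every point within distance r of the query is returned with the same probability) can be illustrated with a small rejection-sampling sketch. This is not the data structure proposed in the paper. It assumes the buckets that collide with the query have already been retrieved from some LSH-style index, that each bucket holds distinct points, and that Euclidean distance is used; the function name `fair_sample_from_buckets` and the `max_tries` cutoff are hypothetical.

```python
import math
import random


def fair_sample_from_buckets(buckets, q, r, rng, max_tries=10_000):
    """Return a uniformly random near neighbor of q among the bucket contents.

    buckets: list of lists of points (tuples); assumed to be the buckets that
             the query q collides with, each containing distinct points.
    Returns None if no near neighbor is found within max_tries attempts.
    """
    weights = [len(b) for b in buckets]
    if sum(weights) == 0:
        return None
    for _ in range(max_tries):
        # Picking a bucket proportionally to its size and then a uniform point
        # inside it makes every (bucket, point) pair equally likely.
        bucket = rng.choices(buckets, weights=weights)[0]
        p = rng.choice(bucket)
        if math.dist(p, q) > r:
            continue  # far point: reject
        # A near point stored in c buckets is c times more likely to be drawn,
        # so accept it with probability 1/c to cancel that bias.
        multiplicity = sum(p in other for other in buckets)
        if rng.random() < 1.0 / multiplicity:
            return p
    return None
```

Without the final acceptance step, a point that lands in many buckets would be returned more often than one that lands in a single bucket, which is the kind of bias the abstract attributes to plain LSH lookups.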



Citations: 18

A Tight Lower Bound for Comparison-Based Quantile Summaries

Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems

Pub Date : 2019-05-09 DOI: 10.1145/3375395.3387650

Graham Cormode, P. Veselý

Quantiles, such as the median or percentiles, provide concise and useful information about the distribution of a collection of items, drawn from a totally ordered universe. We study data structures, called quantile summaries, which keep track of all quantiles of a stream of items, up to an error of at most ε. That is, an ε-approximate quantile summary first processes a stream and then, given any quantile query 0 ≤ φ ≤ 1, returns an item from the stream, which is a φ'-quantile for some φ' = φ ± ε. We focus on comparison-based quantile summaries that can only compare two items and are otherwise completely oblivious of the universe. The best such deterministic quantile summary to date, due to Greenwald and Khanna [6], stores at most O((1/ε) · log(εN)) items, where N is the number of items in the stream. We prove that this space bound is optimal by showing a matching lower bound. Our result thus rules out the possibility of constructing a deterministic comparison-based quantile summary in space f(ε) · o(log N), for any function f that does not depend on N. As a corollary, we improve the lower bound for biased quantiles, which provide a stronger, relative-error guarantee of (1 ± ε) · φ, and for other related computational tasks.
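To make the ε-approximate guarantee concrete, the following sketch builds a summary offline from data that is already fully available, keeping roughly 1/(2ε) evenly spaced items. It is not the Greenwald and Khanna streaming algorithm and says nothing about the lower bound; it only shows what it means for a returned item to be a φ'-quantile with φ' = φ ± ε. The names `build_summary` and `query` are hypothetical, and the items are assumed to be distinct and comparable.

```python
import math


def build_summary(items, eps):
    """Keep every step-th item of the sorted data, with step about 2*eps*N.

    Returns (N, [(rank, item), ...]) with 1-based ranks.
    """
    data = sorted(items)  # comparison-based: only the order of items is used
    n = len(data)
    if n == 0:
        raise ValueError("summary needs at least one item")
    step = max(1, math.floor(2 * eps * n))
    ranks = list(range(1, n + 1, step))
    if ranks[-1] != n:
        ranks.append(n)  # always keep the maximum
    return n, [(rank, data[rank - 1]) for rank in ranks]


def query(summary, phi):
    """Return a stored item whose rank is within eps*N of the target rank."""
    n, stored = summary
    target = min(n, max(1, math.ceil(phi * n)))
    # Consecutive stored ranks are at most 2*eps*N apart, so the nearest
    # stored rank is at most eps*N away from any target rank.
    return min(stored, key=lambda pair: abs(pair[0] - target))[1]
```

For N = 1000 and ε = 0.05 this keeps 11 items, and every query is answered by an item whose rank is off by at most 50. The offline setting is what makes this easy; the difficulty addressed in the abstract is maintaining such a guarantee while the stream is still arriving.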



Citations: 16

All-Instances Restricted Chase Termination

Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems

Pub Date : 2019-01-12 DOI: 10.1145/3375395.3387644

Tomasz Gogacz, J. Marcinkowski, Andreas Pieris

The chase procedure is a fundamental algorithmic tool in database theory with a variety of applications. A key problem concerning the chase procedure is all-instances termination: for a given set of tuple-generating dependencies (TGDs), is it the case that the chase terminates for every input database? In view of the fact that this problem is undecidable, it is natural to ask whether known well-behaved classes of TGDs ensure decidability. We consider here the main paradigms that led to robust TGD-based formalisms, that is, guardedness and stickiness. Although all-instances termination is well-understood for the oblivious chase, the more subtle case of the restricted (a.k.a. the standard) chase is rather unexplored. We show that all-instances restricted chase termination for guarded/sticky single-head TGDs is decidable in elementary time.
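The difference between termination on one database and termination on all databases can be seen with a toy restricted chase for single-head TGDs. This is only an illustration of the restricted chase semantics, not the decision procedure from the paper, and the step budget is a practical cutoff rather than a termination test. The encoding is an assumption made here: atoms are tuples such as `("R", "X", "Y")`, uppercase strings are variables, lowercase strings are constants, and fresh nulls are named `_n1`, `_n2`, and so on.

```python
from itertools import count


def is_var(term):
    return term[:1].isupper()  # convention used here: variables are capitalised


def bind(h, term, value):
    """Extend substitution h so that term maps to value, if consistent."""
    if is_var(term):
        return h.setdefault(term, value) == value
    return term == value


def matches(atoms, facts, h):
    """Yield every extension of h that maps all atoms into facts."""
    if not atoms:
        yield h
        return
    (pred, *args), rest = atoms[0], atoms[1:]
    for fact in facts:
        if fact[0] != pred or len(fact) != len(args) + 1:
            continue
        g = dict(h)
        if all(bind(g, a, v) for a, v in zip(args, fact[1:])):
            yield from matches(rest, facts, g)


def find_active_trigger(facts, tgds):
    """Restricted chase: a trigger is active only if its head is not yet satisfied."""
    snapshot = sorted(facts)
    for body, head in tgds:
        for h in matches(body, snapshot, {}):
            frontier = {v: h[v] for v in head[1:] if v in h}
            if next(matches([head], snapshot, frontier), None) is None:
                return head, h
    return None


def restricted_chase(database, tgds, max_steps=50):
    """Apply active triggers one at a time; return (facts, terminated)."""
    facts, fresh = set(database), count(1)
    for _ in range(max_steps):
        trigger = find_active_trigger(facts, tgds)
        if trigger is None:
            return facts, True
        head, h = trigger
        g = dict(h)
        for term in head[1:]:
            if is_var(term) and term not in g:
                g[term] = f"_n{next(fresh)}"  # fresh null for an existential variable
        facts.add((head[0],) + tuple(g.get(t, t) for t in head[1:]))
    return facts, find_active_trigger(facts, tgds) is None
```

With the single TGD R(x, y) → ∃z R(y, z), written as `([("R", "X", "Y")], ("R", "Y", "Z"))`, the call on the database {R(a, a)} stops immediately because the head is already satisfied, while the call on {R(a, b)} keeps inventing nulls until the budget runs out. All-instances termination asks whether the first kind of behaviour holds for every input database.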


Citations: 12

Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems

Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems

Pub Date : 2015-05-20 DOI: 10.1145/3403468

T. Milo, Diego Calvanese

It is our great pleasure to welcome you to the 34th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS 2015), held in Melbourne, Victoria, Australia, from May 31 to June 4, 2015, in conjunction with the 2015 ACM SIGMOD International Conference on Management of Data. Since the first edition of the symposium in 1982, the PODS papers have been distinguished by a rigorous approach to widely diverse problems in data management, often bringing to bear techniques from a variety of different areas, including computational logic, finite model theory, computational complexity, algorithm design and analysis, programming languages, and artificial intelligence. The PODS Symposia study data management challenges in a variety of application contexts, including more recently probabilistic data, streaming data, graph data, information retrieval, ontology and semantic web, and data-driven processes and systems. PODS has a tradition of being the premier international conference on the theoretical and foundational aspects of data management, and the interested reader is referred to the PODS web pages at http://www.sigmod.org/thepods- pages/ for information on the history of this conference series.

This year's symposium continues this tradition, but in addition the PODS Executive Committee decided to broaden the scope of PODS, and to explicitly invite for submission papers providing original, substantial contributions in one or more of the following categories: a) deep theoretical exploration of topical areas central to data management; b) new formal frameworks that aim at providing the basis for deeper theoretical investigation of important emerging issues in data management; and c) validation of theoretical approaches from the lens of practical applicability in data management. This volume contains the proceedings of PODS 2015, which include an abstract for the keynote address by Michael I. Johnson (University of California, Berkeley), papers based on two invited tutorials by Todd J. Green (LogicBlox, USA) and Graham Cormode (University of Warwick, UK), and 25 contributions that were selected by the Program Committee for presentation at the symposium.

This year, PODS experimented for the first time with two submission cycles, where the first cycle also allowed papers to be revised and resubmitted. For the first cycle, 29 papers were submitted, 4 of which were directly selected for inclusion in the proceedings, and 7 were invited for resubmission after a revision. The quality of most of the revised papers increased substantially with respect to the first submission, and 6 of those were in the end selected for the proceedings. For the second cycle, 51 papers were submitted, 15 of which were selected, resulting in 25 papers selected overall from a total of 80 submissions. Most of the 25 accepted papers are extended abstracts. While all submissions have been reviewed by at least four Program Committee members, they have not been formally refereed. It is expected that much of the research described in these papers will be published in a more polished and detailed form in scientific journals.

With respect to the three categories mentioned above, out of the 80 submissions (resp., 25 accepted papers), 47 (resp., 19) were classified by the authors in category (a), 28 (resp., …) in category (b), and only 6 (resp., 3) in category (c). The categories were non-exclusive and classification was not mandatory; indeed, several papers were classified in more than one category, and … (resp., 3) submissions did not specify a category.

An important task for the Program Committee was the selection of the PODS 2015 Best Paper Award. The committee selected the paper "Parallel-Correctness and Transferability for Conjunctive Queries" by Tom J. Ameloot, Gaetano Geck, Bas Ketsman, Frank Neven, and Thomas Schwentick. On behalf of the committee, we extend our sincere congratulations to the authors.

Since 2008, PODS has given the ACM PODS Alberto O. Mendelzon Test-of-Time Award to a paper, or a small number of papers, published in the PODS proceedings ten years earlier that had the most impact over the intervening decade. This year's committee, consisting of Dan Suciu (chair), Foto Afrati, and Frank Neven, selected the following two papers, and we extend our warmest congratulations to their authors: "XPath Satisfiability in the Presence of DTDs" by Michael Benedikt, Wenfei Fan, and Floris Geerts, and "Views and Queries: Determinacy and Rewriting" by Luc Segoufin and Victor Vianu.


Citations: 4
