
Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems: Latest Articles

Enumeration of MSO Queries on Strings with Constant Delay and Logarithmic Updates
Matthias Niewerth, L. Segoufin
We consider the enumeration of MSO queries over strings under updates. For each MSO query we build an index structure with the following properties: it can be constructed in linear time, it can be updated in logarithmic time, and it allows constant-delay enumeration. This improves on previously known index structures for constant-delay enumeration, which had to be reconstructed from scratch, hence in linear time, in the presence of updates. We allow relabeling updates, insertion of individual labels, and removal of individual labels.
DOI: https://doi.org/10.1145/3196959.3196961 (published 2018-05-27)
Citations: 18
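To make the "logarithmic update" idea in the entry above concrete, here is a minimal sketch of a much simpler relative of the problem: maintaining whether a string satisfies a fixed Boolean regular property (every MSO sentence over strings defines one) under relabeling updates. It is not the paper's index structure and does not enumerate answers. The class name `RegularIndex`, the example automaton, and the balanced-tree layout are assumptions chosen for illustration.

```python
from typing import Dict, Set, Tuple

# A transition function of a DFA with states 0..k-1, stored as a tuple:
# f[q] is the state reached from state q.
Transition = Tuple[int, ...]


def compose(f: Transition, g: Transition) -> Transition:
    """Return the function that applies f first and then g."""
    return tuple(g[q] for q in f)


class RegularIndex:
    """Balanced tree whose nodes store the DFA transition of their substring."""

    def __init__(self, delta: Dict[str, Transition], start: int,
                 accepting: Set[int], text: str) -> None:
        self.delta = delta
        self.start = start
        self.accepting = accepting
        k = len(next(iter(delta.values())))
        identity = tuple(range(k))
        # Round the number of leaves up to a power of two; unused leaves
        # hold the identity transition, which is neutral for composition.
        self.size = 1
        while self.size < max(1, len(text)):
            self.size *= 2
        self.tree = [identity] * (2 * self.size)
        for i, ch in enumerate(text):          # linear-time construction
            self.tree[self.size + i] = delta[ch]
        for node in range(self.size - 1, 0, -1):
            self.tree[node] = compose(self.tree[2 * node],
                                      self.tree[2 * node + 1])

    def relabel(self, pos: int, ch: str) -> None:
        """Change the label at position pos; touches O(log n) nodes."""
        node = self.size + pos
        self.tree[node] = self.delta[ch]
        node //= 2
        while node >= 1:
            self.tree[node] = compose(self.tree[2 * node],
                                      self.tree[2 * node + 1])
            node //= 2

    def accepts(self) -> bool:
        """The root stores the transition induced by the whole string."""
        return self.tree[1][self.start] in self.accepting


# Hypothetical example automaton: "the string has an even number of a's".
EVEN_A = {"a": (1, 0), "b": (0, 1)}
index = RegularIndex(EVEN_A, start=0, accepting={0}, text="abba")
print(index.accepts())   # two a's
index.relabel(1, "a")    # string becomes "aaba"
print(index.accepts())   # three a's
```

The design choice is that composition of transition functions is associative, so each internal node can be recomputed from its two children alone. Insertions and deletions of positions, which the paper also supports, would need a rebalancing tree rather than this fixed layout.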
2018 ACM PODS Alberto O. Mendelzon Test-of-Time Award
M. Lenzerini, W. Martens, Nicole Schweikardt
In 2007, the PODS Executive Committee established a Test-of-Time Award, named after the late Alberto O. Mendelzon, in recognition of his scientific legacy and his service and dedication to the database community. Mendelzon was an international leader in database theory, whose pioneering and fundamental work has inspired and influenced both database theoreticians and practitioners, and continues to be applied in a variety of advanced settings. He served the database community in many ways: he served as both the Program and the General Chair of the PODS conference, and was instrumental in bringing SIGMOD and PODS together. He was an outstanding educator, who guided the research of numerous doctoral students and postdoctoral fellows. The Award is to be given each year to a paper or a small number of papers published in the PODS proceedings ten years prior that had the most impact (in terms of research, methodology, or transfer of practice) over the intervening decade. The decision was approved by SIGMOD and ACM. The funds for the Award were contributed by IBM Toronto. The PODS Executive Committee has appointed us to serve as the Award Committee for 2018. After careful consideration and having solicited external nominations and advice, we have selected the following paper as the award winner for 2018: "The Chase Revisited" by Alin Deutsch, Alan Nash and Jeff Remmel. Citation. The chase procedure, introduced in the '70s, is a famous technique in the field and has proved to be important and effective in providing solutions to several problems related to reasoning on data. The paper revisits the standard chase procedure, studying its properties and applicability to classical database problems. Besides settling the open problem of decidability of termination of the standard chase, it investigates the adequacy of the standard chase for a number of data-oriented tasks. The conceptual insight provided by the paper and the technical results presented go much deeper than the modest title of the paper may suggest. They have had a huge impact on the research work carried out in several topics of data management and knowledge bases, including checking query containment under constraints, constraint implication, computing certain answers in data exchange and data integration, query answering in Datalog and its extensions, and ontology-based data access. PODS'18, June 10–15, 2018, Houston, Texas, USA.
DOI: https://doi.org/10.1145/3196959.3196993 (published 2018-05-27)
Citations: 0
Distinct Sampling on Streaming Data with Near-Duplicates
Jiecao Chen, Qin Zhang
In this paper we study how to perform distinct sampling in the streaming model where data contain near-duplicates. The goal of distinct sampling is to return a distinct element uniformly at random from the universe of elements, given that all the near-duplicates are treated as the same element. We also extend the result to the sliding window cases in which we are only interested in the most recent items. We present algorithms with provable theoretical guarantees for datasets in the Euclidean space, and also verify their effectiveness via an extensive set of experiments.
DOI: https://doi.org/10.1145/3196959.3196978 (published 2018-05-27)
Citations: 6
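As a rough illustration of the distinct-sampling goal described in the entry above, the following sketch uses the classical min-hash idea: keep the stream item whose group has the smallest hash value, so that repetitions of a group never change the outcome. Treating near-duplicates as "points that round to the same grid cell" is a crude simplification assumed here for illustration; two close points on opposite sides of a cell boundary are split, which is one of the difficulties the paper's algorithms address and this sketch does not. The names `grid_cell`, `distinct_sample`, `width`, and `seed` are hypothetical.

```python
import hashlib
import math
from typing import Iterable, Optional, Tuple

Point = Tuple[float, ...]


def grid_cell(point: Point, width: float) -> Tuple[int, ...]:
    """Assumed stand-in for near-duplicate grouping: round down to a grid."""
    return tuple(math.floor(c / width) for c in point)


def _hash(cell: Tuple[int, ...], seed: int) -> int:
    """Deterministic 64-bit hash of a cell; the seed selects the hash function."""
    digest = hashlib.sha256(f"{seed}:{cell!r}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def distinct_sample(stream: Iterable[Point], width: float,
                    seed: int = 0) -> Optional[Point]:
    """One pass, constant memory: return a point from the min-hash cell."""
    best_hash: Optional[int] = None
    best_point: Optional[Point] = None
    for point in stream:
        h = _hash(grid_cell(point, width), seed)
        if best_hash is None or h < best_hash:
            best_hash, best_point = h, point
    return best_point


# A cell that occurs 1000 times is no more likely to be chosen than a
# cell that occurs once: the result depends only on the set of cells.
heavy = [(0.1, 0.1)] * 1000 + [(0.12, 0.11), (5.3, 2.2), (9.9, 9.1)]
light = [(0.1, 0.1), (5.3, 2.2), (9.9, 9.1)]
same = (grid_cell(distinct_sample(heavy, 1.0), 1.0)
        == grid_cell(distinct_sample(light, 1.0), 1.0))
print(same)  # True
```

If the hash behaves like a random function, each distinct cell is equally likely to attain the minimum, which is what makes the sample uniform over groups rather than over stream items.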
How Can Reasoners Simplify Database Querying (And Why Haven't They Done It Yet)?
Michael Benedikt
The last few decades have seen vast progress in computational reasoning. This has included significant developments in theory, increasing maturity of tools both in performance and usability, and the evolution of standards and benchmarks. The purpose of this article is to reflect on the use of reasoning for rewriting and simplifying relational database queries. We undertake a review of some of the results and reasoning algorithms that have been developed with a motivation from query evaluation, and add to this a look at open problems in the area as well as a critique of prior work from the point of view of practice.
DOI: https://doi.org/10.1145/3196959.3196989 (published 2018-05-27)
Citations: 3
Distributed Statistical Estimation of Matrix Products with Applications
David P. Woodruff, Qin Zhang
We consider statistical estimations of a matrix product over the integers in a distributed setting, where we have two parties Alice and Bob; Alice holds a matrix $A$ and Bob holds a matrix $B$, and they want to estimate statistics of $A \cdot B$. We focus on the well-studied $\ell_p$-norm, distinct elements ($p = 0$), $\ell_0$-sampling, and heavy hitter problems. The goal is to minimize both the communication cost and the number of rounds of communication. This problem is closely related to the fundamental set-intersection join problem in databases: when $p = 0$ the problem corresponds to the size of the set-intersection join. When $p = \infty$ the output is simply the pair of sets with the maximum intersection size. When $p = 1$ the problem corresponds to the size of the corresponding natural join. We also consider the heavy hitters problem which corresponds to finding the pairs of sets with intersection size above a certain threshold, and the problem of sampling an intersecting pair of sets uniformly at random.
DOI: https://doi.org/10.1145/3196959.3196964 (published 2018-05-27)
Citations: 8
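The correspondence between norms of the matrix product and join sizes stated in the entry above can be checked on a tiny instance. The sketch below computes everything centrally, so it says nothing about communication cost or the paper's protocols; the example sets and the heavy-hitter threshold are made up for illustration. Alice's sets are the rows of a 0/1 matrix `A`, Bob's sets are the columns of `B`, and entry `(i, j)` of the product is the size of the intersection of Alice's set `i` with Bob's set `j`.

```python
from typing import List

Matrix = List[List[int]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain integer matrix product; zip(*B) iterates over the columns of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# Universe {0, 1, 2, 3}. Alice's sets are rows of A.
A = [[1, 1, 0, 0],   # S1 = {0, 1}
     [0, 1, 1, 1],   # S2 = {1, 2, 3}
     [0, 0, 0, 1]]   # S3 = {3}
# Bob's sets are columns of B: T1 = {0, 1}, T2 = {2}.
B = [[1, 0],
     [1, 0],
     [0, 1],
     [0, 0]]

C = matmul(A, B)
entries = [c for row in C for c in row]

l0 = sum(1 for c in entries if c != 0)   # pairs of sets that intersect
l1 = sum(abs(c) for c in entries)        # size of the natural join
linf = max(abs(c) for c in entries)      # largest intersection size
heavy = [(i, j) for i, row in enumerate(C)
         for j, c in enumerate(row) if c >= 2]   # assumed threshold of 2

print(C)             # [[2, 0], [1, 1], [0, 0]]
print(l0, l1, linf)  # 3 4 2
print(heavy)         # [(0, 0)]
```

Here three pairs of sets intersect (the set-intersection join has size 3), the natural join on the shared element has 4 result tuples, and the pair (S1, T1) attains the maximum intersection size 2.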
An Operational Approach to Consistent Query Answering
M. Calautti, L. Libkin, Andreas Pieris
Consistent query answering (CQA) aims to find meaningful answers to queries when databases are inconsistent, i.e., do not conform to their specifications. Such answers must be certainly true in all repairs, which are consistent databases whose difference from the inconsistent one is minimal, according to some measure. This task is often computationally intractable, and much of CQA research concentrated on finding islands of tractability. Nevertheless, there are many relevant queries for which no efficient solutions exist, which is reflected by the limited practical applicability of the CQA approach. To remedy this, one needs to devise a new CQA framework that provides explicit guarantees on the quality of query answers. However, the standard notions of repair and certain answers are too coarse to permit more elaborate schemes of query answering. Our goal is to provide a new framework for CQA based on revised definitions of repairs and query answering that opens up the possibility of efficient approximate query answering with explicit guarantees. The key idea is to replace the current declarative definition of a repair with an operational one, which explains how a repair is constructed, and how likely it is that a consistent instance is a repair. This allows us to define how certain we are that a tuple should be in the answer. Using this approach, we study the complexity of both exact and approximate CQA. Even though some of the problems remain hard, for many common classes of constraints we can provide meaningful answers in reasonable time, for queries going far beyond the standard CQA approach.
DOI: https://doi.org/10.1145/3196959.3196966 (published 2018-05-27)
Citations: 26
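The classical notions that the entry above starts from, repairs and certain answers, can be illustrated on a toy relation. The sketch below assumes a single key constraint and subset repairs (keep exactly one tuple per key value), and enumerates repairs by brute force, which is exponential in general. The final function, which reports the fraction of repairs in which each answer appears, is only an illustrative stand-in for "how certain we are that a tuple should be in the answer"; it is not the operational, sequence-based definition developed in the paper. All names and data are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, Hashable, List, Set, Tuple

Row = Tuple[Hashable, ...]
Query = Callable[[Set[Row]], Set[Hashable]]


def key_repairs(rows: List[Row], key_index: int = 0) -> List[Set[Row]]:
    """All subset repairs under a key: keep one tuple per key value."""
    groups: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return [set(choice) for choice in product(*groups.values())]


def certain_answers(rows: List[Row], query: Query) -> Set[Hashable]:
    """Answers returned by the query in every repair."""
    return set.intersection(*(query(r) for r in key_repairs(rows)))


def answer_frequency(rows: List[Row], query: Query) -> Dict[Hashable, float]:
    """Illustrative only: fraction of repairs in which each answer appears."""
    repairs = key_repairs(rows)
    counts: Dict[Hashable, int] = defaultdict(int)
    for repair in repairs:
        for answer in query(repair):
            counts[answer] += 1
    return {answer: n / len(repairs) for answer, n in counts.items()}


# Emp(name, dept) with key `name`; the two tuples for alice conflict.
emp = [("alice", "hr"), ("alice", "it"), ("bob", "it")]


def in_it(db: Set[Row]) -> Set[Hashable]:
    """Names of employees in the `it` department."""
    return {name for (name, dept) in db if dept == "it"}


print(len(key_repairs(emp)))         # 2
print(certain_answers(emp, in_it))   # {'bob'}
print(answer_frequency(emp, in_it)["alice"])  # 0.5
```

The example shows why certain answers alone are coarse: `alice` is dropped entirely even though she is an answer in half of the repairs, which is the kind of information a graded notion of certainty is meant to keep.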
Active Learning of GAV Schema Mappings
B. T. Cate, Phokion G. Kolaitis, Kun Qian, W. Tan
Schema mappings are syntactic specifications of the relationship between two database schemas, typically called the source schema and the target schema. They have been used extensively in formalizing and analyzing data interoperability tasks, especially data exchange and data integration. There is a growing body of research on deriving schema mappings from data examples, that is, pairs of source and target instances that depict the behavior of the unknown schema mapping. One of the approaches used in this endeavor casts the derivation of a schema mapping from data examples as a learning problem. Earlier work has shown that GAV mappings (global-as-view schema mappings) are learnable in Angluin's model of exact learning with membership queries and equivalence queries. Here, we validate the practical applicability of this theoretical result by designing and implementing an active learning algorithm, called GAV-Learn, that derives a syntactic specification of a GAV mapping from a given set of data examples and from a "black-box" implementation. We analyze the properties of GAV-Learn and, among other results, we show that it produces a GAV mapping that has minimal size and is a good approximation of the unknown GAV mapping. Furthermore, we carry out a detailed experimental evaluation that demonstrates the effectiveness of GAV-Learn along different metrics. In particular, we compare GAV-Learn with two earlier approaches for deriving GAV mappings from data examples, and establish that it performs significantly better than the two baselines.
DOI: https://doi.org/10.1145/3196959.3196974 (published 2018-05-27)
Citations: 25
Explanations and Transparency in Collaborative Workflows
S. Abiteboul, P. Bourhis, V. Vianu
We pursue an investigation of data-driven collaborative workflows. In the model, peers can access and update local data, causing side-effects on other peers' data. In this paper, we study means of explaining to a peer her local view of a global run, both at runtime and statically. We consider the notion of "scenario for a given peer", that is, a subrun observationally equivalent to the original run for that peer. Because such a scenario can sometimes differ significantly from what happens in the actual run, thus providing a misleading explanation, we introduce and study a faithfulness requirement that ensures closer adherence to the global run. We show that there is a unique minimal faithful scenario that explains what is happening in the global run by extracting only the portion relevant to the peer. With regard to static explanations, we consider the problem of synthesizing, for each peer, a "view program" whose runs generate exactly the peer's observations of the global runs. Assuming some conditions desirable in their own right, namely transparency and boundedness, we show that such a view program exists and can be synthesized. As an added benefit, the view program rules provide provenance information for the updates observed by the peer.
DOI: https://doi.org/10.1145/3196959.3196975 (published 2018-05-27)
Citations: 2
Subtrajectory Clustering: Models and Algorithms
P. Agarwal, K. Fox, Kamesh Munagala, Abhinandan Nath, Jiangwei Pan, Erin Taylor
We propose a model for subtrajectory clustering (the clustering of subsequences of trajectories); each cluster of subtrajectories is represented as a pathlet, a sequence of points that is not necessarily a subsequence of an input trajectory. Given a set of trajectories, our clustering model attempts to capture the shared portions between them by assuming each trajectory is a concatenation of a small set of pathlets, with possible gaps in between. We present a single objective function for finding the optimal collection of pathlets that best represents the trajectories, taking into account noise and other artifacts of the data. We show that the subtrajectory clustering problem is NP-hard and present fast approximation algorithms for subtrajectory clustering. We further improve the running time of our algorithm if the input trajectories are "well-behaved." Finally, we present experimental results on both real and synthetic data sets. We show via visualization and quantitative analysis that the algorithm indeed handles the desiderata of being robust to variations, being efficient and accurate, and being data-driven.
DOI: https://doi.org/10.1145/3196959.3196972 (published 2018-05-27)
Citations: 35
In-memory Representations of Databases via Succinct Data Structures: Tutorial Abstract
R. Raman
In recent years, the field of succinct data structures (SDS) has grown rapidly. SDS store data in main memory space that approaches an information-theoretic minimum, and support operations on the data with little or no slow-down compared to their conventional counterparts. In practice, an SDS uses one to two orders of magnitude less main memory than a conventional data structure. For this reason, SDS are becoming a popular approach for storing data that is only somewhat bigger than main memory. This tutorial explores the fundamentals of SDS and their applications to a variety of database problems.
DOI: https://doi.org/10.1145/3196959.3196992 (published 2018-05-27)
Citations: 1